text = """Predict the binding strength (in pIC50 unit) between the following protein-ligand pair:
CC(C)(C)CC[C@@H](N1C(=O)C(=N[C@@]11CC[C@@H](CC1)C(C)(C)C)c1ccc(F)c(F)c1)c1ccc(cc1)C(=O)NCc1nn[nH]n1
and
MAGAPGPLRLALLLLGMVGRAGPRPQGATVSLWETVQKWREYRRQCQRSLTEDPPPATDLFCNRTFDEYACWPDGEPGSFVNVSCPWYLPWASSVPQGHVYRFCTAEGLWLQKDNSSLPWRDLSECEESKRGERSSPEEQLLFLYIIYTVGYALSFSALVIASAILLGFRHLHCTRNYIHLNLFASFILRALSVFIKDAALKWMYSTAAQQHQWDGLLSYQDSLSCRLVFLLMQYCVAANYYWLLVEGVYLYTLLAFSVLSEQWIFRLYVSIGWGVPLLFVVPWGIVKYLYEDEGCWTRNSNMNYWLIIRLPILFAIGVNFLIFVRVICIVVSKLKANLMCKTDIKCRLAKSTLTLIPLLGTHEVIFAFVMDEHARGTLRFIKLFTELSFTSFQGLMVAILYCFVNNEVQLEFRKSWERWRLEHLHIQRDSSMKPLKCPTSSLSSGATAGSSMYTATCQASCS
"""
tokenizer.tokenize(text)
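The prompt above mixes three kinds of content: plain English, a SMILES string for the ligand, and an amino-acid sequence for the protein. The tokenizer's own splitting rules are not shown in this section, so the following is only an illustrative sketch of how such segments could be broken into tokens. The regular expression is an atom-level SMILES pattern commonly seen in cheminformatics code, and `split_smiles` and `split_protein` are hypothetical helper names; none of this is taken from the repository's tokenizer.

```python
import re
from typing import List

# Assumption: a commonly used atom-level SMILES pattern. It is NOT the
# pattern used by the tokenizer described in this README.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def split_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into atom-level tokens (hypothetical helper)."""
    tokens = SMILES_PATTERN.findall(smiles)
    # Refuse to return a lossy split: the tokens must rebuild the input.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


def split_protein(sequence: str) -> List[str]:
    """Split a protein sequence into one token per residue (hypothetical helper)."""
    return list(sequence)


if __name__ == "__main__":
    # Fragments copied from the example prompt above.
    print(split_smiles("c1ccc(F)c(F)c1"))
    print(split_smiles("[C@@H](CC1)C(C)(C)C"))
    print(split_protein("MAGAPGPLRL"))
```

Bracketed atoms such as `[C@@H]` and `[nH]` stay as single tokens, which keeps stereochemistry and explicit hydrogens attached to their atom. The output of the real `tokenizer.tokenize(text)` call may differ from this sketch.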
Use as follows:
The two files can be retrieved from the Box folder linked in the main README: https://ibm.box.com/s/kijawq3rf4191bbcyflsxx7kp9m74jnx
Works as follows:
Should give: