XinhaoLi74 / SmilesPE

SMILES Pair Encoding: A data-driven substructure representation of chemicals
https://xinhaoli74.github.io/SmilesPE/
Apache License 2.0
177 stars, 30 forks

atomwise_tokenizer ignores the "i" token #3

Closed: skalinin closed this issue 3 years ago

skalinin commented 3 years ago

Hello! Could you please tell me why atomwise_tokenizer ignores the "i" token?

>>> from SmilesPE.pretokenizer import atomwise_tokenizer
>>> smi = 'BriISiCc'
>>> toks = atomwise_tokenizer(smi)
>>> print(toks)
['Br', 'I', 'S', 'C', 'c']

skalinin commented 3 years ago

I got it after reading https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system: a lowercase "i" on its own is not a valid SMILES atom symbol. Thanks for your lib :)
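
For anyone who lands here with the same question, the behaviour can be reproduced with a small regex sketch. This assumes the tokenizer works by collecting every regex match and silently skipping anything that does not match. The pattern and the names `SMILES_TOKEN_PATTERN`, `sketch_atomwise_tokenize` and `unmatched_characters` are hypothetical, simplified stand-ins for illustration, not the library's actual code.

```python
import re

# Hypothetical, simplified SMILES token pattern (illustration only; the
# library's real pattern may cover more cases). Order matters: two-letter
# symbols such as "Br" and "Cl" are tried before single-letter atoms.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]"            # bracket atoms, e.g. [Si], [NH4+]
    r"|Br|Cl"                 # two-letter organic-subset atoms
    r"|[BCNOSPFI]"            # one-letter organic-subset atoms
    r"|[bcnosp]"              # aromatic atoms (lowercase)
    r"|[()=#\-+\\/:~@?*$.]"   # bonds, branches and other symbols
    r"|%[0-9]{2}|[0-9])"      # ring-closure labels
)


def sketch_atomwise_tokenize(smi):
    """Return every token the pattern matches, in order.

    re.findall skips characters that match no alternative, which is why
    a stray "i" never shows up in the result.
    """
    return SMILES_TOKEN_PATTERN.findall(smi)


def unmatched_characters(smi):
    """Return the characters left over after removing all matched tokens."""
    return SMILES_TOKEN_PATTERN.sub("", smi)


print(sketch_atomwise_tokenize("BriISiCc"))  # tokens that matched
print(unmatched_characters("BriISiCc"))      # characters that were dropped
print(sketch_atomwise_tokenize("C[Si]C"))    # silicon written as a bracket atom
```

Checking `unmatched_characters` before tokenizing is one way to catch invalid input early, since the dropped characters would otherwise go unnoticed. An element such as silicon has to be written as a bracket atom (`[Si]`), which the sketch keeps as a single token.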