Closed by himanshisyadav 5 months ago
Hi @himanshisyadav, I tried your example, and Cl is tokenized as Cl, not C.
sequence = "[O-][Cl+3]([O-])([O-])[O-]"
print(tokenizer.tokenize(sequence))
['[', 'O', '-', ']', '[', 'Cl', '+', '3', ']', '(', '[', 'O', '-', ']', ')', '(', '[', 'O', '-', ']', ')', '[', 'O', '-', ']']
Maybe you can check the part of your code where the tokenizer is imported and defined. Let me know if you still have any questions.
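One common way a two-letter element ends up split is the order of alternatives in a regex-based tokenizer. The snippet below is only a minimal sketch of that idea, not the tokenizer from this repo: `GOOD_PATTERN`, `NAIVE_PATTERN`, `tokenize_smiles`, and `naive_tokenize` are hypothetical names made up for illustration, and the element list is deliberately limited to Cl, Br, and Li.

```python
import re

# Hypothetical pattern: two-letter element symbols are listed first, so "Cl"
# is matched as one token before the single-letter alternative is tried.
GOOD_PATTERN = re.compile(r"Cl|Br|Li|[A-Za-z]|\d|[^A-Za-z\d\s]")

# Hypothetical pattern with the single-letter alternative first: the regex
# engine takes the first alternative that matches, so "Cl" becomes "C", "l".
NAIVE_PATTERN = re.compile(r"[A-Za-z]|Cl|Br|Li|\d|[^A-Za-z\d\s]")


def tokenize_smiles(smiles):
    """Split a SMILES string, keeping Cl, Br, and Li as single tokens."""
    return GOOD_PATTERN.findall(smiles)


def naive_tokenize(smiles):
    """Same idea, but with the alternation order that splits Cl."""
    return NAIVE_PATTERN.findall(smiles)


sequence = "[O-][Cl+3]([O-])([O-])[O-]"
print(tokenize_smiles(sequence))  # contains 'Cl' as a single token
print(naive_tokenize(sequence))   # contains 'C' followed by 'l'
```

If the tokenizer in your environment behaves like the second function, it is worth checking whether a different tokenizer class or vocabulary is being loaded than the one you expect.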
@ChangwenXu98 Thank you for your quick response! I'll check my code.
Why does the tokenizer tokenize Cl as just C?
However, the tokenizer behaves differently for Li and Br.
Thank you for your help!