XinhaoLi74 / SmilesPE

SMILES Pair Encoding: A data-driven substructure representation of chemicals
https://xinhaoli74.github.io/SmilesPE/
Apache License 2.0
181 stars, 31 forks

Can we adapt this to hugging face Tokenizers? #6

Open karims opened 3 years ago

karims commented 3 years ago

Hi, I wanted to know whether this codebase can be used directly to build a Hugging Face tokenizer. It doesn't look like it, but I wanted to check whether I'm missing something.

XinhaoLi74 commented 3 years ago

Hi, the following link contains an example of building an SPE tokenizer on top of the Hugging Face PreTrainedTokenizer. Hope this is helpful: https://colab.research.google.com/drive/1tsiTpC4i26QNdRzBHFfXIOFVToE54-9b?usp=sharing
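
For readers who cannot open the notebook, here is a minimal, dependency-free sketch of the adapter idea. It is not taken from the linked Colab. The class name `SPEAdapterSketch` and the toy tokenizer `toy_spe` are hypothetical. The method names mirror the hooks that a `PreTrainedTokenizer` subclass usually overrides (`_tokenize`, `_convert_token_to_id`, `_convert_id_to_token`, `convert_tokens_to_string`). In a real adapter the class would presumably inherit from `transformers.PreTrainedTokenizer`, and `tokenize_fn` would be the `tokenize` method of a SmilesPE `SPE_Tokenizer`, which is assumed here to return tokens joined by single spaces.

```python
from typing import Callable, Dict, List


class SPEAdapterSketch:
    """Hypothetical adapter mapping a SMILES tokenizer onto Hugging Face-style hooks."""

    def __init__(
        self,
        tokenize_fn: Callable[[str], str],
        vocab_tokens: List[str],
        unk_token: str = "[UNK]",
    ) -> None:
        # tokenize_fn is assumed to return a space-separated token string.
        self.tokenize_fn = tokenize_fn
        self.unk_token = unk_token
        # Put the unknown token at id 0 and drop duplicates, preserving order.
        ordered = list(dict.fromkeys([unk_token] + list(vocab_tokens)))
        self.vocab: Dict[str, int] = {tok: i for i, tok in enumerate(ordered)}
        self.ids_to_tokens: Dict[int, str] = {i: tok for tok, i in self.vocab.items()}

    @property
    def vocab_size(self) -> int:
        return len(self.vocab)

    def get_vocab(self) -> Dict[str, int]:
        return dict(self.vocab)

    def _tokenize(self, text: str) -> List[str]:
        # Split the space-separated output into a list of tokens.
        return self.tokenize_fn(text).split()

    def _convert_token_to_id(self, token: str) -> int:
        # Tokens missing from the vocabulary fall back to the unknown id.
        return self.vocab.get(token, self.vocab[self.unk_token])

    def _convert_id_to_token(self, index: int) -> str:
        return self.ids_to_tokens.get(index, self.unk_token)

    def convert_tokens_to_string(self, tokens: List[str]) -> str:
        # SMILES tokens concatenate without separators.
        return "".join(tokens)

    def encode(self, text: str) -> List[int]:
        return [self._convert_token_to_id(t) for t in self._tokenize(text)]


def toy_spe(smiles: str) -> str:
    """Stand-in tokenizer for the demo only: one token per character."""
    return " ".join(smiles)


adapter = SPEAdapterSketch(toy_spe, ["C", "O", "="])
ids = adapter.encode("CC=O")
tokens = [adapter._convert_id_to_token(i) for i in ids]
print(ids, adapter.convert_tokens_to_string(tokens))
```

The hooks only need a token list and a token-to-id mapping, so the SPE-specific logic stays inside `tokenize_fn`. Special tokens, padding, and saving the vocabulary are left out of this sketch.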