Open elemets opened 3 years ago
Hi,
Thanks for this project, it looks like it could be really helpful. Sorry if this is a stupid question, but once I've tokenized a set of SMILES using the pre-trained SMILES model, how would I get the token IDs?
Thanks A
You can get the ID of each token in a tokenized SMILES string like this:
test_spe_word = spe.tokenize(...)
for word in test_spe_word.split(' '):
    print(spe.bpe_codes[spe.bpe_codes_reverse[word]])  # output ID
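If `spe.bpe_codes_reverse` is a plain dict, the lookup above would raise a `KeyError` for any token that is not in it. In case that is a problem, here is a minimal sketch of an alternative: build your own token-to-ID vocabulary from the tokenized strings. It only assumes the tokenizer returns space-separated tokens, as in the snippet above. `UNK`, `build_vocab` and `encode` are hypothetical names I made up, not part of the library, and the example strings are illustrative rather than real tokenizer output.

```python
from typing import Dict, Iterable, List

UNK = "[UNK]"  # hypothetical placeholder for tokens missing from the vocabulary


def build_vocab(tokenized: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {UNK: 0}
    for line in tokenized:
        for token in line.split(" "):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(line: str, vocab: Dict[str, int]) -> List[int]:
    """Map a space-separated token string to IDs, using the UNK ID for unseen tokens."""
    return [vocab.get(token, vocab[UNK]) for token in line.split(" ")]


# Illustrative strings in the space-separated format (not real tokenizer output).
corpus = ["CC ( =O ) O", "c1ccccc1 O"]
vocab = build_vocab(corpus)
ids = encode("CC O N", vocab)  # "N" is not in the corpus, so it gets the UNK ID
print(ids)
```

The IDs from this sketch are specific to the corpus you build the vocabulary from, so they will not match the values stored in `spe.bpe_codes`. Save the vocabulary alongside your model if you need the mapping to stay stable.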