XinhaoLi74 / SmilesPE

SMILES Pair Encoding: A data-driven substructure representation of chemicals
https://xinhaoli74.github.io/SmilesPE/
Apache License 2.0
181 stars 31 forks

Error with pretrained vocab #10

Open MohaiminDev opened 2 years ago

MohaiminDev commented 2 years ago

When I used:

spe_vob = codecs.open('../../data/processed/pretrained_tokenizer/SPE_ChEMBL.txt')
spe = SPE_Tokenizer(spe_vob)

I get the following error:

Error: invalid line 1 in BPE codes file: The line should exist of exactly two subword units, separated by whitespace
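The error message states the rule the vocabulary file is expected to follow: each line holds exactly two subword units separated by whitespace. A quick way to see which lines break that rule is to scan the file before passing it to the tokenizer. The sketch below is only a diagnostic aid, not part of SmilesPE: the helper name `find_malformed_lines` is hypothetical, and it assumes the file is UTF-8 text and that "whitespace" means any run of spaces or tabs, which may be looser than the library's own check.

```python
def find_malformed_lines(path, encoding="utf-8"):
    """Return (line_number, text) pairs for lines that do not contain
    exactly two whitespace-separated tokens.

    Hypothetical helper: it mirrors the rule quoted in the error message,
    not the tokenizer's internal parsing.
    """
    bad = []
    with open(path, encoding=encoding) as handle:
        for number, line in enumerate(handle, start=1):
            # str.split() with no argument splits on any run of whitespace
            # and ignores leading/trailing whitespace, so an empty line
            # yields zero tokens and is reported as malformed.
            if len(line.split()) != 2:
                bad.append((number, line.rstrip("\r\n")))
    return bad
```

Calling `find_malformed_lines('../../data/processed/pretrained_tokenizer/SPE_ChEMBL.txt')` returns an empty list if every line matches the rule. If line 1 shows up in the result, the file on disk is probably not the expected vocabulary file (for example, a header line, an empty first line, or a different file saved under that name).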