When I run:
import codecs
spe_vob = codecs.open('../../data/processed/pretrained_tokenizer/SPE_ChEMBL.txt')
spe = SPE_Tokenizer(spe_vob)
I get the following error:
Error: invalid line 1 in BPE codes file:
The line should exist of exactly two subword units, separated by whitespace
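
To narrow this down, here is a minimal sketch for finding which lines of the vocabulary file would trip that check. It only mirrors the rule quoted in the error message (exactly two whitespace-separated units per line) and is not taken from the tokenizer's source, which may treat header or blank lines differently. `invalid_lines` is a hypothetical helper name, and the UTF-8 encoding in the usage comment is an assumption.

```python
def invalid_lines(lines):
    """Return (line_number, text) for every line that does not split
    into exactly two whitespace-separated units."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if len(text.split()) != 2:
            bad.append((number, text))
    return bad


# Small in-memory example: lines 2 and 3 break the two-unit rule.
sample = ["C C\n", "CC\n", "c c c\n"]
print(invalid_lines(sample))

# Usage against the file from the post (encoding is an assumption):
# with open('../../data/processed/pretrained_tokenizer/SPE_ChEMBL.txt',
#           encoding='utf-8') as handle:
#     print(invalid_lines(handle)[:5])
```

If line 1 shows up in the result, printing it with `repr()` should reveal whether the file is empty, contains something other than merge pairs, or starts with invisible characters.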