SELFIES preprocessing failed for a substantial part of the dataset (9,509 molecules in chebi_v231) because some SMILES features are not covered by the selfies library. I added RDKit normalisation of SMILES as a fallback when direct translation to SELFIES fails. Preprocessing now fails for only 151 molecules.
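
A minimal sketch of the fallback idea, not the actual code in this PR: try the direct translation first, and only normalise and retry if it fails. The names `to_selfies_with_fallback` and `make_default_backends` are hypothetical, and treating "normalisation" as an RDKit canonical-SMILES round trip is an assumption; the real change may use a different RDKit routine.

```python
from typing import Callable, Optional


def to_selfies_with_fallback(
    smiles: str,
    encode: Callable[[str], str],
    normalise: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Translate SMILES to SELFIES; on failure, normalise the SMILES and retry once."""
    try:
        return encode(smiles)
    except Exception:
        pass  # direct translation failed, fall through to the normalised retry

    normalised = normalise(smiles)
    if normalised is None:
        return None  # the normaliser could not parse the molecule
    try:
        return encode(normalised)
    except Exception:
        return None  # still not representable; the caller counts this as a failure


def make_default_backends():
    """Hypothetical wiring to the real libraries, imported lazily."""
    import selfies as sf
    from rdkit import Chem

    def normalise(smiles: str) -> Optional[str]:
        # Assumption: normalisation = parse with RDKit and write canonical SMILES.
        mol = Chem.MolFromSmiles(smiles)
        return None if mol is None else Chem.MolToSmiles(mol)

    return sf.encoder, normalise
```

With both libraries installed, `encode, normalise = make_default_backends()` followed by `to_selfies_with_fallback(smiles, encode, normalise)` would apply the fallback to a single molecule; a `None` result corresponds to a molecule that still fails preprocessing.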
Fixes for prediction generation:
The last (incomplete) batch of the dataset was previously lost; it is now saved as well
The size of the saved prediction files remains constant (independent of the batch size used)
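
The two prediction fixes above could be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `iter_batches`, `chunk_predictions` and `rows_per_file` are hypothetical names, and buffering predictions into fixed-size chunks is only one way to decouple file size from batch size.

```python
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive batches, including a shorter final batch if one remains."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def chunk_predictions(
    batches: Iterable[Sequence[T]], rows_per_file: int
) -> Iterator[List[T]]:
    """Regroup per-batch predictions into chunks of rows_per_file rows each."""
    buffer: List[T] = []
    for batch in batches:
        buffer.extend(batch)
        # Emit full chunks as soon as enough rows have accumulated.
        while len(buffer) >= rows_per_file:
            yield buffer[:rows_per_file]
            buffer = buffer[rows_per_file:]
    if buffer:
        # Flush the remainder so the tail of the dataset is not dropped.
        yield buffer
```

In this sketch every chunk except possibly the last holds exactly `rows_per_file` rows, whatever batch size produced the predictions.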
Some tokens have been added (from graph datasets and new ChEBI versions)