Open bkovats opened 1 month ago
Hi @bkovats,
Thank you for bringing this issue to my attention. The problem with the reverse translation is caused by data imbalance in our training dataset. To resolve this, we need to introduce more single-word names into our training data.
I will investigate this issue thoroughly and work on implementing a solution. Your feedback is valuable and will help us improve future versions of the software.
Best regards, Kohulan
Dear @Kohulan,
I had a look at the tool on a small SMILES set: I generated the IUPAC names with `translate_forward`, converted these back to SMILES (using OPSIN), and checked whether the structures I got after the conversions match the input structures. In the attached file I've collected a few examples where the structures don't match; I thought these might be useful in the further development of the tool.
stout_incorrect_structures.csv
Regarding `translate_reverse`, the SMILES generation yields strange results for simple molecules, e.g.: