dan2097 / opsin

Open Parser for Systematic IUPAC Nomenclature. Chemical name to structure conversion
https://opsin.ch.cam.ac.uk
MIT License

"Kilian" returns C1000H2002 #237

Closed: jasondbiggs closed this 10 months ago

jasondbiggs commented 10 months ago

I don't know enough about nomenclature to say this is definitely a bug, but it is very unexpected that "Kilian" returns a string with one thousand consecutive "C"s.

dan2097 commented 10 months ago

See https://iupac.qmul.ac.uk/misc/numb.html. The IUPAC prefix for 1000 is kilia, which implies that an alkane of length 1000 should be called kiliane in English. In German the 'e' at the end of alkane names is omitted, and because it also simplifies the implementation, OPSIN treats the terminal 'e' as optional and hence accepts kilian as well. It's rare, but there are other cases where a chemical term also has a common non-chemical meaning, e.g. "lead", "germane".
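To make the optional terminal 'e' concrete, here is a minimal Python sketch. It is not OPSIN's implementation: the `STEMS` table is a hand-picked, hypothetical subset, and the function names are made up for illustration. It only shows why "Kilian" and "kiliane" end up as the same 1000-carbon chain.

```python
import re

# Hypothetical, hand-picked subset of alkane stems (assumption for illustration;
# OPSIN's real grammar is far larger and is not reproduced here).
STEMS = {"hept": 7, "dec": 10, "pentacont": 50, "kili": 1000}

# A stem followed by "an" and an optional terminal "e" (heptan == heptane).
ALKANE = re.compile(r"^(%s)ane?$" % "|".join(STEMS), re.IGNORECASE)


def alkane_smiles(name):
    """Return the SMILES of an unbranched alkane, or None if the name is not recognised."""
    match = ALKANE.match(name.strip())
    if match is None:
        return None
    return "C" * STEMS[match.group(1).lower()]


def alkane_formula(name):
    """Return the molecular formula CnH(2n+2), or None if the name is not recognised."""
    smiles = alkane_smiles(name)
    if smiles is None:
        return None
    n = len(smiles)
    return f"C{n}H{2 * n + 2}"


print(alkane_formula("Kilian"))
print(alkane_smiles("heptan") == alkane_smiles("heptane"))
```

With this toy table, "Kilian" yields the C1000H2002 from the issue title, while a word such as "lead" is rejected because it is not a stem followed by "an(e)".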

jasondbiggs commented 10 months ago

Thanks Dan - I wondered if it was something like that but wasn't considering the optional terminal e (heptan == heptane)

I asked the chatbot to generate programmatic alkane names for some arbitrary alkanes, but the most interesting ones I got were pentacontane (50 carbon atoms) and nonacosane, which the bot suggested for 900 carbons but which parses as 29.
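For anyone who wants to check names like these by hand, here is a rough Python sketch of how the multiplying terms add up. It is an assumption-laden toy, not OPSIN's parser: the `TERMS` table is a subset of the IUPAC numerical terms page linked above, `carbon_count` is a hypothetical helper, the order of terms is not validated, and the irregular small alkanes (methane to butane) are not handled.

```python
# Subset of IUPAC multiplying terms (units, tens, hundreds, 1000).
# "cosa" is the form "icosa" takes after most unit terms (docosa, nonacosa).
TERMS = {
    "hen": 1, "do": 2, "tri": 3, "tetra": 4, "penta": 5,
    "hexa": 6, "hepta": 7, "octa": 8, "nona": 9,
    "undeca": 11, "deca": 10, "icosa": 20, "cosa": 20,
    "triaconta": 30, "tetraconta": 40, "pentaconta": 50, "hexaconta": 60,
    "heptaconta": 70, "octaconta": 80, "nonaconta": 90,
    "hecta": 100, "dicta": 200, "tricta": 300, "tetracta": 400,
    "pentacta": 500, "hexacta": 600, "heptacta": 700, "octacta": 800,
    "nonacta": 900, "kilia": 1000,
}

# Try longer terms first so "nonacta" wins over "nona" when both match.
_BY_LENGTH = sorted(TERMS, key=len, reverse=True)


def carbon_count(name):
    """Sum the multiplying terms in an unbranched alkane name, or return None."""
    stem = name.strip().lower()
    for suffix in ("ane", "an"):  # the terminal "e" is optional
        if stem.endswith(suffix):
            # The final "a" of the prefix is elided before "ane"; restore it.
            stem = stem[: -len(suffix)] + "a"
            break
    else:
        return None
    total = 0
    while stem:
        for term in _BY_LENGTH:
            if stem.startswith(term):
                total += TERMS[term]
                stem = stem[len(term):]
                break
        else:
            return None  # leftover text that is not a known term
    return total


for name in ("pentacontane", "nonacosane", "nonactane", "kilian"):
    print(name, carbon_count(name))
```

Under this table nonacosane splits as nona (9) + cosa (20), whereas the 900-carbon chain would use the hundreds term nonacta, giving nonactane.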