Closed by dan2097 6 years ago
Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).
I think that's actually a mistake and that the first N should really be [nH]. Lower case means that an atom can form a double bond to another lower-case atom, provided doing so does not violate valency. In IUPAC parlance, the atom belongs to a system with the maximum number of noncumulative double bonds.
An example of a cytosine that probably should have a double bond to that first atom is 5H-cytosine, which is probably misinterpreted at present. I think the reason I used N was that historically n and nH were not distinguished, which could cause hydrogens to move. In the current code, the H is used as a hint as to where to put the hydrogen if one ends up with an odd number of atoms that are eligible for double bonds (cf. the SMILES for pyrrole). Note, however, that being an nH doesn't guarantee that the atom will have a hydrogen in the final molecule, e.g. 2H-pyrrole.
General rule of thumb is that if your tool generates aromatic SMILES, OPSIN should accept them.
Original comment by Noel O'Boyle (Bitbucket: baoilleach, GitHub: baoilleach).
Ok, got it. If these are errors, I can probably write a Pybel script to find such cases (i.e. rings where only some of the atoms are marked as aromatic), if that would be helpful. And maybe even correct them. :-)
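For illustration, the kind of check described above could be sketched roughly as follows. This is not the Pybel script mentioned; it is a dependency-free Python approximation under stated assumptions: the SMILES is a single connected component, only common organic-subset and bracket atoms occur, and a "ring" is taken to be the path between the two atoms of a ring-closure label (for fused systems this may be a larger cycle than the smallest ring). The names `partially_aromatic_rings`, `tree_path` and `is_aromatic` are hypothetical, and the test string is a made-up example, not the actual entry from arylgroups.xml.

```python
import re

# Atoms (bracket, two-letter halogens, organic subset), ring-closure labels,
# and branch parentheses. Bond symbols are simply skipped by finditer.
TOKEN = re.compile(r"(\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops])|(%\d\d|\d)|([()])")


def is_aromatic(atom_token):
    """True if the element symbol is written in lower case, e.g. c, n, [nH]."""
    symbol = atom_token.strip("[]").lstrip("0123456789")  # drop brackets/isotope
    return symbol[0].islower()


def tree_path(parent, a, b):
    """Atom indices on the spanning-tree path from a to b (inclusive)."""
    up_from_a = []
    node = a
    while node is not None:
        up_from_a.append(node)
        node = parent[node]
    up_from_b = []
    node = b
    while node not in up_from_a:  # climb from b to the common ancestor
        up_from_b.append(node)
        node = parent[node]
    return up_from_a[: up_from_a.index(node) + 1] + up_from_b[::-1]


def partially_aromatic_rings(smiles):
    """Return rings (as lists of atom tokens) that mix upper- and lower-case atoms."""
    atoms, parent = [], []   # atom tokens and the atom each one was attached to
    open_rings = {}          # ring-closure label -> atom index that opened it
    stack = []               # atoms to return to when a branch closes
    prev = None
    flagged = []
    for match in TOKEN.finditer(smiles):
        atom, ring, paren = match.groups()
        if atom:
            atoms.append(atom)
            parent.append(prev)
            prev = len(atoms) - 1
        elif ring:
            if ring in open_rings:
                cycle = tree_path(parent, open_rings.pop(ring), prev)
                flags = [is_aromatic(atoms[i]) for i in cycle]
                if any(flags) and not all(flags):
                    flagged.append([atoms[i] for i in cycle])
            else:
                open_rings[ring] = prev
        elif paren == "(":
            stack.append(prev)
        else:
            prev = stack.pop()
    return flagged


if __name__ == "__main__":
    # Hypothetical six-membered ring with five lower-case atoms and one N.
    print(partially_aromatic_rings("N1c(=O)nc(N)cc1"))
    # Pyrrole written with [nH]: every ring atom is lower case, nothing flagged.
    print(partially_aromatic_rings("[nH]1cccc1"))
```

A real script would more likely rely on a cheminformatics toolkit for ring perception; the sketch only shows the shape of the check.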
Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).
I've fixed the obvious cases in that file. In integration testing, the only change was in the interpretation of 3,N4-ethenocytosine. I think this now gives a tautomer of the typically given structure, although InChI still considers them to be different. (The integration testing actually said the structure was now wrong, because for some reason the ancient version of ChEBI I use for integration testing was also missing a double bond; the current version of ChEBI has it correct.) I wouldn't worry too much about the choice of n vs N in OPSIN's resources unless it is affecting the output.
Original report by Noel O'Boyle (Bitbucket: baoilleach, GitHub: baoilleach).
Hi Daniel,
I'm trying to add an entry to one of the XML files and I'm wondering when to use lowercase. E.g. in arylgroups.xml, there's:
This has a ring where only 5 of the 6 atoms are lowercase. How do I decide when to use lowercase in these circumstances?