Closed by bbuesser 10 years ago
We now use toRdkitMolecule not just for generating SMILES (done for every species to make its label) but also to draw the molecules (RDKit was better at generating 2D coordinates for polycyclic species). We were also considering using it for aromaticity detection.
It would be hard to avoid toRDKitMolecule, so I would suggest first seeing if we can make RDKit cope with tetravalent nitrogen.
An alternative workaround for the drawing code would be to pretend to RDKit that the N is a C, for the purpose of getting the 2D coordinates. We then put an "N" where it tells us that atom should be.
Not sure how to trick it into giving us SMILES strings - but perhaps SMILES is ill-suited to varying valences anyway, and we should make up a pseudo-SMILES or just fall back to the chemical formula (wrap toSMILES in a try block and fall back to toFormula if it fails). That may suffice in some situations (e.g. making labels).
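A minimal sketch of the try-and-fall-back idea, for illustration only. StubMolecule is a hypothetical stand-in rather than the real RMG class, and catching a broad Exception is an assumption, since the exact exception type raised through RDKit is not stated in this thread.

```python
class StubMolecule:
    """Hypothetical stand-in for an RMG molecule; NOT the real class."""

    def __init__(self, formula, smiles=None):
        self.formula = formula
        self._smiles = smiles

    def toSMILES(self):
        # Simulate the conversion failing when no valid SMILES is available.
        if self._smiles is None:
            raise ValueError("Explicit valence for atom # 0 N greater than permitted")
        return self._smiles

    def toFormula(self):
        return self.formula


def make_label(molecule):
    """Return the SMILES string if it can be generated, else the formula."""
    try:
        return molecule.toSMILES()
    except Exception:  # assumption: the real error type may be narrower
        return molecule.toFormula()


if __name__ == "__main__":
    print(make_label(StubMolecule("CH4", smiles="C")))
    print(make_label(StubMolecule("HNO3")))
```

The drawback of this approach is that it hides every SMILES failure, not just the valence one, so a narrower except clause would be preferable once the actual exception type is known.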
I like the ideas in your last comment. I will try to write a wrapper for toSMILES that uses toFormula if the molecule contains any of the four-valent nitrogen atom types (N4s, N4d, N4dd, N4t, N4b). That way we can use RDKit now and follow its future development for varying valency.
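The atom-type check could be sketched roughly as follows. This is not the actual RMG implementation: the function names are hypothetical, and the two converters are passed in as callables so the sketch does not assume anything about how RMG exposes atom types or toSMILES/toFormula.

```python
# Four-valent nitrogen atom types listed in the comment above.
FOUR_VALENT_N_TYPES = frozenset({"N4s", "N4d", "N4dd", "N4t", "N4b"})


def has_four_valent_nitrogen(atom_types):
    """True if any atom type label in the iterable is a four-valent N type."""
    return any(label in FOUR_VALENT_N_TYPES for label in atom_types)


def label_for(atom_types, to_smiles, to_formula):
    """Pick the formula for four-valent N species, SMILES otherwise.

    to_smiles and to_formula are zero-argument callables (hypothetical
    stand-ins for the molecule's toSMILES and toFormula methods).
    """
    if has_four_valent_nitrogen(atom_types):
        # Skip the SMILES conversion entirely so RDKit is never asked
        # to handle the unsupported valence.
        return to_formula()
    return to_smiles()


if __name__ == "__main__":
    # Atom type lists here are illustrative only.
    print(label_for(["Cs", "N4s"], lambda: "smiles-string", lambda: "formula"))
    print(label_for(["Cs", "Cs"], lambda: "smiles-string", lambda: "formula"))
```

Compared with a blanket try/except, checking the atom types up front means other SMILES failures still surface as errors instead of being silently replaced by a formula.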
I have tested the Nitrogen branch with the new additions around RDKit. The common three-valent nitrogen species work great and produce nice pictures of the species. However, four-valent nitrogen results in an error message from the RDKit module: Explicit valence for atom # 0 N greater than permitted. Although I write my input file with adjacency lists, I think there is a conversion to SMILES somewhere in RMG using RDKit that throws this error.
I have found presentation slides from the 1st RDKit user meeting last year, and it seems that how to deal correctly with varying valence is an ongoing topic, especially as different cheminformatics software packages treat it differently, which makes a common interpretation for standards like SMILES difficult.
I think it would be nice and important to let atoms vary their valence in RMG, especially for nitrogen and sulphur.
My question to you is: is this toSMILES conversion of every species crucial for the current RMG, or do you know of other places in the code that could be responsible?