cdk / depict

SMILES Depiction Generator
GNU Lesser General Public License v2.1

Is relabelling of atoms possible? #49

Closed biotech7 closed 2 years ago

biotech7 commented 2 years ago

I'm not familiar with the CDK depict module and can't find a way to relabel atoms so that they follow the IUPAC atom numbering order. Is it possible to relabel the atoms of a molecule? For example, relabel atom 9 as 1, atom 12 as 3, atom 2 as 7, and so on. [Image: label]

nbehrnd commented 2 years ago

Interesting to note: although RDKit and CDKDepict are developed independently of each other, CDKDepict's numbering might be conceptually related to RDKit's AdjacencyMatrix (mentioned here). Both tools label the atoms of CCC(C(O)C)CN and OC(COC1=C2C=CC=CC2=CC=C1)CNC(C)C (mentioned here) in the same sequence, though one index starts at zero and the other at one:

[Image: CDKDepict_RDKit]

johnmay commented 2 years ago

It is not possible to relabel atoms like this, and it is a bit out of scope for what depict is designed to do. The "Atom Numbers" option displays the input atom index (the atom's position in the SMILES string). For example:

CCO (oxygen is atom 3 - idx=2)
OCC (oxygen is atom 1 - idx=0)

If you want a different labelling you can use the "Atom Value" option, which allows you to attach arbitrary labels to atoms. Alternatively, you can use atom maps. Here is an example:

CN1C(N(C=2N=CN(C2C1=O)C)C)=O |$_AV:1;1;2;3;4;9;8;7;5;6;O';1'';1';O$| 1,3,7-trimethyl-3,7-dihydro-1H-purine-2,6-dione

[Image: depiction of the percent-encoded CXSMILES above, rendered with the options w=-1, h=-1, abbr=on, hdisp=bridgehead, showtitle=false, zoom=1.25, annotate=atomvalue, r=0]
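For the relabelling asked about in the question (atom 9 becomes 1, and so on), the atom-value layer can be assembled from a table of atom numbers and new labels. The following is only a minimal sketch using plain Java strings. The class and method names are hypothetical, it assumes labels are listed in input atom order and separated by semicolons as in the example above, and it does not escape labels that themselves contain characters such as `;` or `$`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AtomValueLabels {

    /**
     * Builds a CXSMILES atom-value layer such as |$_AV:;1;2$|.
     * atomCount is the number of atoms in the SMILES (input order);
     * labels maps a 1-based atom number to its new label.
     * Atoms without an entry get a blank label.
     */
    static String atomValueLayer(int atomCount, Map<Integer, String> labels) {
        StringBuilder sb = new StringBuilder("|$_AV:");
        for (int atom = 1; atom <= atomCount; atom++) {
            if (atom > 1) {
                sb.append(';');
            }
            sb.append(labels.getOrDefault(atom, ""));
        }
        return sb.append("$|").toString();
    }

    public static void main(String[] args) {
        // Hypothetical relabelling of ethanol (CCO): atom 1 becomes 3, atom 3 becomes 1
        Map<Integer, String> labels = new LinkedHashMap<>();
        labels.put(1, "3");
        labels.put(2, "2");
        labels.put(3, "1");
        System.out.println("CCO " + atomValueLayer(3, labels));
    }
}
```

The resulting string can be pasted into depict with the "Atom Value" annotation selected. Deciding which label belongs to which atom is still up to the caller.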

These labels were actually assigned automatically by OPSIN from the name:

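import uk.ac.cam.ch.wwmm.opsin.NameToStructure;

// Parse the name with OPSIN and print the extended SMILES, including the atom labels
String name = "1,3,7-trimethyl-3,7-dihydro-1H-purine-2,6-dione";
System.err.println(NameToStructure.getInstance()
                                  .parseChemicalName(name)
                                  .getExtendedSmiles());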

Note that these labels show that locants are not globally unique when you have substituents (the label 1 appears on two atoms, alongside 1' and 1''), which is why there isn't really such a thing as "IUPAC numbering" outside of the core ring system.
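To inspect such labels programmatically, the atom-value layer can be split back into one label per atom. This is a rough sketch in plain Java with hypothetical class and method names. It assumes a single `$_AV:...$` layer formatted as in the example above and ignores any escaping or additional CXSMILES layers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AtomValueParser {

    /**
     * Returns one label per atom in input order (blank if unlabelled),
     * or an empty list if no $_AV: layer is found.
     */
    static List<String> atomValues(String cxsmiles) {
        String marker = "$_AV:";
        int start = cxsmiles.indexOf(marker);
        if (start < 0) {
            return Collections.emptyList();
        }
        start += marker.length();
        int end = cxsmiles.indexOf('$', start);
        if (end < 0) {
            return Collections.emptyList();
        }
        // A limit of -1 keeps trailing blank labels instead of dropping them
        return Arrays.asList(cxsmiles.substring(start, end).split(";", -1));
    }

    public static void main(String[] args) {
        String cxsmiles =
            "CN1C(N(C=2N=CN(C2C1=O)C)C)=O |$_AV:1;1;2;3;4;9;8;7;5;6;O';1'';1';O$|";
        List<String> labels = atomValues(cxsmiles);
        for (int idx = 0; idx < labels.size(); idx++) {
            // Atom number as shown by "Atom Numbers" is idx + 1
            System.out.println("atom " + (idx + 1) + " (idx=" + idx + ") -> " + labels.get(idx));
        }
    }
}
```

Printing the labels next to the input atom numbers makes the repeated locants easy to spot.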

If someone had an algorithm to add these labels to a given SMILES (possibly using OPSIN's dictionaries), then such an option might be reasonable:

CN1C(N(C=2N=CN(C2C1=O)C)C)=O |$_AV:;1;2;3;4;9;8;7;5;6;;;;$|

[Image: depiction of the percent-encoded CXSMILES above, rendered with the options w=-1, h=-1, abbr=on, hdisp=bridgehead, showtitle=false, zoom=1.25, annotate=atomvalue, r=0]
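When a CXSMILES string like this is passed to a web depiction service as a URL query value, characters such as `|`, `$`, `;`, `=` and spaces need to be percent-encoded. Below is a small sketch using only the Java standard library (Java 10 or later). The class and method names are hypothetical, and the service URL and parameter name are deliberately left out because they are not given in full here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryValueEncoder {

    /** Percent-encodes a value for use in a URL query string. */
    static String encodeQueryValue(String value) {
        // URLEncoder targets HTML form encoding and writes spaces as '+',
        // so rewrite them as %20. A literal '+' in the input (for example
        // a charge in SMILES) has already been encoded as %2B at this point.
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        String cxsmiles = "CN1C(N(C=2N=CN(C2C1=O)C)C)=O |$_AV:;1;2;3;4;9;8;7;5;6;;;;$|";
        System.out.println(encodeQueryValue(cxsmiles));
    }
}
```

`URLEncoder` also encodes parentheses, which some tools leave as-is in a query string. Both forms decode to the same value.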

There is no correlation with RDKit's numbering, other than that the RDKit display probably used the same input SMILES string. The same numbers come out because both tools follow the input ordering.

Please let me know if you need anything else.

biotech7 commented 2 years ago

Thanks, John! This issue puzzled me for quite a while. For "IUPAC labelling", generating the labels as atom values from OPSIN's extended SMILES may be a good choice. Out of curiosity, in the depict examples, would it be possible to add text above or below the reaction arrow, like this: [Image: test]

johnmay commented 2 years ago

This is not supported in SMILES natively, but you can do it with a Data Sgroup. You have to be careful to escape some characters:

CCO.[CH3:1][C:2](=[O:3])[OH:4]>[H+]>CC[O:4][C:2](=[O:3])[CH3:1].O |SgD::cdk:ReactionConditions:0~RT| Ethyl esterification [1.7.3]