cdk / depict

SMILES Depiction Generator
GNU Lesser General Public License v2.1

atom numbering based on the InChI #52

Closed: egonw closed this issue 1 year ago

egonw commented 1 year ago

I would welcome an option to number the atoms according to the numbering in the InChI. Can I request that, please?

nbehrnd commented 1 year ago

@egonw This question could also interest the public InChI mailing list (inchi-discuss@lists.sourceforge.net), because winchi-1.06 (the Windows GUI on top of InChI 1.06, current as of 2020-12-19) can display this:

[Screenshot: 2022-10-25_example_winchi]

By contrast, the InChI Trust's reference executable for Linux (./inchi-1) offers no obvious flag to produce such an illustration in either .svg or .png format.

Source for winchi: the download section of the InChI Trust website, entry INCHI-1-BIN.zip (Software binaries). Perhaps INCHI-1-SRC.zip (InChI Software source code) would make it easier to build a similar GUI for other platforms.

johnmay commented 1 year ago

Ah... you know I don't like InChI, Egon :-)

johnmay commented 1 year ago

Maybe as an optional build; currently it is quite lightweight. InChI makes things much larger, and I don't think InChI numbers are useful.

Adafede commented 1 year ago

Maybe some additional context on how InChI numbering could be useful:

When you have a structure with an undefined stereocenter but are tired and cannot spot which one it is, converting it to InChI gives you its atom number (in InChI numbering), so being able to go back to a depiction with the same numbering could be useful.

Or is there already something allowing this in the CDK?

johnmay commented 1 year ago

How are you inputting the structure? Normally the sketcher will show you this.

Adafede commented 1 year ago

I'm not sure I entirely understand what you mean.

Starting from a SMILES string such as CCC(C)C(=O)OC[C@@]12[C@@H](O)[C@H](C[C@@H](C)[C@]11OC(C)(C)[C@@H]([C@H]1OC(C)=O)[C@H](OC(=O)C1=CN(C)C(=O)C=C1)C2=O)OC(=O)C1=CN(C)C(=O)C=C1, how would you highlight the undefined stereocenter for a half-asleep chemist?

(Not sure I want to pollute Egon's original issue with this question; tell me if it is worth opening another one.)

egonw commented 1 year ago

I'll see if I can first work out some Java/Groovy code that reproduces the above functionality.

johnmay commented 1 year ago

How would a sleepy chemist pair the InChI numbers back to the SMILES string from a depiction?

It would be easy for us to mark constitutional stereocenters in depict with a (?), and that does not need InChI. The hard part is interdependent stereo, but since that is not possible in general, we can support the simple case. Please open another issue, "mark missing stereo", for that.

Adafede commented 1 year ago

Taking my original SMILES,

CCC(C)C(=O)OC[C@@]12[C@@H](O)[C@H](C[C@@H](C)[C@]11OC(C)(C)[C@@H]([C@H]1OC(C)=O)[C@H](OC(=O)C1=CN(C)C(=O)C=C1)C2=O)OC(=O)C1=CN(C)C(=O)C=C1

converting it to InChI leads to

InChI=1S/C36H44N2O13/c1-9-18(2)31(44)47-17-35-28(42)23(49-32(45)21-10-12-24(40)37(7)15-21)14-19(3)36(35)30(48-20(4)39)26(34(5,6)51-36)27(29(35)43)50-33(46)22-11-13-25(41)38(8)16-22/h10-13,15-16,18-19,23,26-28,30,42H,9,14,17H2,1-8H3/t18?,19-,23+,26-,27+,28+,30-,35+,36-/m1/s1

where we can see t18?, so we know the undefined center is atom 18 in InChI numbering. We could then trace it back to the depiction if the atoms were InChI-numbered. Anyway, I opened #54.
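The `?` entries in the `/t` (tetrahedral stereo) layer mark undefined stereocenters, so they can be pulled out of an InChI string mechanically. Below is a minimal sketch, assuming a single-component standard InChI; the class and method names are hypothetical, and only the first `/t` layer is read, so isotopic or fixed-H stereo sublayers are ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class UndefinedStereo {

    // Returns the InChI atom numbers marked '?' (undefined) in the main /t layer.
    // Assumption: single-component standard InChI (no ';'-separated components).
    public static List<Integer> undefinedCenters(String inchi) {
        List<Integer> result = new ArrayList<>();
        for (String layer : inchi.split("/")) {
            if (!layer.startsWith("t")) {
                continue;
            }
            // e.g. "t18?,19-,23+" -> "18?", "19-", "23+"
            for (String entry : layer.substring(1).split(",")) {
                if (entry.endsWith("?")) {
                    result.add(Integer.parseInt(entry.substring(0, entry.length() - 1)));
                }
            }
            break; // stop after the main tetrahedral stereo layer
        }
        return result;
    }

    public static void main(String[] args) {
        String inchi = "InChI=1S/C36H44N2O13/c1-9-18(2)31(44)47-17-35-28(42)23(49-32(45)21-10-12-24(40)37(7)15-21)14-19(3)36(35)30(48-20(4)39)26(34(5,6)51-36)27(29(35)43)50-33(46)22-11-13-25(41)38(8)16-22/h10-13,15-16,18-19,23,26-28,30,42H,9,14,17H2,1-8H3/t18?,19-,23+,26-,27+,28+,30-,35+,36-/m1/s1";
        System.out.println(undefinedCenters(inchi)); // prints [18]
    }
}
```

This only yields InChI numbers; relating them back to the input atoms still needs the canonical-to-input mapping.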

egonw commented 1 year ago

@Adafede, John and I discussed the recognition of missing stereo, and the conclusion was to use the CDK for that.

johnmay commented 1 year ago

OK, so which atom in your SMILES string is 18? I feel this is a case of "the XY problem" :-)
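One way to answer this without a depiction: the AuxInfo string that InChI can generate alongside the identifier has an `/N:` layer whose k-th entry is the input atom number (1-based) of the atom that received canonical number k. For a SMILES input, the input order is typically the order in which the atoms appear in the string. Below is a minimal sketch, assuming a single-component AuxInfo; the class name and the AuxInfo literal are hypothetical illustrations, not real program output.

```java
import java.util.HashMap;
import java.util.Map;

public class AuxInfoNumbers {

    // Maps InChI canonical atom numbers (1-based) to input atom numbers (1-based)
    // using the first "/N:" layer of an AuxInfo string.
    // Assumption: single component, i.e. no ';' separators inside the layer.
    public static Map<Integer, Integer> canonicalToInput(String auxInfo) {
        Map<Integer, Integer> map = new HashMap<>();
        for (String layer : auxInfo.split("/")) {
            if (!layer.startsWith("N:")) {
                continue;
            }
            String[] entries = layer.substring(2).split(",");
            for (int k = 0; k < entries.length; k++) {
                // entry k holds the input number of the atom with canonical number k + 1
                map.put(k + 1, Integer.parseInt(entries[k]));
            }
            break; // only the first /N: layer is used
        }
        return map;
    }

    public static void main(String[] args) {
        // Hypothetical AuxInfo for a 3-atom input whose atom order was reversed.
        String aux = "AuxInfo=1/0/N:3,2,1";
        System.out.println(canonicalToInput(aux).get(1)); // prints 3
    }
}
```

With the real AuxInfo of the SMILES above, `canonicalToInput(aux).get(18)` would give the position of the undefined center in the SMILES atom order.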

egonw commented 1 year ago

John, if you number the atoms in the depiction with the InChI atom numbers, that would be clear, wouldn't it?

Adafede commented 1 year ago

I was about to answer something similar. I do not want to see which atom in my SMILES string is undefined, but which carbon it is in my depiction. With #54 this becomes even more straightforward, and I would indeed not need InChI. Still, this is probably relevant for other applications?

johnmay commented 1 year ago

Fixed


johnmay commented 1 year ago

Sorry, not fixed... the issue was conflated.

johnmay commented 1 year ago

Closing - won't fix.

My reservations are:

Here is the trivial code to do it via the CDK:

// 1) Show the InChI numbers as atom annotations in the depiction
long[] numbers = InChINumbersTools.getNumbers(mol);
for (IAtom atom : mol.atoms()) {
    atom.setProperty(StandardGenerator.ANNOTATION_LABEL,
                     Long.toString(numbers[atom.getIndex()]));
}

// 2) Or write them into the SMILES output as CXSMILES atom values ($_AV:...$)
for (IAtom atom : mol.atoms()) {
    // alternative: atom-atom maps, used together with SmiFlavor.AtomAtomMap
    // atom.setProperty(CDKConstants.ATOM_ATOM_MAPPING, (int) numbers[atom.getIndex()]);
    atom.setProperty(CDKConstants.COMMENT, Long.toString(numbers[atom.getIndex()]));
}
SmilesGenerator smigen = new SmilesGenerator(SmiFlavor.Default | SmiFlavor.CxAtomValue);
System.out.println(smigen.create(mol));
CN1C=NC2=C1C(=O)N(C(=O)N2C)C |$_AV:1;10;4;9;6;5;7;13;12;8;14;11;2;3$|

Perhaps a compromise is to add a method to the CDK library's depiction generator? cdk-inchi would then be an optional dependency.

new DepictionGenerator().withAtomNumbers().depict(mol); // current
new DepictionGenerator().withAtomValues().depict(mol); // current
new DepictionGenerator().withInChINumbers().depict(mol); // addition (requires cdk-inchi)