cdk / depict

SMILES Depiction Generator
GNU Lesser General Public License v2.1
I encountered the following 4 questions in the use of CDK to generate molecular pictures #23

Closed miracle1111111 closed 4 years ago

miracle1111111 commented 4 years ago

Hello author, I ran into the following 4 questions while using CDK to generate molecular pictures. I hope you can answer them, thank you.

QQ screenshot 20200716201642

1. How do I generate a wavy line, as shown in the right half of the figure above, when 3 C atoms are connected to the ring structure?

QQ screenshot 20200716201858

2. Regarding abbreviated atom labels, I could not find anything in StandardAtomGenerator.generatePseudoSymbol() that produces labels such as CH, CH2, CH3, etc.

QQ screenshot 20200716201923

3. For SMILES that contain *, CDK depicts the * literally. In fact, it sometimes stands for A, W, X, H, etc. How do I set things up to generate a molecular diagram like the right half of the picture above?

QQ screenshot 20200716204305

4. Finally, I call abrv.setEnabled("Me", false); // Maybe we don't want 'Me' in the depiction
but the generated picture still contains "Me". Why?

johnmay commented 4 years ago

It would have been helpful if you had split these into separate issues/support requests.

1) In code you set it as an attachment point number on the pseudo atom (IPseudoAtom.setAttachPointNum()); in CXSMILES: *c1ccccc1 |$_AP1$|

2) The API entry point is setting carbon symbol visibility (DepictionGenerator.withCarbonSymbols()); however, that example is mixed with the benzenes, so you would need something custom. It is explained better here: https://github.com/cdk/cdk/wiki/Standard-Generator#symbol-visibility

3) If you give the correct input, the depiction will be correct. CXSMILES: *c1ccccc1 |$A$|

C*.C*.C1=CC=CC=C1C=2C(C=CN3C2C=*C(=*3)C**)=O.C* |$;R2;;R3;;;;;;;;;;;;;;W;;A;;X;R4;;;R1$,m:0:5.4.9.8.7.6,2:7.6.5.4.9.8,24:4.9.8.7.6.5,Sg:n:20:m:ht| US 2007/0129372 (I)

4) Could you please attach your code as text rather than a picture, and I will see what is going on.

johnmay commented 4 years ago

For 4), you might need abrv.setContractOnHetero(false);

miracle1111111 commented 4 years ago

Thank you very much for your reply, I have the following 3 questions:

Question 1. You mentioned "in CXSMILES: *c1ccccc1 |$_AP1$|, PseudoAtom.setAttachmentPoint number". Can you give me an example, such as for the following two structures?

CC(C)C1=C(C(=C(C(=C1[R6])[R7])[R8])[R9])[R10]

image

C1CN(CC(=O)N(C1)[R11])[R]

image

Question 2. Following the page you said explains it better (https://github.com/cdk/cdk/wiki/Standard-Generator#symbol-visibility), CH3 can be generated; however, I do not know how to generate CH and CH2.

image

Question 3. For example, in the figure below, given the SMILES B[], ChemDraw generates A and D from *. I don't know why; generally * refers to a specific atom.

QQ screenshot 20200717105701

johnmay commented 4 years ago

Question 1

In your first case it was encoded as iso-propyl; this is incorrect but a common misconception. The attachment-point indication isn't formed from wavy bonds; it is a separate notation used explicitly for attachments.

Here's a screen shot from some sketch processing I did. https://www.nextmovesoftware.com/products/Praline_Sheffield2016.pdf

image

To input these I used CXSMILES: https://docs.chemaxon.com/display/docs/ChemAxon_Extended_SMILES_and_SMARTS_-_CXSMILES_and_CXSMARTS.html

*C1=C(C(=C(C(=C1[R6])[R7])[R8])[R9])[R10] |$_AP1$|

Actually the more correct way to encode this is:

*C1=C(C(=C(C(=C1*)*)*)*)* |$_AP1;;;;;;;R6;R7;R8;R9;R10$|

Support for [R1] isn't official in SMILES.
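The rewrite from bracketed [Rn] atoms to * plus a CXSMILES atom-label block can be sketched in plain Java. This is only an illustration, not part of CDK: the class name RGroupToCxSmiles and the method convert are hypothetical, and it assumes a simple input with no existing CXSMILES suffix. It counts atoms in input order so that each label lands in the matching semicolon-separated field. It does not add _AP1; an attachment point still has to be set separately.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a CDK class): turns [Rn] bracket atoms into '*'
// and appends a CXSMILES atom-label block with one field per atom.
final class RGroupToCxSmiles {

    static String convert(String smiles) {
        StringBuilder out = new StringBuilder();
        List<String> labels = new ArrayList<>(); // one entry per atom, in input order
        boolean anyLabel = false;
        int i = 0;
        while (i < smiles.length()) {
            char c = smiles.charAt(i);
            if (c == '[') {
                // Bracket atom: either an [Rn] label or an ordinary atom like [2H]
                int close = smiles.indexOf(']', i);
                if (close < 0) {
                    throw new IllegalArgumentException("Unclosed bracket atom");
                }
                String inner = smiles.substring(i + 1, close);
                if (inner.matches("R\\d*")) {
                    out.append('*');
                    labels.add(inner);
                    anyLabel = true;
                } else {
                    out.append(smiles, i, close + 1);
                    labels.add("");
                }
                i = close + 1;
            } else if (smiles.startsWith("Cl", i) || smiles.startsWith("Br", i)) {
                // Two-letter organic-subset atoms count as a single atom
                out.append(smiles, i, i + 2);
                labels.add("");
                i += 2;
            } else if ("BCNOPSFI*bcnops".indexOf(c) >= 0) {
                // One-letter organic-subset atoms and the '*' wildcard
                out.append(c);
                labels.add("");
                i++;
            } else {
                // Bonds, branches, ring closures and dots are copied unchanged
                out.append(c);
                i++;
            }
        }
        if (!anyLabel) {
            return out.toString();
        }
        return out + " |$" + String.join(";", labels) + "$|";
    }
}
```

For example, convert("*C1=C(C(=C(C(=C1[R6])[R7])[R8])[R9])[R10]") yields the same atoms and R labels as the CXSMILES above, with the first field left empty instead of _AP1.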

Here's your other one:

C1CN(CC(=O)N(C1)[R11])* |$;;;;;;;;;_AP1$|

Here's how you can do this programmatically:

SmilesParser   smipar = new SmilesParser(SilentChemObjectBuilder.getInstance());
IAtomContainer mol    = smipar.parseSmiles("*CCO");
((IPseudoAtom)mol.getAtom(0)).setAttachPointNum(1); // set _AP1
new DepictionGenerator().depict(mol).writeTo("/tmp/tmp.svg");

Question 2

CH3 can be generated, however, CH, CH2 do not know how to generate.

You can give it a custom function:

    SmilesParser   smipar = new SmilesParser(SilentChemObjectBuilder.getInstance());
    IAtomContainer mol    = smipar.parseSmiles("*CCO");
    DepictionGenerator depictgen = new DepictionGenerator();
    depictgen = depictgen.withParam(StandardGenerator.Visibility.class,
                                    new SymbolVisibility() {
                                      @Override
                                      public boolean visible(IAtom atom, List<IBond> neighbors, RendererModel model) {
                                        return atom.getAtomicNumber() != 6 || atom.getIndex() == 1;
                                      }
                                    });
    depictgen.depict(mol).writeTo("/tmp/tmp.svg");

image

    depictgen = depictgen.withParam(StandardGenerator.Visibility.class,
                                    new SymbolVisibility() {
                                      @Override
                                      public boolean visible(IAtom atom, List<IBond> neighbors, RendererModel model) {
                                        return true; // everything
                                      }
                                    });

image

Question 3

In CDK, * is displayed as *, not A. This is a display preference; if you want, you could change them all like this:

    for (IAtom atom : mol.atoms()) {
      // all unlabelled * => A
      if (atom instanceof IPseudoAtom &&
          ((IPseudoAtom) atom).getLabel().equals("*") &&
          ((IPseudoAtom) atom).getAttachPointNum() == 0) {
        ((IPseudoAtom) atom).setLabel("A");
      }
    }

image

Adding an option to display D for [2H] would certainly be useful, though.