FelixBaensch / MORTAR

MOlecule fRagmenTAtion fRamework
MIT License

Molecule with "query bonds" not imported without adding implicit hydrogens #36

Closed FelixBaensch closed 4 months ago

FelixBaensch commented 6 months ago
  1. Preferences setting "Add implicit hydrogens at import" is set to false
  2. Import Mol/SD file with query bonds (MDL bond type > 3)

    Benzene
    387159399
    -OEChem-02152404312D

    6 6 0 0 0 0 0 0 0999 V2000
    0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    1 2 4 0 0 0 0
    2 3 4 0 0 0 0
    3 4 4 0 0 0 0
    4 5 4 0 0 0 0
    5 6 4 0 0 0 0
    1 6 4 0 0 0 0
    M END

=> "No content in table"

FelixBaensch commented 4 months ago

It looks like the internal handling of the structures as SMILES is causing problems:

BenzeneQueryBonds
387159399
-OEChem-02152404312D

6 6 0 0 0 0 0 0 0 0999 V2000
-0.2009 1.8964 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.9153 1.4839 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.9153 0.6588 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.2009 0.2463 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5135 0.6588 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5135 1.4839 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 7 0 0 0 0
2 3 7 0 0 0 0
3 4 7 0 0 0 0
4 5 7 0 0 0 0
5 6 7 0 0 0 0
6 1 7 0 0 0 0
M END

=> see log file

Log file
Apr 05, 2024 7:39:30 AM de.unijena.cheminf.mortar.main.MainApp start
INFO: MORTAR 1.1.1.0 session start
Apr 05, 2024 7:39:30 AM de.unijena.cheminf.mortar.main.MainApp start
INFO: Started with Java version 21.0.1.
Apr 05, 2024 7:39:42 AM de.unijena.cheminf.mortar.model.io.Importer preprocessMoleculeSet
INFO: Imported and preprocessed molecule set. 0 exceptions occurred while processing.
Apr 05, 2024 7:39:42 AM de.unijena.cheminf.mortar.model.util.ChemUtil createUniqueSmiles
SEVERE: org.openscience.cdk.exception.CDKException: A bond had undefined order, possible query bond?; molecule name: benzene_left
org.openscience.cdk.exception.CDKException: A bond had undefined order, possible query bond?
    at org.openscience.cdk.smiles.CDKToBeam.toBeamEdgeLabel(CDKToBeam.java:280)
    at org.openscience.cdk.smiles.CDKToBeam.toBeamEdge(CDKToBeam.java:260)
    at org.openscience.cdk.smiles.CDKToBeam.toBeamGraph(CDKToBeam.java:143)
    at org.openscience.cdk.smiles.SmilesGenerator.create(SmilesGenerator.java:445)
    at org.openscience.cdk.smiles.SmilesGenerator.create(SmilesGenerator.java:401)
    at org.openscience.cdk.smiles.SmilesGenerator.create(SmilesGenerator.java:362)
    at de.unijena.cheminf.mortar.model.util.ChemUtil.createUniqueSmiles(ChemUtil.java:97)
    at de.unijena.cheminf.mortar.controller.MainViewController.lambda$importMoleculeFile$22(MainViewController.java:495)
    at com.sun.javafx.application.PlatformImpl.lambda$runLater$10(PlatformImpl.java:456)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:400)
    at com.sun.javafx.application.PlatformImpl.lambda$runLater$11(PlatformImpl.java:455)
    at com.sun.glass.ui.InvokeLaterDispatcher$Future.run(InvokeLaterDispatcher.java:95)
    at com.sun.glass.ui.gtk.GtkApplication._runLoop(Native Method)
    at com.sun.glass.ui.gtk.GtkApplication.lambda$runLoop$10(GtkApplication.java:263)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Apr 05, 2024 7:39:42 AM de.unijena.cheminf.mortar.controller.MainViewController lambda$importMoleculeFile$22
WARNING: Import failed, set of imported molecules is null or empty
Apr 05, 2024 7:39:44 AM de.unijena.cheminf.mortar.controller.MainViewController closeApplication
INFO: MORTAR session end

benzene_querybonds.mol.txt

MORTAR_Log_2024_04_05_07_39.txt

FelixBaensch commented 4 months ago

And btw how important is stereochemistry for us here? Should we switch to isomeric SMILES?

JonasSchaub commented 4 months ago

See the CDK MDLV2000Reader code on bond orders: https://github.com/cdk/cdk/blob/99548a9001ab0be1c5ad437d802f511b91d74210/storage/ctab/src/main/java/org/openscience/cdk/io/MDLV2000Reader.java#L837

For order 4, it assigns a "normal" bond with unset order and aromatic flag. For any order given in the MOL file higher than that, it assigns a query bond which cannot be kekulized and cannot be dealt with by the SmilesGenerator, apparently.

What do you wish to accomplish with this non-standard bond order 7 here? The only solution I see ad hoc is to iterate through all imported molecule bonds and replace query bonds with "something proper". Not very ideal...

Your original issue was regarding bond order 4 in MOL files. Is this correctly handled now?

JonasSchaub commented 4 months ago

> And btw how important is stereochemistry? Should we switch to isomeric SMILES?

https://github.com/FelixBaensch/MORTAR/issues/48

FelixBaensch commented 4 months ago

> See the CDK MDLV2000Reader code on bond orders: https://github.com/cdk/cdk/blob/99548a9001ab0be1c5ad437d802f511b91d74210/storage/ctab/src/main/java/org/openscience/cdk/io/MDLV2000Reader.java#L837
>
> For order 4, it assigns a "normal" bond with unset order and aromatic flag. For any order given in the MOL file higher than that, it assigns a query bond which cannot be kekulized and cannot be dealt with by the SmilesGenerator, apparently.

Thanks for the clarification. That was exactly my assumption. This follows the MDL bond types.

> What do you wish to accomplish with this non-standard bond order 7 here? The only solution I see ad hoc is to iterate through all imported molecule bonds and replace query bonds with "something proper". Not very ideal...

Nothing, we just need to keep that in mind and maybe mention it in the readme or something.

> Your original issue was regarding bond order 4 in MOL files. Is this correctly handled now?

My original issue was regarding bond order > 3 in mol files. I think we have to discuss this next week.

JonasSchaub commented 4 months ago

Conclusion: note in the tutorial or elsewhere that MDL MOL file bond types <= 4 are valid. Bond types higher than that cannot be parsed into SMILES and are therefore illegal for our data model.
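
For illustration only, the rule above could be checked on the raw file text before a molecule is handed to the SMILES generator. The sketch below is not MORTAR or CDK code: the class `MolBondTypeCheck` and its methods are hypothetical names, it uses only the Java standard library, and it assumes a well-formed V2000 molblock with the fixed-width columns of the CTfile format (three header lines, then the counts line; the bond type sits in columns 7-9 of each bond line). In that format, types 5 to 8 are the query-only alternatives (single or double, single or aromatic, double or aromatic, any).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of MORTAR or CDK): reads the bond block of a
 * V2000 molblock and reports bond types above 4, i.e. MDL query bond types.
 */
final class MolBondTypeCheck {

    /** Highest MDL bond type treated as valid in the conclusion above (4 = aromatic). */
    static final int MAX_SUPPORTED_BOND_TYPE = 4;

    private MolBondTypeCheck() { }

    /** Returns the bond type of every bond line, in file order. */
    static List<Integer> bondTypes(String molBlock) {
        String[] lines = molBlock.split("\\r?\\n", -1);
        // Lines 1-3 are the header block, line 4 is the counts line.
        if (lines.length < 4 || !lines[3].contains("V2000")) {
            throw new IllegalArgumentException("Not a V2000 molblock");
        }
        String counts = lines[3];
        // Counts line: atom count in columns 1-3, bond count in columns 4-6.
        int atomCount = Integer.parseInt(counts.substring(0, 3).trim());
        int bondCount = Integer.parseInt(counts.substring(3, 6).trim());
        if (lines.length < 4 + atomCount + bondCount) {
            throw new IllegalArgumentException("Molblock is truncated");
        }
        List<Integer> types = new ArrayList<>();
        for (int i = 0; i < bondCount; i++) {
            String bondLine = lines[4 + atomCount + i];
            // Bond line: first atom (1-3), second atom (4-6), bond type (7-9).
            types.add(Integer.parseInt(bondLine.substring(6, 9).trim()));
        }
        return types;
    }

    /** True if at least one bond has a type above MAX_SUPPORTED_BOND_TYPE. */
    static boolean hasQueryBonds(String molBlock) {
        for (int type : bondTypes(molBlock)) {
            if (type > MAX_SUPPORTED_BOND_TYPE) {
                return true;
            }
        }
        return false;
    }
}
```

Calling `MolBondTypeCheck.hasQueryBonds(...)` on the text of a file like benzene_querybonds.mol would be expected to return true, so an importer could skip or report such a structure instead of ending up with an empty table. Whether such a check belongs in the importer or only in the documentation is a design question this sketch does not answer.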

JonasSchaub commented 4 months ago

Fun fact, we already have a section about this in the tutorial:

https://docs.google.com/document/d/1dbpSZfdmOYaQqdYTDZj0NfKuy9TMp6T4SF6lCBB7bFo/edit#heading=h.crleasfiafc0

@FelixBaensch do you think this is enough?
