FelixBaensch closed this issue 4 months ago.
It looks like the internal handling of the structures as SMILES is causing problems:
=> see log file
And btw, how important is stereochemistry for us here? Should we switch to isomeric SMILES?
See the CDK MDLV2000Reader code on bond orders: https://github.com/cdk/cdk/blob/99548a9001ab0be1c5ad437d802f511b91d74210/storage/ctab/src/main/java/org/openscience/cdk/io/MDLV2000Reader.java#L837
For bond type 4, it creates a "normal" bond with an unset order and the aromatic flag set. For any higher bond type in the MOL file, it creates a query bond, which apparently can neither be kekulized nor handled by the SmilesGenerator.
What do you wish to accomplish with this non-standard bond order 7 here? The only solution I see ad hoc is to iterate through all imported molecule bonds and replace query bonds with "something proper". Not very ideal...
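To make the mapping described above concrete, here is a small Java sketch. It only encodes the behaviour as summarized in this thread (types 1 to 3 as explicit orders, 4 as an aromatic bond with unset order, anything higher as a query bond); it does not call CDK, and the class, enum and method names are hypothetical.

```java
public class MdlBondTypes {

    /** How a V2000 bond type code ends up after import, as summarized in this thread. */
    enum Handling {
        EXPLICIT_ORDER, // 1, 2, 3: single, double, triple
        AROMATIC,       // 4: "normal" bond, order unset, aromatic flag set
        QUERY           // > 4: query bond, cannot be kekulized or written as SMILES
    }

    /** Hypothetical helper: maps an MDL bond type code to the handling described above. */
    static Handling classify(int mdlBondType) {
        if (mdlBondType >= 1 && mdlBondType <= 3) {
            return Handling.EXPLICIT_ORDER;
        }
        if (mdlBondType == 4) {
            return Handling.AROMATIC;
        }
        if (mdlBondType > 4) {
            return Handling.QUERY;
        }
        throw new IllegalArgumentException("Not a valid MDL bond type: " + mdlBondType);
    }

    public static void main(String[] args) {
        for (int type : new int[] {1, 2, 3, 4, 7}) {
            System.out.println(type + " -> " + classify(type));
        }
    }
}
```

Running `main` prints `7 -> QUERY` for the non-standard bond type discussed here, i.e. the case that breaks SMILES generation.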
Your original issue was regarding bond order 4 in MOL files. Is this correctly handled now?
> And btw, how important is stereochemistry? Should we switch to isomeric SMILES?
> See the CDK MDLV2000Reader code on bond orders: https://github.com/cdk/cdk/blob/99548a9001ab0be1c5ad437d802f511b91d74210/storage/ctab/src/main/java/org/openscience/cdk/io/MDLV2000Reader.java#L837
> For bond type 4, it creates a "normal" bond with an unset order and the aromatic flag set. For any higher bond type in the MOL file, it creates a query bond, which apparently can neither be kekulized nor handled by the SmilesGenerator.
Thanks for the clarification. That was exactly my assumption. This follows the MDL bond types.
> What do you wish to accomplish with this non-standard bond order 7 here? The only solution I see ad hoc is to iterate through all imported molecule bonds and replace query bonds with "something proper". Not very ideal...
Nothing, we just need to keep that in mind and maybe mention it in the readme or something.
> Your original issue was regarding bond order 4 in MOL files. Is this correctly handled now?
My original issue was regarding bond orders > 3 in MOL files. I think we have to discuss this next week.
Conclusion: state in the tutorial (or elsewhere) that MDL MOL file bond types <= 4 are valid. Higher bond types cannot be converted into SMILES and are therefore illegal in our data model.
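A pre-import check along these lines could flag offending bonds before a file ever reaches the SMILES step. This is a minimal sketch, not part of CDK or our code base: `bondsAbove` is a hypothetical helper that assumes a single, well-formed V2000 MOL block (three header lines, then the counts line) and reads the bond type from columns 7 to 9 of each bond line. For reference, V2000 bond types 5 to 8 are query types (single or double, single or aromatic, double or aromatic, any), which is why type 7 ends up as a query bond.

```java
import java.util.ArrayList;
import java.util.List;

public class MolBondTypeCheck {

    /**
     * Hypothetical helper (not part of CDK): returns the 1-based indices of all bonds
     * whose MDL bond type is greater than maxType. Assumes a single, well-formed
     * V2000 MOL block: three header lines, the counts line, the atom block, then the bond block.
     */
    static List<Integer> bondsAbove(String molBlock, int maxType) {
        String[] lines = molBlock.split("\\r?\\n");
        String counts = lines[3]; // line 4: counts line
        int atomCount = Integer.parseInt(counts.substring(0, 3).trim());
        int bondCount = Integer.parseInt(counts.substring(3, 6).trim());

        List<Integer> offending = new ArrayList<>();
        int firstBondLine = 4 + atomCount; // skip header, counts line and atom block
        for (int i = 0; i < bondCount; i++) {
            String bondLine = lines[firstBondLine + i];
            // Bond line layout "111222ttt...": columns 7-9 hold the bond type.
            int type = Integer.parseInt(bondLine.substring(6, 9).trim());
            if (type > maxType) {
                offending.add(i + 1);
            }
        }
        return offending;
    }

    public static void main(String[] args) {
        // Made-up two-atom example with a type 7 ("double or aromatic") query bond.
        String mol = String.join("\n",
                "example",
                "  hypothetical",
                "",
                "  2  1  0  0  0  0  0  0  0  0999 V2000",
                "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
                "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
                "  1  2  7  0  0  0  0",
                "M  END");
        System.out.println(bondsAbove(mol, 4)); // prints [1]
    }
}
```

Following the conclusion above, `maxType` would be 4, so aromatic type 4 bonds pass and anything higher is reported. SD files with several records would need to be split at `$$$$` first, which this sketch does not do.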
Fun fact, we already have a section about this in the tutorial:
@FelixBaensch do you think this is enough?
Import Mol/SD file with query bonds (MDL bond type > 3)
Benzene
387159399
-OEChem-02152404312D
  6  6  0     0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  4  0  0  0  0
  2  3  4  0  0  0  0
  3  4  4  0  0  0  0
  4  5  4  0  0  0  0
  5  6  4  0  0  0  0
  1  6  4  0  0  0  0
M  END
=> "No content in table"