Goodman-lab / DP5

Python workflow for DP5 and DP4 analysis of organic molecules
Other
175 stars, 102 forks

SMARTS Parse Error #65

Open: BioGavin opened this issue 2 years ago

BioGavin commented 2 years ago

When running the sdftinkerxyzpy.py script, I get a SMARTS parse error:

SMARTS Parse Error: syntax error while parsing: HN#[C,N]
SMARTS Parse Error: Failed parsing SMARTS 'HN#[C,N]' for input: 'HN#[C,N]'
...
Traceback (most recent call last):
  File "tett.py", line 130, in <module>
    for substructure_match in m.GetSubstructMatches(substructure):
Boost.Python.ArgumentError: Python argument types in
    Mol.GetSubstructMatches(Mol, NoneType)
did not match C++ signature:
    GetSubstructMatches(RDKit::ROMol self, RDKit::MolBundle query, RDKit::SubstructMatchParameters params)
    GetSubstructMatches(RDKit::ROMol self, RDKit::ROMol query, RDKit::SubstructMatchParameters params)
    GetSubstructMatches(RDKit::ROMol self, RDKit::MolBundle query, bool uniquify=True, bool useChirality=False, bool useQueryQueryMatches=False, unsigned int maxMatches=1000)
    GetSubstructMatches(RDKit::ROMol self, RDKit::ROMol query, bool uniquify=True, bool useChirality=False, bool useQueryQueryMatches=False, unsigned int maxMatches=1000)

After some exploration, I found the cause of the problem: an 'H' written outside square brackets is not a valid atom symbol in SMARTS, so RDKit cannot parse the pattern. Checking the code, I found this line:

 m = Chem.MolFromMolFile(sdf_file + ".sdf", removeHs=False)  # sdftinkerxyzpy.py line-360

Setting removeHs=True solves the problem.
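For reference, the Boost.Python ArgumentError in the traceback is a secondary failure: Chem.MolFromSmarts does not raise on a syntax error, it logs the "SMARTS Parse Error" and returns None, and that None is then passed to GetSubstructMatches. A minimal sketch of a guard that fails with a clearer message, assuming RDKit is installed; `substruct_matches` is a hypothetical helper, not a function in DP5 or RDKit:

```python
from rdkit import Chem


def substruct_matches(mol, smarts):
    """Return all matches of `smarts` in `mol`, failing loudly on a bad pattern."""
    # MolFromSmarts does not raise on a syntax error: it logs
    # "SMARTS Parse Error" and returns None instead of a query molecule.
    pattern = Chem.MolFromSmarts(smarts)
    if pattern is None:
        raise ValueError(f"could not parse SMARTS pattern {smarts!r}")
    return mol.GetSubstructMatches(pattern)


# Acetonitrile with explicit hydrogens, similar to loading with removeHs=False.
mol = Chem.AddHs(Chem.MolFromSmiles("CC#N"))

# The bracketed [H] parses; acetonitrile has no N-H, so this prints ().
print(substruct_matches(mol, "[H]N#[C,N]"))
try:
    substruct_matches(mol, "HN#[C,N]")  # bare H outside brackets: parse fails
except ValueError as err:
    print(err)
```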

amyjystad commented 1 year ago

Having the same issue a year later. It looks like there's a typo on line 258 of the sdftinkerxyzpy.py script: changing HN#[C,N] to [H]N#[C,N] has fixed the parsing issue so far.
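If other patterns in the script might contain the same mistake, a quick pre-check can flag them before RDKit sees them. The sketch below is a hypothetical helper (`find_bare_hydrogens` is not part of DP5 or RDKit) and only looks for an 'H' written outside square brackets, which is never valid in SMARTS; it does not validate anything else about the pattern.

```python
from __future__ import annotations


def find_bare_hydrogens(smarts: str) -> list[int]:
    """Return the positions of 'H' characters outside square brackets.

    Outside brackets SMARTS only accepts organic-subset atoms (B, C, N, O, P,
    S, F, Cl, Br, I and their aromatic forms), '*', 'A', 'a', bond symbols,
    ring-closure digits, dots and parentheses, so an 'H' there is always a
    syntax error. Inside brackets 'H' is legal, e.g. '[H]' or '[CH2]'.
    """
    depth = 0  # bracket nesting level; recursive SMARTS such as [$([H]N)] nests
    positions = []
    for i, ch in enumerate(smarts):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "H" and depth == 0:
            positions.append(i)
    return positions


for pattern in ["HN#[C,N]", "[H]N#[C,N]"]:
    print(pattern, find_bare_hydrogens(pattern))
```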