Meredith-Lab / volcalc

volcalc: Calculate Volatility of Chemical Compounds
https://meredith-lab.github.io/volcalc/
Other
4 stars 1 forks source link

Check for invalid SDFs #85

Closed Aariq closed 4 months ago

Aariq commented 10 months ago

In calc_vol() after mol files or SMILES are converted to SDF, check if any were invalid using ChemmineR::validSDF(), then warn the user and return NA for those inputs.

Aariq commented 9 months ago

Unfortunately, validSDF() does not check very many things and will let a lot through that still generate warnings or errors from, e.g., propOB(). #56 might be a better thing to pursue

Aariq commented 6 months ago

The callr approach implemented with safe_propOB() in #96 is working, but has some major downsides.

  1. It only works when ChemmineR/OpenBabel produce errors. E.g. .mol files that produce errors for propOB() just return zeroes when passed to ChemmineR::groups().
  2. It slows things down a lot. For 92 molecules, propOB() takes 0.7 seconds and safe_propOB() takes about 4 minutes. I don't think there is any chance of speeding this up.

Alternatives

  1. Allow users to choose between regular propOB() and safe_propOB() with an argument to calc_vol() like catch_OB_err = TRUE or something.

  2. Validate the output of propOB() (and possibly other functions) "manually" by looking for known issues and replacing all values with NA if they are found.

    • SMILES with * character.
    • Missing values (NA or "") for SMILES, InChI, or formula, indicating there were problems generating those

Might be worthwhile to spend some time purposefully trying to break .mol files or SMILES to get a better test set

Aariq commented 6 months ago

Decided to introduce a validate argument to calc_vol() (that is passed to get_fx_groups()) that checks for the symptoms of an OpenBabel error such as empty strings for InChI. If those symptoms are found, all values are replaced with NA and a warning is printed, possibly add a logical column valid as well.