Meredith-Lab / volcalc

volcalc: Calculate Volatility of Chemical Compounds
https://meredith-lab.github.io/volcalc/

Capture OpenBabel errors and add to output of `calc_vol()` #56

Closed Aariq closed 4 months ago

Aariq commented 1 year ago

Warnings and errors from OpenBabel are unfortunately difficult to capture, but it would be nice if the output of `calc_vol()` gave some indication of parsing problems. Alternatively, OpenBabel errors could simply be turned into R errors and OpenBabel warnings into R warnings.

An example:

> path <- tempdir()
> kegg <- get_mol_kegg("C01209", dir = path)
> calc_vol(kegg$mol_path)
==============================
*** Open Babel Warning  in InChI code
  Malonyl-[acyl-carrier protein] :Unknown element(s): *
==============================
*** Open Babel Error  in InChI code
  InChI generation failed
==============================
*** Open Babel Warning  in InChI code
  Malonyl-[acyl-carrier protein] :Unknown element(s): *
==============================
*** Open Babel Error  in InChI code
  InChI generation failed
==============================
*** Open Babel Warning  in InChI code
  Malonyl-[acyl-carrier protein] :Unknown element(s): *
==============================
*** Open Babel Error  in InChI code
  InChI generation failed
# A tibble: 1 × 5
  mol_path                         formula name                           volatility category
  <chr>                            <chr>   <chr>                               <dbl> <chr>   
1 /var/...C01209.mol               C3H3O3S Malonyl-[acyl-carrier protein]       6.59 high    
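
To illustrate the "OpenBabel warnings into R warnings" idea, here is a minimal sketch that assumes the OpenBabel text has already been captured into a character vector by some means (the capturing itself is the hard part and is not solved here). `parse_ob_log()` and `signal_ob_log()` are hypothetical helpers, not part of volcalc or ChemmineR, and the assumption that each message sits on the line right after its `*** Open Babel ...` header is based only on the output shown above.

```r
# Hypothetical helper: turn OpenBabel log text into a data frame.
# `lines` is assumed to be a character vector of already-captured output.
parse_ob_log <- function(lines) {
  # Header lines look like "*** Open Babel Warning  in InChI code"
  header <- "^\\*\\*\\* Open Babel (Warning|Error)\\s+in (.*)$"
  idx <- grep(header, lines)
  # Assumption: the message text is on the line following each header
  out <- data.frame(
    level   = tolower(sub(header, "\\1", lines[idx])),
    context = trimws(sub(header, "\\2", lines[idx])),
    message = trimws(lines[idx + 1]),
    stringsAsFactors = FALSE
  )
  unique(out)  # drop messages that were printed more than once
}

# Hypothetical helper: re-signal parsed messages as R conditions.
signal_ob_log <- function(log) {
  msgs <- paste0("OpenBabel (", log$context, "): ", log$message)
  for (m in msgs[log$level == "warning"]) warning(m, call. = FALSE)
  errs <- msgs[log$level == "error"]
  if (length(errs) > 0) stop(paste(errs, collapse = "; "), call. = FALSE)
  invisible(log)
}

# Lines copied from the output above
ob_lines <- c(
  "==============================",
  "*** Open Babel Warning  in InChI code",
  "  Malonyl-[acyl-carrier protein] :Unknown element(s): *",
  "==============================",
  "*** Open Babel Error  in InChI code",
  "  InChI generation failed"
)
ob_log <- parse_ob_log(ob_lines)
```

Calling `signal_ob_log(ob_log)` would then raise one R warning and one R error. The parsed data frame could alternatively be attached to the `calc_vol()` output rather than signalled.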

Aariq commented 1 year ago

These messages seem to come primarily from `ChemmineR::propOB()`.

Aariq commented 1 year ago

Conversation on how to capture OpenBabel errors: https://github.com/girke-lab/ChemmineR/issues/14

Aariq commented 9 months ago

A good example of where this gives misleading results is phosphatidylcholine. The structure on KEGG has "R" groups for the fatty acid chain, which OpenBabel can't parse. As a result, it gets read in as having only 10 carbons, and the volatility is greatly overestimated, putting it in the "high" category. This really should result in NAs across the board for functional groups and calculations.
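
One possible way to get NAs in such cases, sketched under assumptions: check the .mol file for placeholder atoms before trusting the result. `has_placeholder_atoms()` is a hypothetical helper, not an existing volcalc function. It assumes a V2000 molfile (atom count in the first three columns of line 4, atom symbol in columns 32 to 34 of each atom line) and that placeholders appear as `R`, `R#`, or `*`. The molecule and result values below are toy data for illustration only.

```r
# Hypothetical helper: TRUE if a V2000 molfile contains placeholder atoms.
# The default set of placeholder symbols is an assumption.
has_placeholder_atoms <- function(mol_lines,
                                  placeholders = c("R", "R#", "*")) {
  counts <- mol_lines[4]                         # V2000 counts line
  n_atoms <- as.integer(substr(counts, 1, 3))    # first 3 columns: atom count
  atom_block <- mol_lines[4 + seq_len(n_atoms)]  # atom lines follow line 4
  symbols <- trimws(substr(atom_block, 32, 34))  # atom symbol columns
  any(symbols %in% placeholders)
}

# Build a toy atom line with the fixed-width V2000 layout
atom_line <- function(symbol) {
  sprintf("%10.4f%10.4f%10.4f %-3s 0  0  0  0  0  0", 0, 0, 0, symbol)
}

# Toy molfile with one "R" placeholder atom
mol_lines <- c(
  "toy molecule", "", "",
  sprintf("%3d%3d  0  0  0  0  0  0  0  0999 V2000", 3L, 2L),
  atom_line("C"), atom_line("O"), atom_line("R"),
  "  1  2  1  0", "  2  3  1  0",
  "M  END"
)

# Toy result row standing in for the output of a volatility calculation
result <- data.frame(name = "toy molecule", volatility = 6.59,
                     category = "high", stringsAsFactors = FALSE)

# Blank out the calculated values when the structure is incomplete
if (has_placeholder_atoms(mol_lines)) {
  result$volatility <- NA_real_
  result$category <- NA_character_
}
```

Checking the input file directly sidesteps the problem of capturing OpenBabel messages, at the cost of only catching the placeholder symbols that are listed explicitly.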