Closed jgiaccai closed 1 year ago
Hi Jennifer,
Sorry for the delay. I haven't used the ML estimator yet, but I'll try to help. Could you try replacing lines 1753-1760 of RMG-Py/rmgpy/data/thermo.py with the following? I believe it should give some of the behavior you expected: it prints the label of the species that caused the error instead of crashing the script.
try:
    if molecule.is_radical():
        thermo = [self.estimate_radical_thermo_via_hbi(mol, ml_estimator.get_thermo_data) for mol in species.molecule]
        H298 = np.array([tdata.H298.value_si for tdata in thermo])
        indices = H298.argsort()
        species.molecule = [species.molecule[ind] for ind in indices]
        thermo0 = thermo[indices[0]]
    else:
        thermo0 = ml_estimator.get_thermo_data_for_species(species)
except Exception as e:
    # Catches any error (e.g. an RDKit KekulizeException): print the error and the species label, then return None.
    print(f'\n\nError: Could not obtain thermo for {species.label} due to the following error: \n{e}\n\n')
    return None
Context
I have been successfully using the Thermo module to estimate thermo properties of large PAH molecules using group additivity. We've been getting some unexpected results, and I read in the online documentation that I may be better off using machine learning instead. When I submit a molecule to the Thermo module using machine learning, I get the error below. I think this is likely because the molecule can't be kekulized.
Question
I know how the group additivity works based on the articles published by Yu et al. Is there any documentation (articles or other) on the machine learning methodology?
Can anyone confirm that the inability to kekulize the molecule is what is leading to the machine learning Thermo module not working?
Bug Description
When running the ML estimator in the Thermo module, I get the following error and no output is created.
Traceback (most recent call last):
  File "../scripts/thermoEstimator.py", line 103, in <module>
    run_thermo_estimator(input_file, args.library)
  File "../scripts/thermoEstimator.py", line 70, in run_thermo_estimator
    submit(species)
  File "/Users/jennifergiaccai/Documents/gradschool/PAHholdingcell/RMG/RMG-Py/rmgpy/thermo/thermoengine.py", line 174, in submit
    spc.thermo = evaluator(spc, solvent_name=solvent_name)
  File "/Users/jennifergiaccai/Documents/gradschool/PAHholdingcell/RMG/RMG-Py/rmgpy/thermo/thermoengine.py", line 159, in evaluator
    thermo = generate_thermo_data(spc, solvent_name=solvent_name)
  File "/Users/jennifergiaccai/Documents/gradschool/PAHholdingcell/RMG/RMG-Py/rmgpy/thermo/thermoengine.py", line 124, in generate_thermo_data
    thermo0 = thermodb.get_thermo_data(spc)
  File "/Users/jennifergiaccai/Documents/gradschool/PAHholdingcell/RMG/RMG-Py/rmgpy/data/thermo.py", line 1319, in get_thermo_data
    ml_settings)
  File "/Users/jennifergiaccai/Documents/gradschool/PAHholdingcell/RMG/RMG-Py/rmgpy/data/thermo.py", line 1760, in get_thermo_data_from_ml
    thermo0 = ml_estimator.get_thermo_data_for_species(species)
  File "/Users/jennifergiaccai/Documents/gradschool/PAHholdingcell/RMG/RMG-Py/rmgpy/ml/estimator.py", line 111, in get_thermo_data_for_species
    return self.get_thermo_data(species.molecule[0])
  File "/Users/jennifergiaccai/Documents/gradschool/PAHholdingcell/RMG/RMG-Py/rmgpy/ml/estimator.py", line 79, in get_thermo_data
    hf298 = self.hf298_estimator(molecule.smiles)[0][0]
  File "/Users/jennifergiaccai/Documents/gradschool/PAHholdingcell/RMG/RMG-Py/rmgpy/ml/estimator.py", line 148, in estimator
    [chemprop.data.MoleculeDatapoint(line=[smi], args=args)]
  File "/anaconda3/envs/rmg_env/lib/python3.7/site-packages/chemprop-0.0.1-py3.7.egg/chemprop/data/data.py", line 48, in __init__
  File "/anaconda3/envs/rmg_env/lib/python3.7/site-packages/chemprop-0.0.1-py3.7.egg/chemprop/mol_utils.py", line 19, in str_to_mol
rdkit.Chem.rdchem.KekulizeException: Can't kekulize mol. Unkekulized atoms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
How To Reproduce
I have been able to use ML with some of the sets of molecules I submitted, so my guess is that the failure is related to not being able to kekulize the molecule. I'm working with a set of potential large PAH molecules that were generated with another program. That program doesn't specify single and double bond locations, which may lead to PAHs that are chemically unstable.
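One way to check the kekulization guess before submitting anything to the ML estimator is to sanitize each SMILES string directly with RDKit. The following is only a minimal sketch: it assumes RDKit is importable in the same environment and is recent enough to expose `rdchem.KekulizeException` (the traceback suggests both). The helper name `kekulization_error` and the example SMILES are illustrative and not part of RMG-Py.

```python
from rdkit import Chem
from rdkit.Chem import rdchem


def kekulization_error(smiles):
    """Return None if RDKit can kekulize `smiles`, otherwise a short error message."""
    # Parse without sanitizing so that a kekulization failure surfaces as an
    # exception in SanitizeMol instead of a silent None from the parser.
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        return "could not parse SMILES"
    try:
        Chem.SanitizeMol(mol)
    except rdchem.KekulizeException as exc:
        return str(exc)
    return None


# Benzene kekulizes; a five-membered all-aromatic carbon ring does not.
for smi in ["c1ccccc1", "c1cccc1"]:
    print(smi, "->", kekulization_error(smi))
```

Running a check like this over the generated PAH set would show which structures fail and which atoms RDKit reports as unkekulized.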
Expected Behavior
If ML cannot estimate thermo properties for a molecule, I would have expected it to skip that molecule and still produce a library with the other molecules that are successful. Alternatively, I would have expected an error message stating that ML won't be successful because the molecule can't be kekulized. It would also be helpful to know which molecule it isn't able to kekulize.
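The skip-and-report behavior described here can be sketched in plain Python. This is not RMG-Py code: `estimate_all` and `toy_estimator` are hypothetical names, and the toy estimator only stands in for the real ML call so that the example runs on its own.

```python
def estimate_all(smiles_by_label, estimator):
    """Apply `estimator` to every SMILES, recording failures instead of aborting."""
    results = {}
    failures = {}
    for label, smiles in smiles_by_label.items():
        try:
            results[label] = estimator(smiles)
        except Exception as exc:
            # Keep the label and message so the failing molecule can be identified.
            failures[label] = f"{type(exc).__name__}: {exc}"
    return results, failures


def toy_estimator(smiles):
    """Hypothetical stand-in for the ML estimator; fails on one hard-coded input."""
    if smiles == "c1cccc1":
        raise ValueError("Can't kekulize mol.")
    return {"smiles": smiles}  # placeholder result, not real thermo data


results, failures = estimate_all(
    {"benzene": "c1ccccc1", "bad_ring": "c1cccc1"}, toy_estimator
)
for label, reason in failures.items():
    print(f"Skipped {label}: {reason}")
```

Collecting failures in a separate dictionary keeps the successful results usable while still naming each molecule that was skipped and why.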