doyle-lab-ucla / auto-qchem

Auto-QChem is an automated workflow for the generation and storage of DFT calculations for organic molecules.
https://doyle-lab-ucla.github.io/auto-qchem/
GNU General Public License v3.0

DB Descriptors: duplicate molecules result in sum of molecule descriptors, not Boltzmann average #8

Status: Open (opened by samellis 2 years ago)

samellis commented 2 years ago

When duplicate molecules exist, downloading the Boltzmann-averaged descriptors returns a single molecule whose descriptors are the sum of both molecules' descriptors.

For example, a search for triethylamine (SMILES: CCN(CC)CC) returns two molecules: https://autoqchem.org/?tag=ALL&solvent=ALL&functional=ALL&basis_set=ALL&substructure=&smiles=CCN%28CC%29CC

Downloading the global descriptors of these as a Boltzmann average gives a single molecule whose descriptors are the sum of both molecules' values rather than an average. For example, number_of_atoms is 44 instead of 22, and E is -584.151, the sum of -291.95 and -292.2.
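For reference, the Boltzmann average of a descriptor x over the conformers i of a molecule M is usually written as below. This assumes the weights are computed from conformer energies E_i at temperature T; the thread does not say which energy auto-qchem actually uses. With two molecules A and B sharing one SMILES, the expected download is two rows, while the reported output matches their sum:

```latex
w_i = \frac{e^{-E_i / k_B T}}{\sum_{j \in M} e^{-E_j / k_B T}}, \qquad
\langle x \rangle_M = \sum_{i \in M} w_i \, x_i

\text{expected: } \langle x \rangle_A,\ \langle x \rangle_B
\qquad
\text{observed: } \langle x \rangle_A + \langle x \rangle_B

\text{e.g. } 22 + 22 = 44 \ (\texttt{number\_of\_atoms}), \qquad
-291.95 + (-292.2) \approx -584.15 \ (E)
```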

Please feel free to correct me if my usage is wrong or if this is expected behavior. Fantastic work, by the way!

zuranski commented 2 years ago

Confirmed. It's an indexing issue when SMILES overlap. I will investigate a fix.
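A minimal sketch of how such an indexing collision could turn an average into a sum. It assumes (hypothetically; the thread does not show auto-qchem's code) that Boltzmann weights are normalized per molecule ID and that the weighted contributions are then summed per index key. Keying on SMILES merges the two triethylamine entries; keying on a unique molecule ID keeps them apart. The names `molecule_id`, `weighted_contributions`, `aggregate`, and the conformer energies below are illustrative only.

```python
import math
from collections import defaultdict

# Boltzmann constant in Hartree/K (assumption: energies are in Hartree)
K_B_HARTREE = 3.166811563e-6


def boltzmann_weights(energies, temperature=298.15):
    """Normalized Boltzmann weights for one molecule's conformer energies."""
    e_min = min(energies)
    # Shift by the lowest energy so the exponentials stay well-conditioned
    factors = [math.exp(-(e - e_min) / (K_B_HARTREE * temperature)) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def weighted_contributions(conformers, temperature=298.15):
    """Weight each conformer's descriptor within its own molecule (weights sum to 1 per molecule_id)."""
    by_molecule = defaultdict(list)
    for c in conformers:
        by_molecule[c["molecule_id"]].append(c)
    rows = []
    for mol_id, confs in by_molecule.items():
        weights = boltzmann_weights([c["E"] for c in confs], temperature)
        for w, c in zip(weights, confs):
            rows.append((c["smiles"], mol_id, w * c["value"]))
    return rows


def aggregate(rows, by_smiles):
    """Sum weighted contributions per key: SMILES (collides) or molecule_id (unique)."""
    totals = defaultdict(float)
    for smiles, mol_id, contribution in rows:
        totals[smiles if by_smiles else mol_id] += contribution
    return dict(totals)


# Hypothetical data: two database entries for triethylamine, two conformers each;
# "value" is the number_of_atoms descriptor (22 for every conformer)
conformers = [
    {"molecule_id": "mol-1", "smiles": "CCN(CC)CC", "E": -291.950, "value": 22},
    {"molecule_id": "mol-1", "smiles": "CCN(CC)CC", "E": -291.948, "value": 22},
    {"molecule_id": "mol-2", "smiles": "CCN(CC)CC", "E": -292.200, "value": 22},
    {"molecule_id": "mol-2", "smiles": "CCN(CC)CC", "E": -292.195, "value": 22},
]

rows = weighted_contributions(conformers)
buggy = aggregate(rows, by_smiles=True)
fixed = aggregate(rows, by_smiles=False)
print({k: round(v, 6) for k, v in buggy.items()})  # {'CCN(CC)CC': 44.0}
print({k: round(v, 6) for k, v in fixed.items()})  # {'mol-1': 22.0, 'mol-2': 22.0}
```

Keying on a unique ID is one possible fix under these assumptions; the maintainers may resolve the collision differently.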