gcorso / torsional-diffusion

Implementation of Torsional Diffusion for Molecular Conformer Generation (NeurIPS 2022)
https://arxiv.org/abs/2206.01729
MIT License

training data setup #5

Open sacombs opened 1 year ago

sacombs commented 1 year ago

I would like to provide my own dataset for retraining torsional-diffusion. For some fields in the pickle file, I do not know what value to put in. For example, the conformers dictionary has the following:

{'geom_id': 123368967, 'set': 1, 'degeneracy': 3, 'totalenergy': -23.59133734, 'relativeenergy': 0.0, 'boltzmannweight': 0.8585, 'conformerweights': [0.28617, 0.28617, 0.28616], 'rd_mol': <rdkit.Chem.rdchem.Mol at 0x7f7b42014bd0>}

What should I put for boltzmannweight and degeneracy? Is there a setup script to take molfiles and convert them into the dataset for training?

MatthewMasters commented 1 year ago

Degeneracy is not used in the code, so it's safe to exclude. The Boltzmann weight can be calculated if you know the energy and temperature, since w = exp(-E/(kB T)), where E = energy, T = temperature, and kB = the Boltzmann constant.
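
For anyone wanting to try this, here is a minimal sketch of the formula above in plain Python. It rests on a few assumptions that are not confirmed in this thread: that the energies are relative energies in kcal/mol, that the temperature is 298.15 K, and that the weights should be normalized to sum to 1 over a molecule's conformers. The example entry above has boltzmannweight 0.8585 at relativeenergy 0.0, which hints that the stored values are normalized rather than raw exp(-E/(kB T)) factors. The function name `boltzmann_weights` is hypothetical and not part of the repository.

```python
import math

# Boltzmann constant in kcal/(mol*K). ASSUMPTION: energies are in kcal/mol.
KB_KCAL_PER_MOL_K = 0.0019872041


def boltzmann_weights(energies, temperature=298.15, kb=KB_KCAL_PER_MOL_K):
    """Return Boltzmann weights for a list of conformer energies.

    ASSUMPTION: weights are normalized so they sum to 1 over the conformers
    of one molecule. Hypothetical helper, not taken from the repository.
    """
    # Shift by the minimum energy so the lowest conformer has E = 0.
    # This does not change the normalized weights and avoids overflow.
    e_min = min(energies)
    factors = [math.exp(-(e - e_min) / (kb * temperature)) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


# Example with three made-up relative energies (kcal/mol).
weights = boltzmann_weights([0.0, 0.5, 1.2])
```

Each value in `weights` could then be stored under the boltzmannweight key of the matching conformer entry. If your energies are in other units (for example Hartree), convert them or pass a matching `kb`.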

gcorso commented 1 year ago

Sorry for the delay, and thank you very much @MatthewMasters for the answer! Everything Matthew said is correct. Moreover, if you don't use Boltzmann-weighted sampling (going without it is how the ML community trains and evaluates these methods), you only need to have the rd_mol!