aspuru-guzik-group / chemical_vae

Code for 10.1021/acscentsci.7b00572, now running on Keras 2.0 and TensorFlow
Apache License 2.0

How to train your VAE #7

Open xuzhang5788 opened 6 years ago

xuzhang5788 commented 6 years ago

If I have a new dataset, how can I use your code to train on it? It would be great if you could provide a procedure. Do you have any documentation for this code? Many thanks.

jnwei-zz commented 6 years ago

Hi,

I'll write up a more thorough procedure and fix some documentation later, but to answer your question quickly:

  1. Prepare your data as a CSV file. One column should contain the SMILES strings, and the other should contain the properties you want to predict. This file is read later by mol_utils.load_smiles_and_data_df.
  2. Create a JSON file containing the characters found in the SMILES strings of this file. mol_utils.make_charset can help you do that.
  3. Copy exp.json from models/zinc/exp.json, and change the data_file and char_file fields to match your experiment.
  4. Run chem.train_vae from the directory containing the exp.json.
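
Steps 1 to 3 above could be sketched roughly as follows, using only the Python standard library. This is an illustration under assumptions, not the repository's own code: it does not call mol_utils, and the column name `smiles`, the padding character, the placeholder property values, the file names, and the helper names (`read_smiles_column`, `build_charset`, `customize_experiment`) are all hypothetical. The charset format the repository expects may differ, so mol_utils.make_charset remains the authoritative way to produce the character file.

```python
import csv
import io
import json


def read_smiles_column(csv_text, smiles_column="smiles"):
    """Return the SMILES strings from CSV text (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[smiles_column].strip() for row in reader]


def build_charset(smiles_list, pad_char=" "):
    """Collect every distinct character in the SMILES strings.

    Including a padding character is an assumption made for this sketch.
    """
    chars = {pad_char}
    for smiles in smiles_list:
        chars.update(smiles)
    return sorted(chars)


def customize_experiment(template, data_file, char_file):
    """Copy a template config and override only data_file and char_file."""
    params = dict(template)
    params["data_file"] = data_file
    params["char_file"] = char_file
    return params


# Tiny in-memory example; the property values are made-up placeholders.
example_csv = "smiles,prop\nCCO,0.1\nc1ccccc1,0.2\nCC(=O)O,0.3\n"
smiles = read_smiles_column(example_csv)
charset = build_charset(smiles)

# Stand-in for the dict loaded from the copied exp.json; the real file
# contains more fields, which this sketch would leave untouched.
template = {"data_file": "template_data.csv", "char_file": "template_chars.json"}
params = customize_experiment(template, "my_data.csv", "my_chars.json")

print(json.dumps(charset))
print(json.dumps(params, indent=2))
```

In practice the charset and the edited parameters would each be written to disk with `json.dump`, and the paths in data_file and char_file would point at those files.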

Let me know if you have more questions.

zealseeker commented 6 years ago

Hi @jnwei, I wonder whether it is possible to first feed in a large dataset such as ZINC without properties, and then a much smaller one with a property such as bioactivity against a target, so that I can generate compounds with that bioactivity.

Update: It seems this question was already raised in #5, but no one has replied.

muammar commented 2 years ago

> Hi @jnwei, I wonder whether it is possible to first feed in a large dataset such as ZINC without properties, and then a much smaller one with a property such as bioactivity against a target, so that I can generate compounds with that bioactivity.
>
> Update: It seems this question was already raised in #5, but no one has replied.

Did you manage to solve this?