aspuru-guzik-group / chemical_vae

Code for 10.1021/acscentsci.7b00572, now running on Keras 2.0 and Tensorflow
Apache License 2.0

Cannot reproduce the h5 files of the zinc_properties example #25

Open sozenoid opened 4 years ago

sozenoid commented 4 years ago

Hello, I've been trying to reproduce the results of the zinc_properties model provided in the repository under ./chemical_vae/models/zinc_properties.

Basically, I cd into the zinc_properties directory, run python3 -m chemvae.train_vae for 120 epochs with the default exp.json file, and end up with three files: zinc_decoder.h5, zinc_encoder.h5, and zinc_prop_pred.h5.

When I load those files in the Jupyter notebook example /chemical_vae/examples/intro_to_chemvae.ipynb, the "encode then decode" test shown below does not work: the reconstruction does not recover the original encoded SMILES, and decoding with a noise of 5.0 does not generate similar SMILES. The same test works with the original h5 files.

# Using the VAE
## Decode/Encode 

# `mu`, `vae`, and `np` are not defined here; they come from earlier notebook cells.
smiles_1 = mu.canon_smiles('CSCC(=O)NNC(=O)c1c(C)oc(C)c1C')
# smiles_1 = mu.canon_smiles('Cc1cc2c(cc1S(=O)(=O)NC1CCC(C)CC1)OCCN2C')

# SMILES -> one-hot -> latent vector -> reconstructed one-hot
X_1 = vae.smiles_to_hot(smiles_1, canonize_smiles=True)
z_1 = vae.encode(X_1)
X_r = vae.decode(z_1)

print('{:20s} : {}'.format('Input', smiles_1))
print('{:20s} : {}'.format('Reconstruction', vae.hot_to_smiles(X_r, strip=True)[0]))

print('{:20s} : {} with norm {:.3f}'.format('Z representation', z_1.shape, np.linalg.norm(z_1)))
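
For readers unfamiliar with the "noise of 5.0" part of the test, one way to picture it is moving the latent vector a fixed distance in a random direction before decoding. The sketch below is a standalone illustration using only the standard library. It is an assumption about what such a perturbation looks like, not the notebook's or chemvae's own code; `perturb`, the 8-dimensional vector, and the seed are hypothetical.

```python
import math
import random


def perturb(z, noise_norm, rng=random):
    """Return z shifted by a random vector whose Euclidean norm is noise_norm."""
    # Draw a random direction from an isotropic Gaussian.
    noise = [rng.gauss(0.0, 1.0) for _ in z]
    length = math.sqrt(sum(n * n for n in noise))
    if length == 0.0:
        # Degenerate draw: no direction to move in, return a copy of z.
        return list(z)
    # Rescale the direction so the shift has exactly the requested norm.
    scale = noise_norm / length
    return [zi + scale * ni for zi, ni in zip(z, noise)]


# Hypothetical 8-dimensional latent vector; a real encoder output is larger.
z = [0.1 * i for i in range(8)]
z_noisy = perturb(z, 5.0, rng=random.Random(0))

# Distance between the original and perturbed vectors.
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(z, z_noisy)))
print('distance from original: {:.3f}'.format(distance))
```

With a well-trained decoder, points at this distance would be expected to decode to molecules related to the input, which is the behaviour missing with the retrained h5 files.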

Were the provided h5 files obtained using the .csv and .json files in the same zinc_properties directory of the GitHub repository?

Thank you very much for your work, it is so interesting.

Best regards,
Hugues

sozenoid commented 4 years ago

Removing the "limit_data" field from exp.json seems to go a long way toward improving the results.
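
For anyone who wants to script that change rather than edit the file by hand, here is a minimal sketch. It assumes exp.json is a flat JSON object and that "limit_data" is a top-level key; `drop_key` is a hypothetical helper, and the demo runs on a throwaway file with placeholder values rather than on the repository's exp.json.

```python
import json
import tempfile
from pathlib import Path


def drop_key(json_path, key="limit_data"):
    """Remove `key` from a JSON file in place and return its old value (or None)."""
    json_path = Path(json_path)
    params = json.loads(json_path.read_text())
    old_value = params.pop(key, None)
    json_path.write_text(json.dumps(params, indent=2) + "\n")
    return old_value


# Demo on a throwaway file; the keys and values here are placeholders,
# not the contents of the repository's exp.json.
demo_path = Path(tempfile.mkdtemp()) / "exp.json"
demo_path.write_text(json.dumps({"limit_data": 5000, "epochs": 120}))

removed = drop_key(demo_path)
cleaned = json.loads(demo_path.read_text())
print("removed:", removed)
print("remaining keys:", sorted(cleaned))
```

To apply it to the real file, call `drop_key` with the path to exp.json in the zinc_properties directory, keeping a backup copy first since the file is rewritten in place.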