Open maxbernhard opened 6 years ago
Hello, did you figure out an answer to this question? Many thanks.
Hi All,
It sounds like you might like to do semi-supervised learning.
That is not currently supported by this repo, but you might be able to adapt it for your purposes. I would suggest looking at this paper for some ideas: https://arxiv.org/abs/1805.00108
Best, Jennifer
On Fri, Jun 28, 2019 at 9:35 AM wkl000 notifications@github.com wrote:
Hello, do you figure out this question? Many thanks.
Hi all, I ran into the same problem, so I modified the code myself. Please take it as a reference. After training the VAE itself on a large number of molecules with SMILES but without properties, I build a model composed of only the trained encoder and the property predictor, and then train that model on a limited number of molecules with SMILES and properties.
The full parameter settings: https://github.com/AustinApple/modified_chemvae/blob/master/exp_property_training.json
To train the property predictor separately, I made some modifications to train_vae.py:
https://github.com/AustinApple/modified_chemvae/blob/master/train_prop.py
However, with this approach the encoder weights must be frozen while the property predictor is trained; otherwise the updates will degrade the trained autoencoder. We can expect the property predictor to perform worse than one trained jointly with the encoder. If you have any questions, please let me know.
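As a rough illustration of the two-stage idea described above (not the code in train_prop.py), here is a minimal pure-Python sketch. It assumes a toy linear "encoder" standing in for the pretrained VAE encoder and a linear property head; the names encode, predict and train_property_head, the weights, and the data are all hypothetical. The point it shows is only that the optimizer touches the head parameters and never the encoder weights.

```python
import random

def encode(x, enc_w):
    """Toy stand-in for the pretrained encoder: a fixed linear map to a latent vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in enc_w]

def predict(z, head_w, head_b):
    """Linear property head on top of the latent vector."""
    return sum(w * zi for w, zi in zip(head_w, z)) + head_b

def train_property_head(data, enc_w, latent_dim, lr=0.05, epochs=200):
    """Plain SGD on the head only; enc_w is read but never updated (frozen)."""
    head_w = [0.0] * latent_dim
    head_b = 0.0
    for _ in range(epochs):
        for x, y in data:
            z = encode(x, enc_w)
            err = predict(z, head_w, head_b) - y
            head_w = [w - lr * err * zi for w, zi in zip(head_w, z)]
            head_b -= lr * err
    return head_w, head_b

# Hypothetical "pretrained" encoder weights (4 input features -> 2 latent dims).
enc_w = [[0.5, -0.3, 0.8, 0.1], [-0.2, 0.7, 0.1, -0.6]]
enc_w_before = [row[:] for row in enc_w]

# Small labeled set; the toy property is a linear function of the latent code.
rng = random.Random(0)
labeled = []
for _ in range(50):
    x = [rng.uniform(-1, 1) for _ in range(4)]
    z = encode(x, enc_w)
    labeled.append((x, 2.0 * z[0] - 1.0 * z[1] + 0.5))

head_w, head_b = train_property_head(labeled, enc_w, latent_dim=2)
mse = sum((predict(encode(x, enc_w), head_w, head_b) - y) ** 2
          for x, y in labeled) / len(labeled)
print("encoder unchanged:", enc_w == enc_w_before, "train MSE:", mse)
```

In a Keras model, the usual equivalent is to set trainable = False on the encoder layers before compiling the encoder-plus-predictor model, so only the predictor weights receive gradient updates.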
One problem I foresee is that if you don't train the VAE jointly with the property predictor, the latent space will not be organized according to the property. So you are doing something quite different from what is reported in their paper.
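One way joint training could still work when most molecules have no property label is to mask the property term, so that every molecule contributes to the reconstruction loss but only labeled ones contribute to the property loss. The sketch below is an assumption about how such a loss might look, not something this repo implements; joint_loss, prop_weight and the use of None for a missing label are hypothetical, and the KL term of the VAE objective is left out for brevity.

```python
def joint_loss(recon_losses, prop_preds, prop_targets, prop_weight=1.0):
    """Mean reconstruction loss over all molecules plus property MSE
    over labeled molecules only (prop_targets[i] is None if unlabeled)."""
    recon = sum(recon_losses) / len(recon_losses)
    labeled = [(p, t) for p, t in zip(prop_preds, prop_targets) if t is not None]
    if not labeled:
        # Batch with no labels: only the reconstruction term applies.
        return recon
    prop = sum((p - t) ** 2 for p, t in labeled) / len(labeled)
    return recon + prop_weight * prop

# Three molecules in a batch, only the first has a measured property.
total = joint_loss(
    recon_losses=[0.2, 0.4, 0.6],
    prop_preds=[1.0, 2.0, 3.0],
    prop_targets=[1.5, None, None],
)
print(total)
```

With a loss of this shape, the large unlabeled set and the small labeled set can be mixed in the same batches, and the labeled molecules still shape the latent space during VAE training.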
Dear All,
The paper mentions that 250,000 drug-like molecules were used to train the autoencoder system, and that 2,000 molecules were used to train the Gaussian process.
However, the provided command-line tool accepts only one input data set.
Therefore the question: is it possible to train the property-prediction and reconstruction tasks separately, or how was this separate training achieved in the paper?
How should the code be executed, assuming we have the following two data sets?
• a limited number of molecules with SMILES and properties, and
• a large number of molecules with SMILES and without properties (possibly overlapping with the smaller data set)
Kind regards!