choderalab / pinot

Probabilistic Inference for NOvel Therapeutics
MIT License

Some results from running generative experiments #47

Open dnguyen1196 opened 4 years ago

dnguyen1196 commented 4 years ago

Generative model

Generative model implementation: https://github.com/choderalab/pinot/blob/master/pinot/generative/torch_gvae/model.py

Encoder: takes one input graph g with node features. For each node:

  1. A linear layer maps the node feature from 117 dimensions to a hidden_dim1-dimensional vector.
  2. A graph convolution layer maps the output of step 1 to a hidden_dim2-dimensional vector.
  3. Two separate graph convolution layers operate on the output of step 2. One maps it to a hidden_dim3-dimensional vector mu, the mean of the approximate posterior distribution over latent node representations. The other maps it to a hidden_dim3-dimensional vector logvar, the log variance of that same posterior.
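The three encoder steps could be sketched roughly as follows, with dense matrix-form graph convolutions (H' = A_hat H W) standing in for the actual graph convolution layers; all class and variable names here are illustrative, not the ones in the pinot code:

```python
import torch
import torch.nn as nn

class GVAEEncoderSketch(nn.Module):
    """Illustrative sketch of the three-stage encoder described above."""

    def __init__(self, in_dim=117, hidden_dim1=64, hidden_dim2=64, hidden_dim3=64):
        super().__init__()
        self.lin = nn.Linear(in_dim, hidden_dim1)              # step 1
        self.gcn = nn.Linear(hidden_dim1, hidden_dim2)         # step 2 (GCN weights)
        self.gcn_mu = nn.Linear(hidden_dim2, hidden_dim3)      # step 3 -> mu
        self.gcn_logvar = nn.Linear(hidden_dim2, hidden_dim3)  # step 3 -> logvar

    def forward(self, a_hat, x):
        # a_hat: (n_nodes x n_nodes) normalized adjacency; x: node features
        h = torch.relu(self.lin(x))          # 1. linear map of node features
        h = torch.relu(a_hat @ self.gcn(h))  # 2. graph convolution: A_hat H W
        mu = a_hat @ self.gcn_mu(h)          # 3. two parallel graph convolutions
        logvar = a_hat @ self.gcn_logvar(h)
        return mu, logvar

n = 5
a_hat = torch.eye(n)  # identity adjacency, just for a shape check
x = torch.randn(n, 117)
mu, logvar = GVAEEncoderSketch()(a_hat, x)
print(mu.shape, logvar.shape)
```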

Decoder: Two separate decoders are used:

  1. An inner product decoder reconstructs a "soft" adjacency matrix A'. "Soft" here means that A'_ij is the predicted probability that there is an edge between nodes i and j. It takes the output of step 3 and computes A'_ij = z_i^T z_j.
  2. A linear decoder maps the output of step 3 to a num_atom_types-dimensional vector and applies a softmax. This output is used for node type classification.
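The two decoders could be sketched as below. One assumption is flagged here: a sigmoid is applied to the inner product z_i^T z_j so that A'_ij lands in [0, 1] and can be read as an edge probability; names are illustrative:

```python
import torch
import torch.nn as nn

def inner_product_decoder(z):
    # "Soft" adjacency: A'_ij from the latent inner product z_i^T z_j.
    # (A sigmoid is assumed here to squash the product into [0, 1].)
    return torch.sigmoid(z @ z.t())

class NodeTypeDecoderSketch(nn.Module):
    """Linear map to num_atom_types, followed by a softmax (illustrative)."""

    def __init__(self, hidden_dim3=64, num_atom_types=100):
        super().__init__()
        self.lin = nn.Linear(hidden_dim3, num_atom_types)

    def forward(self, z):
        return torch.softmax(self.lin(z), dim=-1)

z = torch.randn(5, 64)                   # latent node representations
adj_soft = inner_product_decoder(z)      # 5 x 5 edge probabilities
node_probs = NodeTypeDecoderSketch()(z)  # 5 x 100 class probabilities
```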

The loss function is the negative ELBO, composed of an expected log likelihood term and a KL divergence term. The KL term is the usual one. The log likelihood term itself has two parts: the binary_cross_entropy between the true adjacency matrix and the output of the inner product decoder, and the binary_cross_entropy between the true node class (as a one-hot vector) and the output of the linear decoder.

Loss function: https://github.com/choderalab/pinot/blob/master/pinot/generative/torch_gvae/loss.py
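Putting the pieces together, the negative ELBO described above could look something like this sketch (tensor shapes and the per-node KL reduction are assumptions; see the linked loss.py for the actual implementation):

```python
import torch
import torch.nn.functional as F

def negative_elbo_sketch(adj_pred, adj_true, node_probs, node_onehot, mu, logvar):
    # Expected log likelihood: BCE over the soft adjacency + BCE over node types
    edge_loss = F.binary_cross_entropy(adj_pred, adj_true)
    node_loss = F.binary_cross_entropy(node_probs, node_onehot)
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian, averaged over nodes
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    return edge_loss + node_loss + kl

n, d, k = 5, 64, 100
loss = negative_elbo_sketch(
    adj_pred=torch.rand(n, n),                                 # soft adjacency in [0, 1]
    adj_true=torch.randint(0, 2, (n, n)).float(),              # true 0/1 adjacency
    node_probs=torch.softmax(torch.randn(n, k), dim=-1),       # decoder softmax output
    node_onehot=F.one_hot(torch.randint(0, k, (n,)), k).float(),
    mu=torch.randn(n, d),
    logvar=torch.randn(n, d),
)
```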

Data

For these experiments, I used ESOL, which has about 1100 molecules, with a 90% training / 10% testing split.
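A minimal sketch of such a 90/10 random split (the actual pinot data pipeline may well do this differently):

```python
import random

def train_test_split(items, test_frac=0.1, seed=0):
    # Shuffle indices, then carve off test_frac of them as the test set.
    rng = random.Random(seed)
    idx = list(range(len(items)))
    rng.shuffle(idx)
    n_test = int(len(items) * test_frac)
    test = [items[i] for i in idx[:n_test]]
    train = [items[i] for i in idx[n_test:]]
    return train, test

train, test = train_test_split(list(range(1100)))  # ~1100 ESOL molecules
print(len(train), len(test))  # 990 110
```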

Metrics

Experiment: https://github.com/choderalab/pinot/blob/master/scripts/generative/gvae_exp.py

Right now, the metrics I have implemented and use are: true positive rate and true negative rate for edge prediction, and accuracy for node classification.
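These three metrics could be computed as in this pure-Python sketch (the linked experiment script may compute them differently; inputs here are flattened node-pair predictions):

```python
def edge_rates(pred_probs, true_edges, threshold=0.5):
    # True positive / true negative rates for edge prediction.
    preds = [p >= threshold for p in pred_probs]
    tp = sum(1 for p, t in zip(preds, true_edges) if p and t)
    tn = sum(1 for p, t in zip(preds, true_edges) if not p and not t)
    n_pos = sum(true_edges)
    n_neg = len(true_edges) - n_pos
    return tp / n_pos, tn / n_neg

def node_accuracy(pred_types, true_types):
    # Fraction of nodes whose predicted atom type matches the true type.
    return sum(p == t for p, t in zip(pred_types, true_types)) / len(true_types)

tpr, tnr = edge_rates([0.9, 0.2, 0.8, 0.4], [True, False, True, True])
acc = node_accuracy([6, 6, 6, 8], [6, 6, 7, 8])
print(tpr, tnr, acc)
```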

Hyper-parameters

The hyper-parameters I focused on in these experiments are hidden_dim1, hidden_dim2, hidden_dim3, the number of epochs, and the batch size. Not really knowing a good place to start, I tried a large grid of combinations, where each hidden dimension is one of [256, 128, 64], the batch size is one of 10, 25, 50, or 100, and the number of training epochs is 100 or 200.

Step-size: 0.001
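For reference, that grid works out to 216 configurations; enumerating it might look like:

```python
from itertools import product

hidden_dims = [256, 128, 64]     # candidates for hidden_dim1/2/3
batch_sizes = [10, 25, 50, 100]
epoch_options = [100, 200]

# Every combination of the three hidden dimensions, batch size, and epochs;
# the step size stays fixed at 0.001 throughout.
grid = list(product(hidden_dims, hidden_dims, hidden_dims,
                    batch_sizes, epoch_options))
print(len(grid))  # 3 * 3 * 3 * 4 * 2 = 216 configurations
```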

Some observations

The true positive and true negative rates for edge prediction are both about 0.5, so the model does predict some edges as present and some as absent, but it is just really bad at it. The accuracy for node classification is about 0.75 (because the model always predicts that the atom type is Carbon).

Screenshot from 2020-05-28 15-57-55

For example, this is for hidden_dimensions=[64, 64, 64] with batch_size=25 and n_epochs=100.

Screenshot from 2020-05-28 16-08-31

And this result is for hidden_dimensions=[64, 256, 64] with batch_size=10 and n_epochs=200.

Screenshot from 2020-05-28 16-07-33

Any suggestions? I think the most concerning thing is that the model only predicts/outputs one type of atom. However, I'm not sure how to approach investigating this further. I can start by looking at the kinds of samples drawn in step 3. Let me know if any of the steps don't make sense or are wrong, too.

Update 5/29/2020

After talking with Yuanqing and doing some further experiments, we came across some surprising things.

Firstly, as a follow-up to the experiments described above for the generative model, we ran further experiments with the loss function. We wanted to see why the edge prediction accuracy is so low and why the sampled node types are all Carbon. We experimented with a loss function that drops the KL term from the ELBO, leaving only the two cross-entropy terms associated with node and edge prediction. We observed that the results can vary substantially: sometimes we would get high accuracy (~90%) for edge prediction (both test and train) and low accuracy (~10%) for node prediction; at other times, we would get roughly 50% accuracy for both. When we reintroduce the KL term, we of course get the sort of results outlined previously: edge prediction accuracy around 50% and node prediction accuracy around 75%.

We suspected that there was a bug in our generative model implementation. Therefore, we experimented with a very simple model that is not even an autoencoder but simply a 2-layer neural network. Both layers are linear: the first layer's input dimension is 117 (the feature dimension) and its output dimension is 64; the second layer's output dimension is 100 (used for node type classification). We ran this for 200 epochs, using the Adam optimizer with step size 0.001 and cross_entropy for the loss. We observed that the training node prediction accuracy reaches a maximum of around 75% before decreasing to around 60% as the learning algorithm converges. When we print the actual node types being generated, we at least see that this model produces some diversity in node types (not all are 6 / Carbon).
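That two-layer baseline could be sketched as follows (the dummy data, seed, and epoch count here are placeholders; the real run used the actual node features and 200 epochs):

```python
import torch
import torch.nn as nn

# Baseline sketch: two linear layers, 117 -> 64 -> 100, trained with
# cross-entropy and Adam at step size 0.001.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(117, 64), nn.Linear(64, 100))
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 117)          # dummy node features
y = torch.randint(0, 100, (32,))  # dummy atom-type labels
for _ in range(5):                # the issue trained for 200 epochs
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
logits = model(x)                 # per-node atom-type logits
```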

These surprising results imply that the results for node accuracy of the generative model might be expected given the choice of the loss function.

yuanqing-wang commented 4 years ago

hmmm this looks properly weird, some of these are worse than tossing a coin. I'll dig into the code later tonight or tomorrow. But first, let's brainstorm some sanity-check experiments. For example, if you have a small training set, you should be able to overfit it.

yuanqing-wang commented 4 years ago

also what does the output graph look like?

from the 0.75 and the plateau in the curve I was suspecting maybe there is some sort of leak. like you said, if you always predict carbon you can hit 0.75, so did the model do this?

yuanqing-wang commented 4 years ago

another experiment is to turn off the KL term and just have the reconstruction loss. in that case it would really be easy to reproduce the input.
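The suggested experiment amounts to dropping the KL term from the loss sketched earlier, keeping only the reconstruction terms (shapes and names here are again assumptions):

```python
import torch
import torch.nn.functional as F

def reconstruction_only_loss(adj_pred, adj_true, node_probs, node_onehot):
    # The negative ELBO with the KL term turned off: only the two
    # reconstruction (cross-entropy) terms remain.
    return (F.binary_cross_entropy(adj_pred, adj_true)
            + F.binary_cross_entropy(node_probs, node_onehot))

n, k = 5, 100
recon = reconstruction_only_loss(
    torch.rand(n, n),
    torch.randint(0, 2, (n, n)).float(),
    torch.softmax(torch.randn(n, k), dim=-1),
    F.one_hot(torch.randint(0, k, (n,)), k).float(),
)
```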

dnguyen1196 commented 4 years ago

Thanks, will try what you suggested.

also what does the output graph look like?

I didn't directly draw/plot the predicted graph but it does seem like the model is predicting a mix of both positive (present) and negative (absent) edges.

from the 0.75 and plateau curve I was suspecting maybe there is some sort of leak. like you said if you always do carbon you can hit 0.75 so did the model do this?

Yeah, I did a few experiments where I printed out the actual predicted labels, and they're all 6 (Carbon).