DeepRank / deeprank2

An open-source deep learning framework for data mining of protein-protein interfaces or single-residue variants.
https://deeprank2.readthedocs.io/en/latest/?badge=latest
Apache License 2.0

Test grids (CNN) pipeline with old deeprank paper's data - crystallography experiment #376

Closed · gcroci2 closed this 1 year ago

gcroci2 commented 1 year ago

We should test the entire pipeline with the old deeprank paper's data, in particular reproducing the crystallographic PPIs experiment.

Tentative tasks:

ntxxt commented 1 year ago

Hello, I am trying to re-train with deeprankcore on interface data and I have a question. It seems that when taking a list of HDF5 files as input, a GridDataset object only reads the last HDF5 path in the list (so len(GridDataset) is always 31). Does this mean all the training files have to be combined into one big HDF5 file?

gcroci2 commented 1 year ago

> Hello, I am trying to re-train with deeprankcore on interface data and I have a question. It seems that when taking a list of HDF5 files as input, a GridDataset object only reads the last HDF5 path in the list (so len(GridDataset) is always 31). Does this mean all the training files have to be combined into one big HDF5 file?

That shouldn't be the case, see https://github.com/DeepRank/deeprank-core/blob/7a824711f849f6f06711d9d0974a8fa8bbb2783d/deeprankcore/dataset.py#L349

Are you sure that you're giving as input a list of HDF5 file path strings, like hdf5_path = ['f1.hdf5', 'f2.hdf5', 'f3.hdf5']?
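For reference, a minimal sketch of the intended multi-file usage (the file names are placeholders, other GridDataset arguments are omitted, and the import path is assumed from deeprankcore/dataset.py):

```python
from deeprankcore.dataset import GridDataset  # import path assumed from deeprankcore/dataset.py

# Placeholder file names; any list of processed .hdf5 files should work.
hdf5_files = ["f1.hdf5", "f2.hdf5", "f3.hdf5"]

dataset = GridDataset(hdf5_path=hdf5_files)

# With the bug discussed below, this reported only the entries of the last
# file in the list; after the fix it should count the entries of all files.
print(len(dataset))
```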

ntxxt commented 1 year ago

yes, I am sure.

gcroci2 commented 1 year ago

Can you send us on Slack some of the hdf5 files you're using, and the script as well? @DaniBodor and I will look into this.

ntxxt commented 1 year ago

Thank you. I will send you the script and three hdf5 files on Slack.

ntxxt commented 1 year ago

After reviewing the DeeprankDataset class, it appears that the _create_index_entries(self) function does not append to the index_entries list properly across multiple HDF5 files. I have fixed it in my local installation, but you might want to check as well. https://github.com/DeepRank/deeprank-core/blob/7a824711f849f6f06711d9d0974a8fa8bbb2783d/deeprankcore/dataset.py#L128
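For illustration only, the gist of such a fix might look like the sketch below; this is hypothetical code, not the actual deeprankcore implementation, and the hdf5_paths attribute name is made up:

```python
import h5py

def _create_index_entries(self):
    # Hypothetical sketch: collect (file, entry) pairs across *all* HDF5 files,
    # instead of keeping only the entries of the last file processed.
    self.index_entries = []
    for hdf5_path in self.hdf5_paths:  # 'hdf5_paths' is an assumed attribute name
        with h5py.File(hdf5_path, "r") as hdf5_file:
            for entry_name in hdf5_file.keys():
                self.index_entries.append((hdf5_path, entry_name))
```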

gcroci2 commented 1 year ago

> After reviewing the DeeprankDataset class, it appears that the _create_index_entries(self) function does not append to the index_entries list properly across multiple HDF5 files. I have fixed it in my local installation, but you might want to check as well.
>
> https://github.com/DeepRank/deeprank-core/blob/7a824711f849f6f06711d9d0974a8fa8bbb2783d/deeprankcore/dataset.py#L128

Indeed, we spotted the same issue and it will be fixed in PR #397.

gcroci2 commented 1 year ago

Could you try to rerun the code on the branch of https://github.com/DeepRank/deeprank-core/pull/397 and let me know? It should work now @ntxxt

ntxxt commented 1 year ago

yes it works fine now @gcroci2

ntxxt commented 1 year ago

Crystallography experiment is finished; performance as follows: @DaniBodor @gcroci2

| Measure | deeprankcore | deeprank |
|---|---|---|
| tp | 61 | 66 |
| tn | 61 | 72 |
| fp | 20 | 9 |
| fn | 19 | 14 |
| Accuracy | 0.757 | 0.857 |
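For reference, the accuracies follow directly from the confusion counts; e.g. for the deeprankcore column:

```python
tp, tn, fp, fn = 61, 61, 20, 19
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 122 / 161 ≈ 0.758 (reported as 0.757)
```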

Train and test datasets are the same as in deeprank. Grid settings and hyperparameters are as follows.

Grid parameters:

  1. augmentation_count: 30 (same as in deeprank)
  2. grid_settings = GridSettings([30, 30, 30], [30.0, 30.0, 30.0]), equivalent to
    grid_info = { 'number_of_points': [30,30,30], 'resolution': [1.,1.,1.]} in deeprank (see the sketch below)

Hyperparameters:

  1. Number of features: 20 PSSM in deeprankcore (compared to 40 in deeprank, where PSSM is separated by chain A and B)
  2. Model structure is the same, except for the number of input features at layer 1 (as above)
  3. Batch size: 32, learning rate: 0.001, epochs: 30
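For concreteness, a minimal sketch of the claimed equivalence between the two grid configurations; the GridSettings import path, and the assumption that it takes the number of points per axis followed by the box size in Å (so resolution = size / points), are mine:

```python
from deeprankcore.utils.grid import GridSettings  # import path is an assumption

# deeprankcore: 30 points per axis over a 30.0 Å box -> 1.0 Å resolution
grid_settings = GridSettings([30, 30, 30], [30.0, 30.0, 30.0])

# old deeprank equivalent, as quoted above
grid_info = {"number_of_points": [30, 30, 30], "resolution": [1.0, 1.0, 1.0]}
```

(Note that later in this thread the number of points is revised to [10, 10, 10], i.e. a 3 Å resolution over the same 30 Å box.)
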
DaniBodor commented 1 year ago

This means that you could run the experiment without any bugs or issues coming up?! That is great news! Maybe we can discuss the specific results, and why it underperforms compared to deeprank, at the next group meeting.

gcroci2 commented 1 year ago

We discussed at today's weekly meeting that @ntxxt will report the comparison of the experiment before augmentation vs after augmentation, for training, validation, and testing sets. Also the timings will be reported.

ntxxt commented 1 year ago

More detail on the data: @gcroci2

- Training set: 4592 in total (80% of MANY)
- Validation set: 1146 in total (20% of MANY)
- Test set: 161 in total (full DC set)

Train and valid loss: trained for 24 epochs, 17 hours on one A100 GPU with 40 CPUs. Model saved at epoch 3.

[train/valid loss curve figure]

Train/valid/test confusion matrices and accuracies:

Train:

| Measure | Augmented | Not augmented |
|---|---|---|
| tp | 67457 | 2209 |
| tn | 69758 | 2282 |
| fp | 2472 | 48 |
| fn | 2617 | 51 |
| Accuracy | 0.964 | 0.978 |

Valid:

| Measure | Augmented | Not augmented |
|---|---|---|
| tp | 16091 | 518 |
| tn | 16896 | 543 |
| fp | 1084 | 37 |
| fn | 1455 | 48 |
| Accuracy | 0.929 | 0.926 |

Test:

| Measure | Augmented | Not augmented |
|---|---|---|
| tp | 1907 | 61 |
| tn | 1958 | 61 |
| fp | 553 | 20 |
| fn | 573 | 19 |
| Accuracy | 0.774 | 0.757 |

gcroci2 commented 1 year ago

Thanks :) @ntxxt Do the 17 hours refer to the augmented or the non-augmented data experiment? To me, it seems that it's overfitting almost from the beginning. Is the loss curve similar to the one from the old experiments? @LilySnow @sonjageorgievska

ntxxt commented 1 year ago

It refers to the augmented one. @gcroci2

ntxxt commented 1 year ago

After running the experiment, the result is as follows. Sadly the test accuracy is still low, but there is a bit more of an increase in accuracy after augmentation: 0.702 -> 0.760 for the augmented test set; 0.702 -> 0.727 for the not augmented test set.

| Measure | Train | Valid | Test | Augmented_Train | Augmented_Valid | Augmented_Test | Augmented_Test (excl. augmentation) |
|---|---|---|---|---|---|---|---|
| tp | 2223 | 556 | 61 | 67973 | 16987 | 1787 | 55 |
| tn | 2287 | 570 | 52 | 71297 | 17841 | 2008 | 62 |
| fp | 43 | 10 | 29 | 993 | 139 | 503 | 19 |
| fn | 39 | 10 | 19 | 2101 | 559 | 593 | 25 |
| Accuracy | 0.982 | 0.983 | 0.702 | 0.978 | 0.980 | 0.760 | 0.727 |

gcroci2 commented 1 year ago

Summary of experiments and comparisons

I got confused with the grid settings earlier; it should be GridSettings([10, 10, 10], [30.0, 30.0, 30.0]) instead, so I re-ran everything...

General settings

These should be the same in both deeprank and deeprankcore:

- Dataset: pdb_ids used for train/valid/test exactly the same as in deeprank (except 4bm1_1_n.pdb, which failed to generate a surface)
- Network: "two 3D CNN/Max pooling blocks followed by two linear fully connected layers. This results in 117,362 learnable parameters." In deeprankcore, because there are only 20 PSSM features, this results in 104,562 parameters instead
- Augmentation_count: 30
- Grid settings: GridSettings([10, 10, 10], [30.0, 30.0, 30.0]), i.e. a 10 × 10 × 10-point grid over a 30 Å box, with a resolution of 3 Å
- Features: 20 PSSM in deeprankcore (compared to 40 in deeprank, PSSM separated by chain A and B)
- Batch size: 8
- Training: SGD optimizer, learning rate 0.0005, momentum 0.9, weight decay 0.001

NA = Not Augmented, A = Augmented

Training and training set

We could have a table for validation as well, but I think the most relevant info is about the training and the test sets.

| Measure | deeprankcore_NA | deeprank_NA | deeprankcore_A | deeprank_A |
|---|---|---|---|---|
| Data | 4591 | 4592 | 142321 | 142352 |
| tp | 2290 | unknown | 70151 | 66618 |
| tn | 2256 | unknown | 70652 | 67121 |
| fp | 39 | unknown | 493 | 4055 |
| fn | 6 | unknown | 675 | 4558 |
| Accuracy | 0.990 | unknown | 0.991 | 0.939 |
| Settings/Resources | A100 GPU, 40 CPU, 20 epochs | unknown | A100 GPU, 40 CPU, 20 epochs | 2 GPUs |
| Best epoch | 7 | unknown | 6 | 2 |
| Training time | 5 min | unknown | 76 min | unknown |

Loss curves

NA loss: [figure]

A loss: [figure]

deeprank loss: [figure]

Test set

| Measure | deeprankcore_NA | deeprank_NA | deeprankcore_A | deeprank_A |
|---|---|---|---|---|
| Data | 161 | 161 | 161 | 161 |
| tp | 68 | unknown | 57/59/38/60 | 66 |
| tn | 53 | unknown | 69/72/80/73 | 72 |
| fp | 28 | unknown | 12/9/1/8 | 9 |
| fn | 12 | unknown | 23/21/42/20 | 14 |
| Accuracy | 0.751 | unknown | 0.782/0.814/0.733/0.826 | 0.857 |

- 0.782: Augmented model on not augmented test set
- 0.814: Augmented model on augmented test set, taking the majority vote
- 0.733: Augmented model on augmented test set, taking the max
- 0.826: Augmented model on augmented test set, taking the average
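For clarity, a hedged sketch of how a per-structure prediction could be aggregated over its 30 augmented copies (majority vote, max, or average); the function and data layout are illustrative, not the actual evaluation script:

```python
import numpy as np

def aggregate(probabilities: np.ndarray, how: str) -> int:
    """Turn one structure's per-augmentation positive-class probabilities
    into a single binary prediction."""
    if how == "majority":
        votes = (probabilities >= 0.5).astype(int)
        return int(votes.sum() > len(votes) / 2)
    if how == "max":
        return int(probabilities.max() >= 0.5)
    if how == "average":
        return int(probabilities.mean() >= 0.5)
    raise ValueError(f"unknown aggregation: {how}")

# e.g. the predictions for the 30 augmented copies of one test structure
probs = np.random.rand(30)
print(aggregate(probs, "majority"), aggregate(probs, "max"), aggregate(probs, "average"))
```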

gcroci2 commented 1 year ago

In the weekly meeting we decided that the experiment will be redone using 2 epochs as in the old deeprank paper, and adding a layer that turns the 20 deeprank-core PSSM channels into 40 channels @ntxxt

ntxxt commented 1 year ago

More detail on the training: the model was trained on the augmented training set and tested on the non-augmented test set, making it comparable to the result of the DeepRank paper.

In deeprank, PSSM features were separated by the two chains, 20 PSSM features each, resulting in 40 features. In deeprank-core, PSSM features for both chains are combined, resulting in 20 features. As a result, features are mapped differently to the grid and the number of trainable parameters decreases at every layer. Both could cause a decrease in model performance.

To up-project the feature size from 20 to 40, one embedding layer was added in the forward function: `self.embedding_layer = nn.Linear(num_features, 40)`. This ensures the shape of the input features stays the same as in deeprank.
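A possible way to wire such a layer in (illustrative sketch only; the real model, module names, and reshaping may differ): the channel dimension is moved last so that nn.Linear acts on the 20 PSSM channels and produces 40, then moved back.

```python
import torch
from torch import nn

class ChannelEmbedding(nn.Module):
    """Up-project the channel dimension of a 3D grid from 20 to 40 features."""

    def __init__(self, num_features: int = 20, out_features: int = 40):
        super().__init__()
        self.embedding_layer = nn.Linear(num_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, x, y, z) -> channels last, so nn.Linear acts on them
        x = x.permute(0, 2, 3, 4, 1)
        x = self.embedding_layer(x)      # (batch, x, y, z, out_features)
        return x.permute(0, 4, 1, 2, 3)  # back to (batch, out_features, x, y, z)

# quick shape check on a dummy 20-channel grid
grid = torch.randn(2, 20, 10, 10, 10)
print(ChannelEmbedding()(grid).shape)  # torch.Size([2, 40, 10, 10, 10])
```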

Added one embedding layer to the original model, resulting in 118,202 parameters, trained on the augmented training set for 20 epochs to be on the same page as deeprank, using the model saved at epoch 2:

| Measure | training_set | test_set |
|---|---|---|
| tp | 54800 | 79 |
| tn | 54700 | 69 |
| fp | 16444 | 12 |
| fn | 16376 | 1 |
| Accuracy | 0.770 | 0.918 |

Loss:

[loss curve figure]

DaniBodor commented 1 year ago

> Added one embedding layer to the original model, resulting in 118,202 parameters, trained on the augmented training set for 20 epochs to be on the same page as deeprank, using the model saved at epoch 2 [results table and loss curve above]

So do I see correctly that basically that did the trick and it's now outperforming deeprank? Would we expect it to outperform, though? Wouldn't we expect very similar performance? Also, is it normal for the test accuracy to be so much higher than the train accuracy?

ntxxt commented 1 year ago

Now the model has more trainable parameters than deeprank. The input was also re-shaped before and after the linear embedding layer to match the shape. I suspect this will influence the performance, but in my opinion, once the network changes, it is not really comparable anyway... It is indeed very strange that the testing accuracy is so much higher than the training accuracy. But the testing accuracy is calculated using the model saved at epoch 2, where the training loss is not yet very low.

LilySnow commented 1 year ago

@ntxxt Nice work. Glad to know the embedding layer works. Can you modify your message above to add more details? For example, is the result on augmented data or not? Why did you need to add the embedding layer (i.e., deeprank-core merged two chains into one, while deeprank has two grids for the two chains)? How big is your embedding layer?

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity.

DaniBodor commented 1 year ago

Can this issue be closed? @LilySnow , @ntxxt , @gcroci2

gcroci2 commented 1 year ago

> Can this issue be closed? @LilySnow , @ntxxt , @gcroci2

I guess so. Also, I tested deeprankcore with grids and CNNs using the pMHCI 100k dataset, and no bugs have been found so far (see issue #152 in the 3D-Vac repo). So I'd say that the CNN pipeline works as expected.