DeepRank / deeprank2

An open-source deep learning framework for data mining of protein-protein interfaces or single-residue variants.
https://deeprank2.readthedocs.io/en/latest/?badge=latest
Apache License 2.0

Test grids (CNN) pipeline with old deeprank paper's data - crystallography experiment #376

Closed · gcroci2 closed this 1 year ago

gcroci2 commented 1 year ago

We should test the entire pipeline with the old deeprank paper's data, in particular reproducing the crystallographic PPIs experiment.

Tentative tasks:

ntxxt commented 1 year ago

Hello, I am trying to re-train with deeprankcore on interface data and I have a question. It seems that when taking a list of HDF5 files as input, a GridDataset object only reads the last HDF5 path in the list (so len(GridDataset) is always 31). Does this mean all the training files have to be combined into one big HDF5 file?

gcroci2 commented 1 year ago

> Hello, I am trying to re-train with deeprankcore on interface data and I have a question. It seems that when taking a list of HDF5 files as input, a GridDataset object only reads the last HDF5 path in the list (so len(GridDataset) is always 31). Does this mean all the training files have to be combined into one big HDF5 file?

That shouldn't be the case, see https://github.com/DeepRank/deeprank-core/blob/7a824711f849f6f06711d9d0974a8fa8bbb2783d/deeprankcore/dataset.py#L349

Are you sure that you're giving as input a list of HDF5 file path strings, like hdf5_path = ['f1.hdf5', 'f2.hdf5', 'f3.hdf5']?
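For reference, a minimal sketch of the intended multi-file usage (the file names are placeholders, other GridDataset arguments are omitted, and the import path is assumed from deeprankcore/dataset.py):

```python
from deeprankcore.dataset import GridDataset  # import path assumed from deeprankcore/dataset.py

# Placeholder file names; any list of processed .hdf5 files should work.
hdf5_files = ["f1.hdf5", "f2.hdf5", "f3.hdf5"]

dataset = GridDataset(hdf5_path=hdf5_files)

# With the bug discussed below, this reported only the entries of the last
# file in the list; after the fix it should count the entries of all files.
print(len(dataset))
```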

ntxxt commented 1 year ago

yes, I am sure.

gcroci2 commented 1 year ago

Can you send us on Slack some of the hdf5 files you're using, and the script as well? @DaniBodor and I will look into this.

ntxxt commented 1 year ago

Thank you. I will send you the script and three hdf5 files on Slack.

ntxxt commented 1 year ago

After reviewing the DeeprankDataset class, it appears that the _create_index_entries(self) function does not append to the index_entries list properly across multiple HDF5 files. I have fixed it in my local installation, but you might want to check as well. https://github.com/DeepRank/deeprank-core/blob/7a824711f849f6f06711d9d0974a8fa8bbb2783d/deeprankcore/dataset.py#L128
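For illustration only, the gist of such a fix might look like the sketch below; this is hypothetical code, not the actual deeprankcore implementation, and the hdf5_paths attribute name is made up:

```python
import h5py

def _create_index_entries(self):
    # Hypothetical sketch: collect (file, entry) pairs across *all* HDF5 files,
    # instead of keeping only the entries of the last file processed.
    self.index_entries = []
    for hdf5_path in self.hdf5_paths:  # 'hdf5_paths' is an assumed attribute name
        with h5py.File(hdf5_path, "r") as hdf5_file:
            for entry_name in hdf5_file.keys():
                self.index_entries.append((hdf5_path, entry_name))
```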

gcroci2 commented 1 year ago

> After reviewing the DeeprankDataset class, it appears that the _create_index_entries(self) function does not append to the index_entries list properly across multiple HDF5 files. I have fixed it in my local installation, but you might want to check as well.
>
> https://github.com/DeepRank/deeprank-core/blob/7a824711f849f6f06711d9d0974a8fa8bbb2783d/deeprankcore/dataset.py#L128

Indeed, we spotted the same issue and it will be fixed in PR #397.

gcroci2 commented 1 year ago

Could you try to rerun the code on the branch of https://github.com/DeepRank/deeprank-core/pull/397 and let me know? It should work now @ntxxt

ntxxt commented 1 year ago

yes it works fine now @gcroci2

ntxxt commented 1 year ago

Crystallography experiment is finished; performance as follows: @DaniBodor @gcroci2

| Measure | deeprankcore | deeprank |
|---|---|---|
| tp | 61 | 66 |
| tn | 61 | 72 |
| fp | 20 | 9 |
| fn | 19 | 14 |
| Accuracy | 0.757 | 0.857 |
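For reference, the accuracies follow directly from the confusion counts; e.g. for the deeprankcore column:

```python
tp, tn, fp, fn = 61, 61, 20, 19
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 122 / 161 ≈ 0.758 (reported as 0.757)
```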

Train and test datasets are the same as in deeprank. Grid settings and hyperparameters are as follows.

Grid parameters:

  1. augmentation_count: 30 (same as in deeprank)
  2. grid_settings = GridSettings([30, 30, 30], [30.0, 30.0, 30.0]), equivalent to
    grid_info = { 'number_of_points': [30,30,30], 'resolution': [1.,1.,1.]} in deeprank (see the sketch below)

Hyperparameters:

  1. Number of features: 20 PSSM in deeprankcore (compared to 40 in deeprank, where PSSM is separated by chain A and B)
  2. Model structure is the same, except for the number of input features at layer 1 (as above)
  3. Batch size: 32, learning rate: 0.001, epochs: 30
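For concreteness, a minimal sketch of the claimed equivalence between the two grid configurations; the GridSettings import path, and the assumption that it takes the number of points per axis followed by the box size in Å (so resolution = size / points), are mine:

```python
from deeprankcore.utils.grid import GridSettings  # import path is an assumption

# deeprankcore: 30 points per axis over a 30.0 Å box -> 1.0 Å resolution
grid_settings = GridSettings([30, 30, 30], [30.0, 30.0, 30.0])

# old deeprank equivalent, as quoted above
grid_info = {"number_of_points": [30, 30, 30], "resolution": [1.0, 1.0, 1.0]}
```

(Note that later in this thread the number of points is revised to [10, 10, 10], i.e. a 3 Å resolution over the same 30 Å box.)
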
DaniBodor commented 1 year ago

This means that you could run the experiment without any bugs or issues coming up?! That is great news! Maybe we can discuss the specific results, and why it underperforms compared to deeprank, at the next group meeting.

gcroci2 commented 1 year ago

We discussed at today's weekly meeting that @ntxxt will report the comparison of the experiment before augmentation vs after augmentation, for training, validation, and testing sets. Also the timings will be reported.

ntxxt commented 1 year ago

More detail on the data: @gcroci2

- Training set: 4592 in total (80% of MANY)
- Validation set: 1146 in total (20% of MANY)
- Test set: 161 in total (full DC set)

Train and valid loss: trained for 24 epochs, 17 hours on one A100 GPU with 40 CPUs. Model saved at epoch 3.

[train/valid loss curve figure]

Train/valid/test confusion matrices and accuracies:

Train:

| Measure | Augmented | Not augmented |
|---|---|---|
| tp | 67457 | 2209 |
| tn | 69758 | 2282 |
| fp | 2472 | 48 |
| fn | 2617 | 51 |
| Accuracy | 0.964 | 0.978 |

Valid:

| Measure | Augmented | Not augmented |
|---|---|---|
| tp | 16091 | 518 |
| tn | 16896 | 543 |
| fp | 1084 | 37 |
| fn | 1455 | 48 |
| Accuracy | 0.929 | 0.926 |

Test:

| Measure | Augmented | Not augmented |
|---|---|---|
| tp | 1907 | 61 |
| tn | 1958 | 61 |
| fp | 553 | 20 |
| fn | 573 | 19 |
| Accuracy | 0.774 | 0.757 |

gcroci2 commented 1 year ago

Thanks :) @ntxxt Do the 17 hours refer to the augmented or the non-augmented data experiment? To me, it seems that it's overfitting almost from the beginning. Is the loss curve similar to the one from the old experiments? @LilySnow @sonjageorgievska

ntxxt commented 1 year ago

It refers to the augmented one. @gcroci2

ntxxt commented 1 year ago

After running the experiment, the result is as follows. Sadly the test accuracy is still low, but there is a bit more of an increase in accuracy after augmentation: 0.702 -> 0.760 for the augmented test set; 0.702 -> 0.727 for the not augmented test set.

| Measure | Train | Valid | Test | Augmented_Train | Augmented_Valid | Augmented_Test | Augmented_Test (excl. augmentation) |
|---|---|---|---|---|---|---|---|
| tp | 2223 | 556 | 61 | 67973 | 16987 | 1787 | 55 |
| tn | 2287 | 570 | 52 | 71297 | 17841 | 2008 | 62 |
| fp | 43 | 10 | 29 | 993 | 139 | 503 | 19 |
| fn | 39 | 10 | 19 | 2101 | 559 | 593 | 25 |
| Accuracy | 0.982 | 0.983 | 0.702 | 0.978 | 0.980 | 0.760 | 0.727 |

gcroci2 commented 1 year ago

Summary of experiments and comparisons

I got confused with the grid settings earlier; it should be GridSettings([10, 10, 10], [30.0, 30.0, 30.0]) instead, so I re-ran everything...

General settings

These should be the same in both deeprank and deeprankcore:

- Dataset: pdb_ids used for train/valid/test exactly the same as in deeprank (except 4bm1_1_n.pdb, which failed to generate a surface)
- Network: "two 3D CNN/Max pooling blocks followed by two linear fully connected layers. This results in 117,362 learnable parameters." In deeprankcore, because there are only 20 PSSM features, this results in 104,562 parameters instead
- Augmentation_count: 30
- Grid settings: GridSettings([10, 10, 10], [30.0, 30.0, 30.0]), i.e. a 10 × 10 × 10-point grid over a 30 Å box, with a resolution of 3 Å
- Features: 20 PSSM in deeprankcore (compared to 40 in deeprank, PSSM separated by chain A and B)
- Batch size: 8
- Training: SGD optimizer, learning rate 0.0005, momentum 0.9, weight decay 0.001

NA = Not Augmented, A = Augmented

Training and training set

We could have a table for validation as well, but I think the most relevant info is about the training and the test sets.

| Measure | deeprankcore_NA | deeprank_NA | deeprankcore_A | deeprank_A |
|---|---|---|---|---|
| Data | 4591 | 4592 | 142321 | 142352 |
| tp | 2290 | unknown | 70151 | 66618 |
| tn | 2256 | unknown | 70652 | 67121 |
| fp | 39 | unknown | 493 | 4055 |
| fn | 6 | unknown | 675 | 4558 |
| Accuracy | 0.990 | unknown | 0.991 | 0.939 |
| Settings/Resources | A100 GPU, 40 CPU, 20 epochs | unknown | A100 GPU, 40 CPU, 20 epochs | 2 GPUs |
| Best epoch | 7 | unknown | 6 | 2 |
| Training time | 5 min | unknown | 76 min | unknown |

Loss curves

NA loss: [figure]

A loss: [figure]

deeprank loss: [figure]

Test set

| Measure | deeprankcore_NA | deeprank_NA | deeprankcore_A | deeprank_A |
|---|---|---|---|---|
| Data | 161 | 161 | 161 | 161 |
| tp | 68 | unknown | 57/59/38/60 | 66 |
| tn | 53 | unknown | 69/72/80/73 | 72 |
| fp | 28 | unknown | 12/9/1/8 | 9 |
| fn | 12 | unknown | 23/21/42/20 | 14 |
| Accuracy | 0.751 | unknown | 0.782/0.814/0.733/0.826 | 0.857 |

- 0.782: Augmented model on not augmented test set
- 0.814: Augmented model on augmented test set, taking the majority vote
- 0.733: Augmented model on augmented test set, taking the max
- 0.826: Augmented model on augmented test set, taking the average
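For clarity, a hedged sketch of how a per-structure prediction could be aggregated over its 30 augmented copies (majority vote, max, or average); the function and data layout are illustrative, not the actual evaluation script:

```python
import numpy as np

def aggregate(probabilities: np.ndarray, how: str) -> int:
    """Turn one structure's per-augmentation positive-class probabilities
    into a single binary prediction."""
    if how == "majority":
        votes = (probabilities >= 0.5).astype(int)
        return int(votes.sum() > len(votes) / 2)
    if how == "max":
        return int(probabilities.max() >= 0.5)
    if how == "average":
        return int(probabilities.mean() >= 0.5)
    raise ValueError(f"unknown aggregation: {how}")

# e.g. the predictions for the 30 augmented copies of one test structure
probs = np.random.rand(30)
print(aggregate(probs, "majority"), aggregate(probs, "max"), aggregate(probs, "average"))
```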

gcroci2 commented 1 year ago

In the weekly meeting we decided that the experiment will be redone using 2 epochs as in the old deeprank paper, and adding a layer that turns the 20 deeprank-core PSSM channels into 40 channels @ntxxt

ntxxt commented 1 year ago

More detail on the training: the model was trained on the augmented training set and tested on the non-augmented test set, making it comparable to the result of the DeepRank paper.

In deeprank, PSSM features were separated by the two chains, 20 PSSM features each, resulting in 40 features. In deeprank-core, PSSM features for both chains are combined, resulting in 20 features. As a result, features are mapped differently to the grid and the number of trainable parameters decreases at every layer. Both could cause a decrease in model performance.

To up-project the feature size from 20 to 40, one embedding layer was added in the forward function: `self.embedding_layer = nn.Linear(num_features, 40)`. This ensures the shape of the input features stays the same as in deeprank.
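A possible way to wire such a layer in (illustrative sketch only; the real model, module names, and reshaping may differ): the channel dimension is moved last so that nn.Linear acts on the 20 PSSM channels and produces 40, then moved back.

```python
import torch
from torch import nn

class ChannelEmbedding(nn.Module):
    """Up-project the channel dimension of a 3D grid from 20 to 40 features."""

    def __init__(self, num_features: int = 20, out_features: int = 40):
        super().__init__()
        self.embedding_layer = nn.Linear(num_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, x, y, z) -> channels last, so nn.Linear acts on them
        x = x.permute(0, 2, 3, 4, 1)
        x = self.embedding_layer(x)      # (batch, x, y, z, out_features)
        return x.permute(0, 4, 1, 2, 3)  # back to (batch, out_features, x, y, z)

# quick shape check on a dummy 20-channel grid
grid = torch.randn(2, 20, 10, 10, 10)
print(ChannelEmbedding()(grid).shape)  # torch.Size([2, 40, 10, 10, 10])
```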

Added one embedding layer to the original model, resulting in 118,202 parameters, trained on the augmented training set for 20 epochs to be on the same page as deeprank, using the model saved at epoch 2:

| Measure | training_set | test_set |
|---|---|---|
| tp | 54800 | 79 |
| tn | 54700 | 69 |
| fp | 16444 | 12 |
| fn | 16376 | 1 |
| Accuracy | 0.770 | 0.918 |

Loss:

[loss curve figure]

DaniBodor commented 1 year ago

> Added one embedding layer to the original model, resulting in 118,202 parameters, trained on the augmented training set for 20 epochs to be on the same page as deeprank, using the model saved at epoch 2 [results table and loss curve above]

So do I see correctly that basically that did the trick and it's now outperforming deeprank? Would we expect it to outperform, though? Wouldn't we expect very similar performance? Also, is it normal for the test accuracy to be so much higher than the train accuracy?

ntxxt commented 1 year ago

Now the model has more trainable parameters than deeprank. The input was also re-shaped before and after the linear embedding layer to match the shape. I suspect this will influence the performance, but in my opinion, once the network changes, it is not really comparable anyway... It is indeed very strange that the testing accuracy is so much higher than the training accuracy. But the testing accuracy is calculated using the model saved at epoch 2, where the training loss is not yet very low.

LilySnow commented 1 year ago

@ntxxt Nice work. Glad to know the embedding layer works. Can you modify your message above to add more details? For example, is the result on augmented data or not? Why did you need to add the embedding layer (i.e., deeprank-core merged two chains into one, while deeprank has two grids for the two chains)? How big is your embedding layer?

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity.

DaniBodor commented 1 year ago

Can this issue be closed? @LilySnow , @ntxxt , @gcroci2

gcroci2 commented 1 year ago

> Can this issue be closed? @LilySnow , @ntxxt , @gcroci2

I guess so. Also, I tested deeprankcore with grids and CNNs using the pMHCI 100k dataset, and no bugs have been found so far (see issue #152 in the 3D-Vac repo). So I'd say that the CNN pipeline works as expected.