Closed: @gcroci2 closed this issue 1 year ago.
Hello, I am trying to re-train deeprankcore with interface data and I have a question. It seems that when given a list of hdf5 files as input, the GridDataset object only reads the last hdf5 path in the list (thus len(GridDataset) is always 31). Does that mean all the training files have to be combined into one big hdf5 file?
Shouldn't be the case, see https://github.com/DeepRank/deeprank-core/blob/7a824711f849f6f06711d9d0974a8fa8bbb2783d/deeprankcore/dataset.py#L349
Are you sure that you're giving as input a list of hdf5 file path strings, like hdf5_path = ['f1.hdf5', 'f2.hdf5', 'f3.hdf5']?
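For reference, a minimal sketch of the expected usage (the keyword name is taken from the example above; any other GridDataset arguments your setup needs are omitted here):

```python
from deeprankcore.dataset import GridDataset

# a list of HDF5 path strings, not a single concatenated file
hdf5_path = ['f1.hdf5', 'f2.hdf5', 'f3.hdf5']

dataset = GridDataset(hdf5_path=hdf5_path)
print(len(dataset))  # should be the total number of entries across all three files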
yes, I am sure.
Can you send us in Slack some of the hdf5 files you're using and the script as well? @DaniBodor and I will look into this
Thank you. I will send you the script and three hdf5 files on slack.
After reviewing the DeeprankDataset class, it appears that the _create_index_entries(self) function does not append to the index_entries list properly across multiple HDF5 files. I have fixed it in my local installation, but you might want to check as well. https://github.com/DeepRank/deeprank-core/blob/7a824711f849f6f06711d9d0974a8fa8bbb2783d/deeprankcore/dataset.py#L128
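As an illustration of what that means, a sketch of the kind of per-file loop that builds the index (not the actual library code; the function and variable names are assumptions):

```python
import h5py

def create_index_entries(hdf5_paths):
    """Collect (file path, entry name) pairs across ALL HDF5 files,
    extending one shared list instead of overwriting it for each file."""
    index_entries = []
    for path in hdf5_paths:
        with h5py.File(path, "r") as hdf5_file:
            # extend, don't reassign: otherwise only the last file's entries survive
            index_entries.extend((path, entry) for entry in hdf5_file.keys())
    return index_entries
```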
Indeed we spotted the same and it will be fixed in PR #397
Could you try to rerun the code in the branch in https://github.com/DeepRank/deeprank-core/pull/397 and let me know? It should work now @ntxxt
yes it works fine now @gcroci2
The crystallography experiment is finished; performance is as follows: @DaniBodor @gcroci2
measure | deeprankcore | deeprank |
---|---|---|
tp: | 61 | 66 |
tn: | 61 | 72 |
fp: | 20 | 9 |
fn: | 19 | 14 |
Accuracy: | 0.757 | 0.857 |
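As a quick sanity check (illustrative Python, not part of either codebase), the accuracies follow directly from the confusion matrices:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified entries."""
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(61, 61, 20, 19))  # ~0.76 for deeprankcore (161 test entries)
print(accuracy(66, 72, 9, 14))   # ~0.86 for deeprank
```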
Train and test datasets are the same. Grid settings and hyperparameters are as follows:
Grid parameters:
Hyper parameters:
This means that you could run the experiment without any bugs or issues coming up?! That is great news! Maybe we can discuss the specific results, and why it underperforms compared to deeprank, in the next group meeting.
We discussed at today's weekly meeting that @ntxxt will report the comparison of the experiment before augmentation vs after augmentation, for training, validation, and testing sets. Also the timings will be reported.
More detail on the data @gcroci2:
- Training set: 4592 in total (80% of MANY)
- Validation set: 1146 in total (20% of MANY)
- Test set: 161 in total (full DC set)
Train and validation loss: trained for 24 epochs (17 hours on one A100 GPU, 40 CPUs). Model saved at epoch 3.
Train/valid/test confusion matrix and accuracy:
Train:

measure | augmented | Not augmented |
---|---|---|
tp | 67457 | 2209 |
tn | 69758 | 2282 |
fp | 2472 | 48 |
fn | 2617 | 51 |
Accuracy | 0.964 | 0.978 |
Valid:

measure | augmented | Not augmented |
---|---|---|
tp | 16091 | 518 |
tn | 16896 | 543 |
fp | 1084 | 37 |
fn | 1455 | 48 |
Accuracy | 0.929 | 0.926 |
Test:

measure | augmented | Not augmented |
---|---|---|
tp | 1907 | 61 |
tn | 1958 | 61 |
fp | 553 | 20 |
fn | 573 | 19 |
Accuracy | 0.774 | 0.757 |
Thanks :) @ntxxt Do the 17 hours refer to the augmented or non-augmented data experiment? To me, it seems that it's overfitting from almost the beginning. Is the loss curve similar to the one from the old experiment? @LilySnow @sonjageorgievska
It refers to the augmented one. @gcroci2
After re-running the experiment, the results are as follows. Sadly the test accuracy is still low, but there is a slight increase in accuracy after augmentation: 0.702 -> 0.760 for the augmented test set; 0.702 -> 0.727 for the not-augmented test set.
Measure | Train | Valid | Test | Augmented_Train | Augmented_Valid | Augmented_Test | Augmented_Test (excluding augmentation) |
---|---|---|---|---|---|---|---|
tp | 2223 | 556 | 61 | 67973 | 16987 | 1787 | 55 |
tn | 2287 | 570 | 52 | 71297 | 17841 | 2008 | 62 |
fp | 43 | 10 | 29 | 993 | 139 | 503 | 19 |
fn | 39 | 10 | 19 | 2101 | 559 | 593 | 25 |
Accuracy | 0.982 | 0.983 | 0.702 | 0.978 | 0.980 | 0.760 | 0.727 |
I got confused with the grid settings earlier; it should be GridSettings([10, 10, 10], [30.0, 30.0, 30.0]) instead, so I re-ran everything...
- Dataset: pdb_ids used for train/valid/test are exactly the same as in deeprank (except 4bm1_1_n.pdb, which failed to generate a surface)
- Network: "two 3D CNN/Max pooling blocks followed by two linear fully connected layers. This results in 117,362 learnable parameters." In deeprankcore, because there are only 20 pssm features, this results in 104,562 parameters instead
- Augmentation_count: 30
- Grid settings: GridSettings([10, 10, 10], [30.0, 30.0, 30.0]), i.e. 10 × 10 × 10 grid points over a 30 Å box, giving a resolution of 3 Å (see the sketch below)
- Features: 20 pssm in deeprankcore (compared to 40 in deeprank, where pssm is separated by chain A and chain B)
- Batch size: 8
- Training: SGD optimizer, learning rate 0.0005, momentum 0.9, weight decay 0.001
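For concreteness, a minimal sketch of this configuration (the GridSettings import path and the stand-in model are assumptions; the GridSettings call and the optimizer values are the ones listed above):

```python
import torch
import torch.nn as nn
from deeprankcore.utils.grid import GridSettings  # import path is an assumption

# 10 x 10 x 10 grid points over a 30.0 A box per dimension -> 3 A resolution
grid_settings = GridSettings([10, 10, 10], [30.0, 30.0, 30.0])

# training hyperparameters as listed above, shown with a placeholder model
model = nn.Linear(1, 1)  # stand-in for the actual 3D CNN
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.0005, momentum=0.9, weight_decay=0.001
)
batch_size = 8
```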
NA = Not Augmented; A = Augmented
We could have a table for validation as well, but I think the most relevant info is about the training and the test sets.

Measure | deeprankcore_NA | deeprank_NA | deeprankcore_A | deeprank_A |
---|---|---|---|---|
Data | 4591 | 4592 | 142321 | 142352 |
tp | 2290 | unknown | 70151 | 66618 |
tn | 2256 | unknown | 70652 | 67121 |
fp | 39 | unknown | 493 | 4055 |
fn | 6 | unknown | 675 | 4558 |
Accuracy | 0.990 | unknown | 0.991 | 0.939 |
Settings/Resources | A100 GPU, 40 CPU, 20 epochs | unknown | A100 GPU, 40 CPU, 20 epochs | 2 GPUs |
Best epoch | 7 | unknown | 6 | 2 |
Training time | 5 min | unknown | 76 min | unknown |
Loss curves:
NA loss:
A loss:
deeprank loss:
Measure | deeprankcore_NA | deeprank_NA | deeprankcore_A | deeprank_A |
---|---|---|---|---|
Data | 161 | 161 | 161 | 161 |
tp | 68 | unknown | 57/59/38/60 | 66 |
tn | 53 | unknown | 69/72/80/73 | 72 |
fp | 28 | unknown | 12/9/1/8 | 9 |
fn | 12 | unknown | 23/21/42/20 | 14 |
Accuracy | 0.751 | unknown | 0.782/0.814/0.733/0.826 | 0.857 |
- 0.782: augmented model on the not-augmented test set
- 0.814: augmented model on the augmented test set, taking the majority vote
- 0.733: augmented model on the augmented test set, taking the max
- 0.826: augmented model on the augmented test set, taking the average
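For clarity, a small illustrative sketch (plain NumPy with hypothetical variable names, not code from the experiment) of how the per-augmentation predictions can be collapsed into one decision per test entry:

```python
import numpy as np

# hypothetical: predicted probabilities for the positive class,
# one row per test entry, one column per augmented copy (e.g. 30 rotations)
probs = np.random.rand(161, 30)
preds = (probs > 0.5).astype(int)

majority_vote = (preds.mean(axis=1) > 0.5).astype(int)  # most common label wins
max_pooled    = (probs.max(axis=1) > 0.5).astype(int)   # one confident copy is enough
averaged      = (probs.mean(axis=1) > 0.5).astype(int)  # threshold the mean probability
```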
In the weekly meeting we decided that the experiment will be redone using 2 epochs, as in the old deeprank paper, and adding a layer that turns the 20 deeprank-core pssm channels into 40 channels @ntxxt
More detail on the training: the model was trained on the augmented training set and tested on the non-augmented test set, making it comparable to the result of the DeepRank paper.
In deeprank, pssm features were separated by the two chains, 20 pssm features each, resulting in 40 features. In deeprank-core, the pssm features of both chains were combined, resulting in 20 features. As a result, features are mapped differently to the grid and the number of trainable parameters decreases at every layer. Both could cause a decrease in model performance.
To up-project the feature size from 20 to 40, one embedding layer was added in the forward function: self.embedding_layer = nn.Linear(num_features, 40). This ensures the shape of the input features stays the same as in deeprank.
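A minimal sketch of how such a layer can be wired into the forward pass (illustrative only: the class name, the permute-based reshaping, and the tensor shapes are assumptions; the nn.Linear call is the one quoted above):

```python
import torch
import torch.nn as nn

class ChannelEmbedding(nn.Module):
    """Up-project the 20 pssm grid channels to 40 so the input shape
    matches the original deeprank network (hypothetical wrapper)."""

    def __init__(self, num_features: int = 20):
        super().__init__()
        self.embedding_layer = nn.Linear(num_features, 40)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, nx, ny, nz) -> move channels last for nn.Linear
        x = x.permute(0, 2, 3, 4, 1)
        x = self.embedding_layer(x)
        # restore (batch, channels, nx, ny, nz) for the 3D convolutions
        return x.permute(0, 4, 1, 2, 3)


x = torch.randn(8, 20, 10, 10, 10)    # batch of 8, 20 features, 10x10x10 grid
print(ChannelEmbedding()(x).shape)    # torch.Size([8, 40, 10, 10, 10])
```

The layer only mixes feature channels and leaves the spatial grid untouched, which is what makes the downstream layer shapes match the 40-channel deeprank setup.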
Added one embedding layer to the original model, resulting in 118,202 parameters. Trained on the augmented training set for 20 epochs and, to be on the same page as deeprank, evaluated using the model saved at epoch 2.
measure | training_set | test_set |
---|---|---|
tp: | 54800 | 79 |
tn: | 54700 | 69 |
fp: | 16444 | 12 |
fn: | 16376 | 1 |
Accuracy: | 0.770 | 0.918 |
Loss:
So do I see correctly that basically this did the trick and it's now outperforming deeprank? Would we expect it to outperform though? Wouldn't we expect very similar performance? Also, is it normal for the test accuracy to be so much higher than the train accuracy?
Now the model has more trainable parameters than deeprank. The input was also re-shaped before and after the linear embedding layer to match the expected shape. I suspect this will influence the performance, but in my opinion, once the network changes it is not really comparable anyway... It is indeed very strange that the testing accuracy is so much higher than the training accuracy. However, the testing accuracy was calculated using the model saved at epoch 2, where the training loss is not yet very low.
@ntxxt Nice work. Glad to know the embedding layer works. Can you modify your message above to add more details? For example, are the results on augmented data or not? Why did you need to add the embedding layer (i.e., deeprank-core merged the two chains into one, while deeprank has two grids for the two chains)? How big is your embedding layer?
This issue is stale because it has been open for 30 days with no activity.
Can this issue be closed? @LilySnow , @ntxxt , @gcroci2
We should test the entire pipeline with the old deeprank paper's data, in particular reproducing the crystallographic PPIs experiment.
Tentative tasks: