EnsemblGSOC / Ensembl-Repeat-Identification

A Deep Learning repository for predicting the location and type of repeat sequences in the genome.

Deal with overfitting #34

Open yangtcai opened 2 years ago

yangtcai commented 2 years ago

Hi, @williamstark01, I tested two different hyperparameter configurations to deal with overfitting. The orange run is the first time I trained our model: learning rate 0.001, 6 transformer encoder-decoder layers. When I found our model was overfitting, I found a related issue in DETR reporting that small datasets can lead to this problem, so I changed the number of layers from 6 to 3. The blue run has a learning rate of 0.001 and 3 transformer encoder-decoder layers. For the red run, I changed the dropout from 0.2 to 0.1 and also changed the learning rate to 0.0001.

[screenshots: training curves for the orange, blue, and red runs]

The related issue link: https://github.com/facebookresearch/detr/issues/342

I think we can add more chromosome datasets to train our model; the COCO dataset has up to 330k images, while we only have 8.9k samples to train on. Is my understanding correct?
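For reference, a minimal sketch summarizing the three runs as hyperparameter dictionaries (the key names are illustrative, not taken from the training script):

```python
# Hyperparameters of the three runs described above (key names are illustrative).
experiments = {
    "orange": {"learning_rate": 1e-3, "num_layers": 6, "dropout": 0.2},
    "blue": {"learning_rate": 1e-3, "num_layers": 3, "dropout": 0.2},
    "red": {"learning_rate": 1e-4, "num_layers": 3, "dropout": 0.1},
}
```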

williamstark01 commented 2 years ago

Nice experimenting and troubleshooting! It looks to me like the learning rate was the major culprit for overfitting at this point. 1e-4 is a good value, very frequently used (and we might add lr decay as well later on). 3 layers for the transformer encoder and decoder also makes sense for now. The dropout should probably be increased since we observe overfitting; 0.3 up to 0.5 are potentially good values to try.
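For concreteness, a minimal sketch of these settings in PyTorch, with `nn.Transformer` standing in for the actual network and an illustrative decay schedule (none of this is from the repository's training script):

```python
import torch
from torch import nn

# Stand-in model; the real network lives in this repository.
model = nn.Transformer(num_encoder_layers=3, num_decoder_layers=3, dropout=0.3)

# 1e-4 learning rate, as suggested above.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Optional lr decay to add later on: halve the learning rate every 10 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
```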

At this stage it would be good to organize how we track multiple training experiments. TensorBoard is a good option; we just need to also save hyperparameter values that will help us filter experiments. Could you share the one you are currently using?
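A minimal sketch of saving hyperparameter values alongside metrics with PyTorch's TensorBoard writer, so runs can be filtered in the HParams tab (the hyperparameter values are just the red run above; the metric numbers are made up):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/red")
hyperparameters = {"learning_rate": 1e-4, "num_layers": 3, "dropout": 0.1}

# Per-step metrics logged during training (illustrative value).
writer.add_scalar("loss/train", 0.42, global_step=100)

# Final metrics associated with this hyperparameter set,
# which makes the run filterable in the HParams tab.
writer.add_hparams(hyperparameters, {"hparam/validation_loss": 0.57})
writer.close()
```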

I think at this point it would be worth looking at converting all tunable variables of the network into hyperparameters, for example the number of transformer layers, the number of attention heads, etc. Those could be either arguments to the training script or entries in an experiment configuration file. New issue: #35
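A sketch of the training-script-arguments option (argument names are illustrative, not the repository's actual interface):

```python
import argparse

parser = argparse.ArgumentParser(description="training hyperparameters")
parser.add_argument("--num_encoder_layers", type=int, default=3)
parser.add_argument("--num_decoder_layers", type=int, default=3)
parser.add_argument("--num_attention_heads", type=int, default=8)
parser.add_argument("--dropout", type=float, default=0.3)
parser.add_argument("--learning_rate", type=float, default=1e-4)
args = parser.parse_args()
```

The same values could equally live in an experiment configuration file loaded at startup.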

Good work, very encouraging initial results!

williamstark01 commented 2 years ago

Almost forgot:

> I think we can add more chromosome datasets to train our model; the COCO dataset has up to 330k images, while we only have 8.9k samples to train on. Is my understanding correct?

This is correct, the training set is relatively small at this point. We should get better results if we add more chromosomes to the dataset. New issue: #36
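A rough sketch of one way to grow the training set by combining per-chromosome datasets, where `RepeatDataset` is a hypothetical stand-in for the repository's dataset class:

```python
from torch.utils.data import ConcatDataset, DataLoader, Dataset

class RepeatDataset(Dataset):
    """Hypothetical stand-in for a per-chromosome repeat annotation dataset."""

    def __init__(self, chromosome: str):
        self.chromosome = chromosome
        self.samples = []  # sequence windows and repeat annotations would go here

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]

chromosomes = ["chr1", "chr2", "chr3", "chrX"]
train_dataset = ConcatDataset([RepeatDataset(name) for name in chromosomes])
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
```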

yangtcai commented 2 years ago

> At this stage it would be good to organize how we track multiple training experiments. TensorBoard is a good option; we just need to also save hyperparameter values that will help us filter experiments. Could you share the one you are currently using?

Hi, @williamstark01, I'm slightly confused by this part: do you mean the hyperparameters or the TensorBoard files? :D

williamstark01 commented 2 years ago

I meant the URL to the TensorBoard dashboard, if you are using the official public one (https://tensorboard.dev/) and uploading the logs there. Or are you running TensorBoard locally?