EnsemblGSOC / Ensembl-Repeat-Identification

A deep learning repository for predicting the location and type of repeat sequences in a genome.

Refactor DETR model to pl lightning #43

Closed yangtcai closed 2 years ago

yangtcai commented 2 years ago


yangtcai commented 2 years ago

Maybe this time it can be trained :D

yangtcai commented 2 years ago

Oops!!! Forgot to update the removal.

yangtcai commented 2 years ago

Hi, @williamstark01, I have found what's happening with the Lightning implementation. It seems like https://github.com/Ensembl/gene_pcp/blob/121be9895d414da3f13b5c8ec7588754e03336e1/transformer_pipeline.py#L468 is the cause of the problem. I tested the callback function on my friend's cluster, and the same code stopped at 14 epochs compared to our 5 or 6 epochs. I'm not sure the callback function is necessary for our project; what do you think about it?

williamstark01 commented 2 years ago

Early stopping is actually very simple: it just tracks validation loss and stops training if the loss hasn't improved for a set number of epochs. https://pytorch-lightning.readthedocs.io/en/stable/common/early_stopping.html#earlystopping-callback

So I'd say let's add it back. I see that the number of epochs without improvement before training stops (called "patience") is set to a low value in the config; can you try increasing it and see whether that resolves the issue?

patience: 3
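To make the behavior concrete, here is a minimal plain-Python sketch of the early stopping idea described above (this is an illustration, not Lightning's actual `EarlyStopping` implementation): training stops once the validation loss has gone `patience` consecutive epochs without improving.

```python
def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop,
    or None if training runs through all recorded epochs."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Loss improved: record the new best and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None
```

With a low patience, a brief plateau is enough to stop the run early, which could explain stopping at 5 or 6 epochs on one machine and 14 on another.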
williamstark01 commented 2 years ago

The last commit with the new mAP is large, and it's difficult to see what changed. Could you add a short description of what was updated and why?

yangtcai commented 2 years ago

Thanks for telling me about early stopping; I will add it back as soon as possible. The mAP commit is mainly because I need an identifier for each sequence. When we compute the mAP, we choose the prediction that has the greatest IoU with the ground truth. During matching, we need to avoid the situation where a prediction in sample 1 matches a ground truth in sample 2. So I use seq_start as the identifier: at the dataloader stage, it produces the sequence itself along with its seq_start, and at the training stage, each prediction also carries the seq_start alongside the original prediction info (the class and the position).
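The matching step described above could look roughly like the following sketch. The helper names (`iou_1d`, `match_predictions`) and data layout are illustrative assumptions, not the repository's actual code; the point is that grouping by `seq_start` makes cross-sample matches impossible.

```python
from collections import defaultdict

def iou_1d(a, b):
    """Intersection over union of two (start, end) intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def match_predictions(predictions, ground_truths):
    """Each item is (seq_start, (start, end)). Returns a list of
    (prediction, ground_truth, iou) tuples matched only within the
    sequence identified by the same seq_start."""
    truths_by_seq = defaultdict(list)
    for seq_start, interval in ground_truths:
        truths_by_seq[seq_start].append(interval)
    matches = []
    for seq_start, pred in predictions:
        candidates = truths_by_seq[seq_start]  # same sample only
        if not candidates:
            continue
        best = max(candidates, key=lambda truth: iou_1d(pred, truth))
        matches.append((pred, best, iou_1d(pred, best)))
    return matches
```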

williamstark01 commented 2 years ago

I noticed another warning:

UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.

batch_size is equal to 2 in the config, so you may need to pass the extra argument to self.log, or maybe something is wrong there?
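The inferred batch size matters because, as I understand it, Lightning computes epoch-level metrics as a batch-size-weighted average of the logged per-batch values. A plain-Python sketch of the miscalculation (the numbers here are made up for illustration):

```python
def epoch_average(losses, batch_sizes):
    """Batch-size-weighted mean of per-batch losses, as Lightning
    computes it when reducing logged values over an epoch."""
    total = sum(loss * size for loss, size in zip(losses, batch_sizes))
    return total / sum(batch_sizes)

losses = [0.5, 0.3, 0.8]   # per-batch mean losses
true_sizes = [2, 2, 1]     # last batch holds a single leftover sample
inferred = [1, 1, 1]       # ambiguous collection -> inferred as 1
```

With the true sizes the epoch loss is (0.5·2 + 0.3·2 + 0.8·1) / 5 = 0.48, while the inferred sizes give a plain mean of about 0.533, so passing `batch_size=batch_size` to `self.log(...)` keeps the average correct.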