EnsemblGSOC / Ensembl-Repeat-Identification

A deep learning repository for predicting the location and type of repeat sequences in a genome.

add mAP metric #37

Closed yangtcai closed 2 years ago

yangtcai commented 2 years ago

25

yangtcai commented 2 years ago

It's a great suggestion. I will rewrite the training stage and this stage using PyTorch Lightning ;P Linked with: #26

yangtcai commented 2 years ago

Related to #34, #35, and #36. I have added more FASTA information to our training data. After a quick analysis, this time we have about 2.5k repeat sequences that can be used for training. Also, I need to apologize for my answer in #31: we can add more types to our training, because we only selected chromosomes 1 to 22, chrX, and chrY, so other types may sit in chromosomes we do not cover.
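To illustrate the chromosome selection point, here is a minimal sketch of how types that only occur outside the selected chromosomes could be spotted. The helper name `split_by_coverage` and the `(chromosome, repeat_type)` input format are assumptions for illustration, not code from the repository.

```python
# Hypothetical sketch: names and input format are illustrative only.
# Chromosomes used for training: chr1 to chr22, chrX, and chrY.
SELECTED_CHROMOSOMES = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}


def split_by_coverage(annotations):
    """Return (covered, missing) repeat types.

    `annotations` is an iterable of (chromosome, repeat_type) pairs.
    `covered` holds types seen on at least one selected chromosome;
    `missing` holds types seen only on chromosomes we do not cover.
    """
    covered, elsewhere = set(), set()
    for chromosome, repeat_type in annotations:
        if chromosome in SELECTED_CHROMOSOMES:
            covered.add(repeat_type)
        else:
            elsewhere.add(repeat_type)
    return covered, elsewhere - covered
```

Any type that ends up in `missing` would only become trainable after extending the chromosome selection.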

I think saving logs and experiment files in the configuration may be a good idea; it can help us review what changed our results. :D As the old saying goes, "a good memory is no match for a worn-out pen" (好记性不如烂笔头, hǎo jì xìng bù rú làn bǐ tóu). I haven't tested it on the cluster, but it runs on my laptop. ;P
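One possible reading of the logging idea above, as a rough sketch using only the Python standard library: each run gets its own directory holding a copy of the configuration and a log file. The function name `start_experiment`, the `experiments` root directory, and the file names are assumptions, not the project's actual layout.

```python
import json
import logging
from datetime import datetime
from pathlib import Path


def start_experiment(config, root="experiments"):
    """Create a timestamped run directory, save the config, and return a file logger.

    Hypothetical helper: directory layout and file names are illustrative.
    """
    run_dir = Path(root) / datetime.now().strftime("%Y%m%d-%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)

    # Keep an exact copy of the configuration next to the results.
    config_text = json.dumps(config, indent=2, sort_keys=True)
    (run_dir / "config.json").write_text(config_text)

    # Write log messages for this run into the same directory.
    logger = logging.getLogger(f"experiment.{run_dir.name}")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.FileHandler(run_dir / "train.log"))
    logger.info("started run with config: %s", config_text)
    return run_dir, logger
```

With something like this, comparing two runs comes down to diffing their `config.json` files and reading the matching `train.log`.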

yangtcai commented 2 years ago

Is 2.5k the number of segments with repeats or the number of repeats in the included chromosomes?

Oops, that was a typo: the new dataset should be 25k 😂

williamstark01 commented 2 years ago

Alright, the numbers make sense now!