EnsemblGSOC / Ensembl-Repeat-Identification

A Deep Learning repository for predicting the location and type of repeat sequences in the genome.

print sample of predictions #45

williamstark01 closed this issue 2 years ago

williamstark01 commented 2 years ago

Visualizing the network's predictions will help us understand them better, debug the model, and fine-tune it. I'm thinking of printing a small number of predictions on the test set next to the ground truth sequence, something like:

ground truth: TCCCTCCCTCCTTCcattcatgcatgcgttcattcagtcattcattcCTCAGCAGTCGCT
  prediction: TCCCTCCCTccttccattcatgcatgcgttcattcagtcattCATTCCTCAGCAGTCGCT
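A minimal sketch of the lowercase printout above, assuming (from the example) that lowercase marks repeat bases, as in a soft-masked genome, and that the model yields one boolean "is repeat" flag per base. The helpers `soft_mask` and `print_prediction` are hypothetical, not part of the repository.

```python
from typing import Sequence


def soft_mask(sequence: str, is_repeat: Sequence[bool]) -> str:
    """Return the sequence with repeat bases lowercased and all other bases uppercased.

    Hypothetical helper: assumes a per-base boolean repeat mask, which is an
    assumption about the model output, not the repository's actual format.
    """
    if len(sequence) != len(is_repeat):
        raise ValueError("sequence and mask must have the same length")
    return "".join(
        base.lower() if flag else base.upper()
        for base, flag in zip(sequence, is_repeat)
    )


def print_prediction(
    sequence: str, true_mask: Sequence[bool], predicted_mask: Sequence[bool]
) -> None:
    """Print ground truth and prediction on aligned lines (both labels are 14 characters wide)."""
    print(f"ground truth: {soft_mask(sequence, true_mask)}")
    print(f"  prediction: {soft_mask(sequence, predicted_mask)}")


if __name__ == "__main__":
    sequence = "TCCCTCATTCATTC"
    true_mask = [False] * 6 + [True] * 8
    predicted_mask = [False] * 4 + [True] * 8 + [False] * 2
    print_prediction(sequence, true_mask, predicted_mask)
```

Run as a script, this prints `ground truth: TCCCTCattcattc` and `  prediction: TCCCtcattcatTC`, so boundary errors show up as case changes in the same column.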

Or, with numbers indicating the repeat type:

ground truth: TCCCTCCCTCCTTC444444444444444444444444444444444CTCAGCAGTCGCT
  prediction: TCCCTCCCT444444444444444444444444445555555CATTCCTCAGCAGTCGCT
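The numeric variant can be sketched the same way, assuming a hypothetical per-base integer encoding where 0 means "not a repeat" and 1 to 9 are repeat type indices; `label_string` is an illustrative name, not an existing function.

```python
from typing import Sequence


def label_string(sequence: str, labels: Sequence[int]) -> str:
    """Replace each repeat base with the digit of its repeat type.

    Hypothetical encoding: 0 means "not a repeat", 1-9 are repeat type indices.
    """
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    characters = []
    for base, label in zip(sequence, labels):
        if not 0 <= label <= 9:
            # one column per base only works for single-digit type indices
            raise ValueError(f"repeat type {label} does not fit in a single column")
        characters.append(base.upper() if label == 0 else str(label))
    return "".join(characters)


if __name__ == "__main__":
    sequence = "TCCCTCATTCATTC"
    true_labels = [0] * 6 + [4] * 8
    predicted_labels = [0] * 4 + [4] * 5 + [5] * 3 + [0] * 2
    print(f"ground truth: {label_string(sequence, true_labels)}")
    print(f"  prediction: {label_string(sequence, predicted_labels)}")
```

Here the two lines come out as `TCCCTC44444444` and `TCCC44444555TC`. Keeping one character per base preserves the column alignment, which is why multi-digit type indices are rejected rather than printed.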

An example of a similar printout from another project (and how it's implemented):

sample assignments
assignment | true label
-----------------------
     AGAP2 |      AGAP2
      GSE1 |       GSE1
     PSMD7 |      PSMD7
       DCN |        DCN
    VANGL2 |     VANGL2
      RAE1 |       RAE1
     DCAF7 |      DCAF7
    TBXAS1 |     LPCAT1  !!!
    ERGIC3 |     ERGIC3
       LYN |        LYN
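The sketch below is not the gene_symbol_classifier implementation; it is one possible way to produce a table in the same layout: a random sample of predictions, right-aligned columns, and `!!!` after mismatches. The function name and its `num_samples` and `seed` parameters are hypothetical.

```python
import random
from typing import Optional, Sequence


def format_sample_assignments(
    assignments: Sequence[str],
    labels: Sequence[str],
    num_samples: int = 10,
    seed: Optional[int] = None,
) -> str:
    """Format a random sample of predictions next to their true labels, flagging mismatches with "!!!"."""
    if len(assignments) != len(labels):
        raise ValueError("assignments and labels must have the same length")
    rng = random.Random(seed)
    # sample without replacement, then sort to keep the original test set order
    indices = sorted(rng.sample(range(len(assignments)), k=min(num_samples, len(assignments))))
    headers = ("assignment", "true label")
    # one shared column width, wide enough for the headers and every sampled value
    width = max(
        [len(header) for header in headers]
        + [len(assignments[i]) for i in indices]
        + [len(labels[i]) for i in indices]
    )
    lines = [
        "sample assignments",
        f"{headers[0]:>{width}} | {headers[1]:>{width}}",
        "-" * (2 * width + 3),
    ]
    for i in indices:
        row = f"{assignments[i]:>{width}} | {labels[i]:>{width}}"
        if assignments[i] != labels[i]:
            row += "  !!!"
        lines.append(row)
    return "\n".join(lines)


if __name__ == "__main__":
    predicted = ["AGAP2", "GSE1", "TBXAS1", "LYN"]
    true = ["AGAP2", "GSE1", "LPCAT1", "LYN"]
    print(format_sample_assignments(predicted, true, num_samples=3, seed=0))
```

Passing a fixed `seed` makes the sample reproducible between runs, which helps when comparing checkpoints on the same test examples.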
yangtcai commented 2 years ago

Hi @williamstark01, the implementation can be looked up at this line, right? https://github.com/Ensembl/gene_symbol_classifier/blob/cdb1975884988dea235f97eab3ebcb75a0262878/models.py#L239

williamstark01 commented 2 years ago

At that line I'm changing the formatting of the logging handlers in order to remove timestamps. It's definitely something you can reuse.
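The linked code isn't reproduced here. As a minimal sketch using only the standard library `logging` module, timestamps can be removed by giving each handler a formatter that prints just the message. The logger name `repeat_identification` and the helper `strip_timestamps` are hypothetical.

```python
import logging
import sys


def strip_timestamps(logger: logging.Logger) -> None:
    """Give every handler attached directly to `logger` a formatter that prints only the message.

    Handlers on parent loggers (e.g. the root logger) are not touched.
    """
    plain_formatter = logging.Formatter("%(message)s")
    for handler in logger.handlers:
        handler.setFormatter(plain_formatter)


if __name__ == "__main__":
    # hypothetical logger name, for illustration only
    logger = logging.getLogger("repeat_identification")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("%(asctime)s | %(levelname)s | %(message)s"))
    logger.addHandler(handler)

    logger.info("with timestamp")
    strip_timestamps(logger)
    logger.info("ground truth: TCCCTCattcattc")  # printed without timestamp or level name
```

Stripping the prefix keeps multi-line printouts such as the ground truth vs prediction pairs aligned column by column in the log.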