ML-Bioinfo-CEITEC / genomic_benchmarks

Benchmarks for classification of genomic sequences
Apache License 2.0

Enhancers_cohn labels incorrect? #37

Open · exnx opened this issue 1 year ago

exnx commented 1 year ago

Hi, I was wondering: is there a chance the positive and negative examples in Enhancers_cohn were accidentally switched?

I trained a model and got results slightly above those reported in the paper. However, when I applied the model to my own labeled sequences (enhancers and non-enhancers), the predictions were inverted: performance almost exactly matched what I saw during training, but flipped.

For example, I got 70% during training on the Genomic Benchmarks dataset. When I took that model and predicted on my own enhancer sequences, I got 70% if I switched the labels, and 30% when I used the labels as provided in the Genomic Benchmarks dataset.
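Assuming the 70% figure is accuracy on a binary task, the 70%/30% pattern is exactly what a label swap produces: every prediction that was correct against the original labels becomes incorrect against the swapped labels and vice versa, so swapped accuracy equals 1 minus the original accuracy. The minimal sketch below illustrates this with made-up toy predictions. The helpers `accuracy` and `swap_binary_labels` are hypothetical and not part of genomic_benchmarks.

```python
def accuracy(predictions, labels):
    """Fraction of positions where the predicted class equals the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def swap_binary_labels(labels):
    """Swap 0 <-> 1, i.e. treat positives as negatives and vice versa."""
    return [1 - y for y in labels]


# Hypothetical toy data (illustrative only, not from any real dataset):
# predictions from a binary classifier and the labels "as provided".
preds = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
labels = [0, 0, 1, 0, 1, 1, 0, 0, 1, 1]

acc = accuracy(preds, labels)
acc_swapped = accuracy(preds, swap_binary_labels(labels))

# For binary labels the two scores always sum to 1.
print(f"as provided: {acc:.0%}, swapped: {acc_swapped:.0%}")
# prints "as provided: 30%, swapped: 70%"
```

A score on external data that lands this close to 1 minus the training score suggests that the labels in one of the two datasets, or the label-to-class mapping used when loading them, are inverted relative to the other. On its own it does not show which side is inverted.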

Any thoughts? Thank you.