calico / baskerville

Machine learning methods for DNA sequence analysis.
Apache License 2.0

Indel #8

Closed davek44 closed 1 year ago

davek44 commented 1 year ago

Description of your changes

Compute all scores (importantly, D2) within the shift loop. Otherwise, predictions are averaged across shifts before the scores are computed, which can misalign the bins. Also introduce compensation shifts to better balance indels.
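To illustrate why the order matters, here is a minimal toy sketch, not the baskerville code. It assumes D2 is the L2 norm of the alt-minus-ref difference across bins, and the helper names (`d2_score`, `score_after_averaging`, `score_within_shift_loop`) and the data are hypothetical. In the toy data the variant effect lands in a different bin for each shift, which is the kind of misalignment an indel can cause.

```python
import math


def d2_score(ref_bins, alt_bins):
    """L2 norm of the alt - ref difference across bins (assumed D2 definition)."""
    return math.sqrt(sum((a - r) ** 2 for r, a in zip(ref_bins, alt_bins)))


def score_after_averaging(ref_by_shift, alt_by_shift):
    """Old order: average predictions over shifts first, then score once."""
    n = len(ref_by_shift)
    ref_mean = [sum(col) / n for col in zip(*ref_by_shift)]
    alt_mean = [sum(col) / n for col in zip(*alt_by_shift)]
    return d2_score(ref_mean, alt_mean)


def score_within_shift_loop(ref_by_shift, alt_by_shift):
    """New order: score each shift separately, then average the scores."""
    scores = [d2_score(r, a) for r, a in zip(ref_by_shift, alt_by_shift)]
    return sum(scores) / len(scores)


# Hypothetical predictions: 2 shifts x 4 bins. The alt effect (+2.0) falls in
# bin 1 for the first shift and bin 2 for the second shift.
ref_by_shift = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
alt_by_shift = [[1.0, 3.0, 1.0, 1.0], [1.0, 1.0, 3.0, 1.0]]

averaged_first = score_after_averaging(ref_by_shift, alt_by_shift)
per_shift = score_within_shift_loop(ref_by_shift, alt_by_shift)

print(f"average then score: {averaged_first:.4f}")  # sqrt(2), about 1.4142
print(f"score then average: {per_shift:.4f}")       # 2.0000
```

Averaging first smears the effect over two bins and shrinks the score, while scoring inside the loop preserves the per-shift effect size.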

GTEx indel eQTL classification AUROC with the Borzoi ensemble rises from 0.695 to 0.735.