apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

Step skipping doesn't work properly in the classification modules when the output of `annotate` changes #46

Open apcamargo opened 11 months ago

apcamargo commented 11 months ago

When the output of annotate changes (due to a change in the sensitivity of the search, for instance), the marker-classification and nn-classification modules will still skip some steps.

In marker-classification this will cause some features to be incompatible with the actual gene annotations (e.g., marker frequency remains the same, when it should have changed). In both marker-classification and nn-classification, the provirus outputs will remain intact, even if no provirus was detected in the second execution (leading to an error in summary, as exemplified below).
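To illustrate the kind of check that would avoid this, here is a minimal sketch of step skipping that is tied to the content of the upstream file rather than to the mere existence of the output. This is not geNomad's actual code: `fingerprint`, `run_step`, the `.stamp.json` sidecar file, and the file names in the demo are all hypothetical and only stand in for the `annotate` output and a downstream feature file.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return a SHA-256 digest of the file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_step(upstream: Path, output: Path, compute) -> bool:
    """Run `compute` unless `output` is up to date with `upstream`.

    Returns True if the step was executed and False if it was skipped.
    """
    # Hypothetical sidecar file that records which upstream content
    # the existing output was computed from.
    stamp = output.with_name(output.name + ".stamp.json")
    current = fingerprint(upstream)

    if output.exists() and stamp.exists():
        recorded = json.loads(stamp.read_text()).get("upstream")
        if recorded == current:
            return False  # upstream is unchanged, so skipping is safe

    # Upstream changed (or the step never ran): recompute and re-stamp.
    output.write_text(compute(upstream.read_text()))
    stamp.write_text(json.dumps({"upstream": current}))
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        genes = Path(tmp) / "genes.tsv"        # stands in for the annotate output
        features = Path(tmp) / "features.txt"  # stands in for a downstream output

        def count_lines(text: str) -> str:
            return str(len(text.splitlines()))

        genes.write_text("gene_1\ngene_2\ngene_3\n")
        print(run_step(genes, features, count_lines))  # first run: executed
        print(run_step(genes, features, count_lines))  # same input: skipped

        genes.write_text("gene_1\n")  # upstream changed, e.g. different sensitivity
        print(run_step(genes, features, count_lines))  # executed again
```

A check like this only covers recomputation. Outputs that a rerun no longer produces, such as the provirus files in the second execution, would still need to be removed explicitly.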

Reproducing the bug

Run the end-to-end module twice on the same output directory to classify LC735414.1, first with -s 4.2 and then with -s 1. A provirus will be detected when running with -s 4.2 but not with -s 1, causing an error in the summary module.