dieterich-lab / rp-bp

Rp-Bp is a Bayesian approach to predict, at base-pair resolution, ribosome occupancy and translation.
MIT License

Data munging for the app does not handle de novo ORFs as it should #154

Closed eboileau closed 1 year ago

eboileau commented 1 year ago

Description

While preparing a large compendium of human ORFs with a de novo assembly, we noticed that summarize_rpbp_predictions.py does not handle de novo ORFs correctly.

eboileau commented 1 year ago

Actually, part of the problem stems from the way rpbp handles a de novo annotation, i.e. by merging/concatenating the files. Within each of the annotated and de novo files, we shouldn't find any duplicate entries, but it is possible that e.g. transcript ids are the same in both files, in spite of the structure. We find ~85 of these. We also need to revise the numbering of ORFs in this case.
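For reference, a minimal sketch of how the shared transcript ids could be listed before the two files are concatenated. It assumes both annotations are GTF files with a `transcript_id "..."` attribute in the ninth column; the function names are hypothetical and are not part of rpbp.

```python
import re
from typing import Iterable, List, Set

# GTF attributes are written as: key "value"; (ninth tab-separated column)
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids(gtf_lines: Iterable[str]) -> Set[str]:
    """Collect the transcript ids found in an iterable of GTF lines."""
    ids: Set[str] = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def shared_transcript_ids(
    annotated_lines: Iterable[str], de_novo_lines: Iterable[str]
) -> List[str]:
    """Return the sorted transcript ids present in both annotations."""
    return sorted(transcript_ids(annotated_lines) & transcript_ids(de_novo_lines))


if __name__ == "__main__":
    # Hypothetical records, for illustration only
    annotated = [
        'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
        'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g2"; transcript_id "tx2";',
    ]
    de_novo = [
        'chr1\tasm\texon\t310\t450\t.\t+\t.\tgene_id "g9"; transcript_id "tx2";',
        'chr2\tasm\texon\t10\t90\t.\t-\t.\tgene_id "g8"; transcript_id "novel1";',
    ]
    print(shared_transcript_ids(annotated, de_novo))
```

With real files, the same functions can be given open file handles instead of lists. Any id reported here is ambiguous once the files are merged, so it would need to be renamed or otherwise disambiguated before ORFs are numbered.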

eboileau commented 1 year ago