cancerit / telomerecat

Telomerecat: The telomere computational analysis tool
GNU General Public License v3.0

Feature/clean code #24

Closed: keiranmraine closed this pull request 3 years ago

keiranmraine commented 3 years ago

Parabam is not stable under Python 3; see:

https://github.com/cancerit/telomerecat/issues/23#issuecomment-788925317

Migrating telbam generation to use pysam.collate() via a wrapper is stable and faster, at the cost of a small increase in memory use. We believe this will solve the following issues: #15, #16, #23
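A minimal sketch of what such a wrapper could look like. It assumes pysam's samtools dispatcher (pysam.collate) and the samtools collate options -o, -@ and --reference. The names collate_args, run_collate, paired_reads and is_telomeric, the (name, sequence) record shape and the two-hexamer threshold are hypothetical illustrations, not telomerecat's actual code. Collating groups records by read name, so both mates of a pair arrive next to each other and can be tested together in a single streaming pass.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical record shape for illustration: (query_name, sequence)
Read = Tuple[str, str]


def collate_args(in_path: str, out_path: str, threads: int = 1,
                 reference: Optional[str] = None) -> List[str]:
    """Build arguments for `samtools collate` (hypothetical wrapper)."""
    # -@ sets the number of additional worker threads
    args = ["-o", out_path, "-@", str(threads)]
    if reference is not None:
        # CRAM input may need the reference it was encoded against
        args += ["--reference", reference]
    args.append(in_path)
    return args


def run_collate(in_path: str, out_path: str, threads: int = 1,
                reference: Optional[str] = None) -> None:
    """Group reads by name via pysam's samtools dispatcher."""
    import pysam  # imported lazily so the pure helpers work without pysam

    pysam.collate(*collate_args(in_path, out_path, threads, reference))


def paired_reads(records: Iterable[Read]) -> Iterator[Tuple[Read, Read]]:
    """Yield mate pairs from name-collated records.

    Assumes primary alignments only; a name seen once (e.g. an orphan
    mate) is dropped when the next name starts.
    """
    pending: Optional[Read] = None
    for rec in records:
        if pending is not None and pending[0] == rec[0]:
            yield pending, rec
            pending = None
        else:
            pending = rec


def is_telomeric(seq: str, min_repeats: int = 2) -> bool:
    """Hypothetical test: enough TTAGGG (or reverse-complement) hexamers."""
    return max(seq.count("TTAGGG"), seq.count("CCCTAA")) >= min_repeats
```

In this sketch a pair would be kept for the telbam if either mate passes is_telomeric; the threshold is illustrative only.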

The remaining functions for using parabam appear to still be reliable.

Fixed a bad implementation of seed setting for --seed_randomness. This also makes the main processing faster, as there are far fewer checks to see if the seed needs to be set; however, any "stable" data used for comparison will need to be regenerated. Relates to:
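The fix itself is not shown here. As a hedged illustration of the general pattern, the sketch below seeds a dedicated generator once, up front, instead of testing inside the hot loop whether a seed still needs to be applied. Only --seed_randomness comes from this PR; make_rng, subsample and the fraction parameter are hypothetical names.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def make_rng(seed: Optional[int]) -> random.Random:
    """Create a dedicated generator; seed=None means non-deterministic."""
    return random.Random(seed)


def subsample(items: Iterable[T], fraction: float,
              seed: Optional[int] = None) -> List[T]:
    """Keep each item with probability `fraction` (hypothetical helper).

    The generator is seeded exactly once here, so the loop below does no
    per-item "has the seed been set yet?" check.
    """
    rng = make_rng(seed)
    return [item for item in items if rng.random() < fraction]
```

Using a private random.Random instance, rather than reseeding the module-level generator, also keeps results reproducible when other code draws random numbers. In a multiprocess setting each worker would need its own deterministically derived seed.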

https://github.com/cancerit/telomerecat/pull/17#issuecomment-728154237

Now also accepts CRAM input for bam2telbam and bam2length (#21); outputs (and labels in files) are still BAM.
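A minimal sketch of CRAM-in, BAM-out handling with pysam.AlignmentFile, assuming the CRAM's reference is passed via reference_filename (without it, htslib tries to locate the reference itself, e.g. via REF_PATH). The helpers input_mode, telbam_name and copy_to_bam and the _telbam.bam suffix are hypothetical illustrations, not taken from telomerecat; copy_to_bam simply copies every record where real code would filter for telomeric pairs.

```python
import os
from typing import Optional


def input_mode(path: str) -> str:
    """Pick a pysam read mode from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".cram":
        return "rc"
    if ext == ".bam":
        return "rb"
    raise ValueError(f"expected .bam or .cram input, got: {path}")


def telbam_name(path: str, out_dir: str = ".") -> str:
    """Output is always BAM, whatever the input format (hypothetical suffix)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return os.path.join(out_dir, stem + "_telbam.bam")


def copy_to_bam(in_path: str, out_dir: str = ".",
                reference: Optional[str] = None) -> str:
    """Read BAM or CRAM, write BAM; a stand-in for real telbam filtering."""
    import pysam  # imported lazily so the helpers above work without pysam

    out_path = telbam_name(in_path, out_dir)
    with pysam.AlignmentFile(in_path, input_mode(in_path),
                             reference_filename=reference) as src, \
            pysam.AlignmentFile(out_path, "wb", template=src) as dst:
        # until_eof=True streams all records without needing an index
        for read in src.fetch(until_eof=True):
            dst.write(read)
    return out_path
```

Writing with template=src copies the input header, so the BAM output keeps the same reference sequence definitions as the CRAM input.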

Results following these changes have been validated by Tim Butler and Daniel Leongamornlert (Sanger, CASM).

(tagging @jhrf for awareness)