HadrienG / InSilicoSeq

:rocket: A sequencing simulator
https://insilicoseq.readthedocs.io
MIT License

Fix for insert size distributions #247

Closed HadrienG closed 5 months ago

HadrienG commented 7 months ago

I identified one issue with the insert size distribution: the template length was capped at 500, due to bowtie2 only reporting concordant pairs with TLEN < 500 (the -X option in bowtie2, which defaults to 500).

This PR changes the model building process to iterate over all pairs instead of all concordant pairs, and adds some filters to the template length distribution to remove outliers.
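As a rough illustration of the outlier-filtering idea, here is a minimal sketch in plain Python. It is not the code from this PR: the function name `filter_template_lengths`, the Tukey-fence (IQR) rule, and the `k` parameter are all assumptions chosen for the example, and the template lengths are passed in as a plain list rather than read from a BAM file.

```python
from statistics import quantiles


def filter_template_lengths(tlens, k=1.5):
    """Hypothetical helper: keep template lengths inside the Tukey fences.

    `tlens` holds the signed TLEN values of all pairs, not only the
    concordant ones. The IQR rule used here is an assumption, not
    necessarily the filter the PR applies.
    """
    # A TLEN of 0 carries no insert size information, so drop those pairs.
    # Take the absolute value because one mate of each pair has a negative TLEN.
    lengths = [abs(t) for t in tlens if t != 0]
    if len(lengths) < 4:
        # Too few observations to estimate quartiles; return them unfiltered.
        return lengths
    q1, _, q3 = quantiles(lengths, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [t for t in lengths if low <= t <= high]


# Made-up values for illustration only.
observed = [300, -310, 295, 305, 0, 650, 290, 320, 50000]
kept = filter_template_lengths(observed)
```

With this made-up input, the 650 bp template (which a concordant-only pass with the default -X would never have seen) stays in the distribution, while the implausible 50000 bp value is discarded.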

The pre-built models will need to be rebuilt with these changes.