Unexpected distribution of insert sizes from paired-end HiSeq simulation

HadrienG / InSilicoSeq

:rocket: A sequencing simulator

https://insilicoseq.readthedocs.io

MIT License


Unexpected distribution of insert sizes from paired-end HiSeq simulation #186

Closed allind closed 2 weeks ago

allind commented 3 years ago

I am simulating paired-end HiSeq reads from a single genome, and the insert sizes of the simulated reads are very different from what I would expect from a HiSeq run. They appear to be evenly distributed between roughly 300 and 1200 bp (see simulated_insert_metric.pdf), whereas real HiSeq data has insert sizes that peak around an expected value. I'm not sure what is causing this.
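
To make the contrast concrete, here is a minimal sketch in plain Python. It is not InSilicoSeq code and does not reproduce how the simulator draws insert sizes. The function names, the mean of 500 bp, and the standard deviation of 50 bp are illustrative assumptions; only the 300-1200 bp range comes from the report above. It compares an evenly spread set of insert sizes with a peaked one by measuring how many values fall close to the median.

```python
import random
import statistics


def uniform_inserts(n, low=300, high=1200, seed=0):
    """Insert sizes spread evenly over [low, high], like the reported output."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(n)]


def peaked_inserts(n, mean=500, sd=50, seed=0):
    """Insert sizes clustered around a mean (mean and sd are assumed values)."""
    rng = random.Random(seed)
    return [max(1, round(rng.gauss(mean, sd))) for _ in range(n)]


def fraction_near_median(sizes, window=100):
    """Fraction of insert sizes within +/- window bp of the median."""
    med = statistics.median(sizes)
    return sum(abs(s - med) <= window for s in sizes) / len(sizes)


flat = uniform_inserts(10_000)
peaked = peaked_inserts(10_000)

# A peaked library concentrates most inserts near the median;
# an evenly spread one does not.
print(f"uniform: {fraction_near_median(flat):.2f}")
print(f"peaked:  {fraction_near_median(peaked):.2f}")
```

Applied to insert sizes extracted from real alignments, the same summary gives a quick check of whether a simulated library has a clear peak.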

jsgounot commented 2 years ago

Is there an update on this? I know the repo is quite old and may no longer be maintained, but a random insert size is not representative of a real dataset and can be a big issue for some software.

HadrienG commented 2 weeks ago

Fixed in 2.0.0