HadrienG / InSilicoSeq

:rocket: A sequencing simulator
https://insilicoseq.readthedocs.io
MIT License

Amplicon performance update and features #246

Closed ThijsMaas closed 5 months ago

ThijsMaas commented 7 months ago

Hi @HadrienG,

This PR updates the multiprocessing method so that it should work for both large input genomes and many small input amplicons. To do this, I reworked the main generate_reads function in app.py and split it up for better readability.

The new multiprocessing approach calculates the number of reads each CPU needs to generate to reach the requested total, then pre-assigns tuples of SeqIO records and read counts to a 'work list'. This work list is then distributed to the multiprocessing pool.
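
For illustration, the work-list idea described above could be sketched roughly as follows. This is a simplified sketch, not the code in this PR: the names `reads_per_cpu`, `build_work_list`, `simulate_chunk` and `run` are hypothetical, plain strings stand in for SeqIO records, and the worker only counts reads instead of simulating them.

```python
from multiprocessing import Pool


def reads_per_cpu(n_reads, cpus):
    """Split n_reads into `cpus` near-equal shares that sum to n_reads."""
    base, extra = divmod(n_reads, cpus)
    return [base + 1 if i < extra else base for i in range(cpus)]


def build_work_list(record_reads, cpus):
    """Pre-assign (record, n_reads) tuples to one chunk per CPU.

    record_reads is a list of (record, n_reads) pairs. A record with many
    reads (a large genome) may be split across chunks, while many records
    with few reads (amplicons) may share a single chunk.
    """
    total = sum(n for _, n in record_reads)
    shares = reads_per_cpu(total, cpus)
    work_list = [[] for _ in shares]
    chunk = 0
    room = shares[0]  # reads still unassigned in the current chunk
    for record, n in record_reads:
        while n > 0:
            while room == 0:  # current chunk is full, move to the next one
                chunk += 1
                room = shares[chunk]
            take = min(n, room)
            work_list[chunk].append((record, take))
            n -= take
            room -= take
    return work_list


def simulate_chunk(chunk):
    """Placeholder worker: report how many reads this chunk would generate."""
    return sum(n for _, n in chunk)


def run(work_list, cpus):
    """Distribute the chunks to a pool (or run serially for a single CPU)."""
    if cpus == 1:
        return [simulate_chunk(chunk) for chunk in work_list]
    with Pool(processes=cpus) as pool:
        return pool.map(simulate_chunk, work_list)
```

With one genome assigned 10 reads and two amplicons assigned 1 read each, `build_work_list(..., cpus=3)` gives three chunks of 4 reads: the genome is split over all three chunks and both amplicons land in the last one. When calling `run` with more than one CPU from a script, the call should sit under an `if __name__ == "__main__":` guard so it also works on platforms that spawn worker processes.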

ThijsMaas commented 6 months ago

@HadrienG With the last merge commit https://github.com/HadrienG/InSilicoSeq/pull/246/commits/e6db5da766cf8bfc13e80744f0bf45bb760f7aeb I have added the custom fragment length argument, as we discussed before. Everything to review should be in that commit.