biomedicalinformaticsgroup / Sargasso

Sargasso disambiguates mixed-species high-throughput sequencing data.
http://biomedicalinformaticsgroup.github.io/Sargasso/

Parallelise "filter reads" step in Makefile #18

Closed by lweasel 9 years ago

lweasel commented 9 years ago

At the moment, the "filter reads" step in the Makefile calls a Bash script, "filter_reads", which in turn calls the "filter_sample_reads" Python script once per sample to separate that sample's mapped reads by species.

One way of changing this might be to:

s-heron commented 9 years ago

Block splitting has been implemented. On a test sample (1A1), splitting the reads into 4 blocks for both species took 13m30.560s. Several minutes could be shaved off by terminating the sambamba file stream as soon as each start ID has been extracted, but I haven't found a workable way to do this. Parallel execution of filtering on the blocks will be implemented once the new filter script has been written.
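
One possible way to stop a stream early, sketched here in Python under some assumptions: the start ID is taken to be the first tab-separated field of a line (the read name in SAM text output), and the blocks are identified by 0-based line indices. The function name `first_fields_at` and its parameters are hypothetical and not part of Sargasso. The idea is to read the command's output line by line and terminate the process once the last wanted line has been seen.

```python
import subprocess


def first_fields_at(cmd, line_indices):
    """Return {index: first tab-separated field} for the requested
    0-based line indices of cmd's stdout, stopping cmd early."""
    targets = set(line_indices)
    found = {}
    if not targets:
        return found
    last = max(targets)
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        for index, line in enumerate(proc.stdout):
            if index in targets:
                found[index] = line.rstrip("\n").split("\t", 1)[0]
            if index >= last:
                # Nothing further is needed, so stop reading here.
                break
    finally:
        # Stop the producer instead of letting it stream the whole file.
        proc.terminate()
        proc.stdout.close()
        proc.wait()
    return found
```

With sambamba this might be called as `first_fields_at(["sambamba", "view", "sample.bam"], [0, 1000000])`, where `sample.bam` and the indices are placeholders. Whether this saves the minutes mentioned above has not been measured.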

s-heron commented 9 years ago

Wrote and committed the parallelisation control script, filter_control.py.
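
For readers unfamiliar with the pattern, a control script of this kind can be sketched as below. This is a generic illustration and not the contents of filter_control.py: the function name `run_commands_in_parallel` and the `max_workers` default are assumptions. Each block's filtering is assumed to be a separate command line, so a thread pool is enough to keep several child processes running at once.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_commands_in_parallel(commands, max_workers=4):
    """Run each command as its own process, at most max_workers at a time.

    Returns the exit codes in the same order as the commands.
    """
    def run(cmd):
        # Each worker thread just waits for one child process to finish.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))
```

A caller would build one command per block (for example, one invocation of the filter script per block file, with whatever arguments that script actually takes) and check that every returned exit code is 0 before merging the results.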