Closed — lweasel closed this issue 9 years ago
The block splitting has been implemented. On a test sample (1A1) it ran in 13m30.560s to split the reads into 4 blocks for both species. Several minutes could be shaved off by terminating the sambamba file stream once each start id has been extracted, but I haven't found a workable way to do this. Parallelised execution of filtering on the blocks will be implemented after the new filter script has been written.
Wrote & committed the parallelisation control script, filter_control.py.
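I haven't reproduced filter_control.py's internals here, but the core fan-out it needs is small. A minimal sketch, assuming each (sample, block) job just launches one external filter process, so threads are enough to keep several filters running at once:

```python
from concurrent.futures import ThreadPoolExecutor

def run_blocks_in_parallel(filter_one_block, samples, n_blocks, workers=4):
    """Run the per-block filter jobs concurrently.

    `filter_one_block` is a callable taking a (sample, block_index) pair;
    in practice it would launch the block's filter process and wait on it.
    """
    jobs = [(sample, block) for sample in samples for block in range(n_blocks)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves job order, so results line up with `jobs`
        return list(pool.map(filter_one_block, jobs))
```

With the 4-blocks-per-sample split above, one sample yields 4 jobs, and `workers` caps how many filter processes run at once.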
At the moment, the filter-reads step calls a Bash script, "filter_reads", which in turn calls the "filter_sample_reads" Python script once per sample to perform species separation on that sample's mapped reads.
One way of changing this might be to: