Open biobenkj opened 7 years ago
Hi Ben,
We're always open to integrating improvements! Could you clarify what you mean by threading option in this case?
Best, Ved
Something to the effect of leveraging the threading capability of bwa. For instance, I have hard coded the threading option (-t 25) into the source for my own benefit currently in the alignReads.py:
# Run paired end alignment against the genome
logger.info('Running paired end mapping for {0}'.format(sample_name))
bwa_alignment_command = '{0} mem -t 25 {1} {2} {3}'.format(BWA_path,
HG19_path,
read1,
read2)
You could add an option in the wrapper to for bwa threading to speed up alignment on multiple processor machines. Maybe something like:
align_parser = subparsers.add_parser('align', help='Paired end read mapping to genome')
align_parser.add_argument('--bwa', required=True)
align_parser.add_argument('--genome', required=True)
align_parser.add_argument('--read1', required=True)
align_parser.add_argument('--read2', required=True)
align_parser.add_argument('--outfolder', required=True)
align_parser.add_argument('--threads', required=False)
Also, thanks for writing this piece of software. It's quite wonderful. Trying out a rat Guide-seq experiment.
I was actually just modifying it to do this. Do you want my code if I get it working?
@michael-weinstein Yes - That would be great!
About to give it a test run now. I'll let you know if I get something good.
@biobenkj Thanks for the clarification, didn't realize that bwa had a threading flag!
If this works, I'll have a version that can take "bwa_threads" in the manifest or as an argument for the align function (with the appropriate input validations). If nothing is supplied, it should default to single-thread.
So far it looks like it's working when I launch it directly into alignment mode. I have a test going with the full pipeline waiting for cluster resources as I write this. If you go through these two files, you can see my changes by doing a find for bwa_threads
@michael-weinstein Sounds good! Do you mind submitting a pull request with your changes? I only have a few comments here for changes:
I fixed the spacing here:
align_parser.add_argument('--bwa_threads', type=int, default=1)
Also, in the parseManifest
method, do you mind moving the bwa_thread parameter extraction to a separate if-else statement outside of that try/except block? (For example, see the lines addressing the optional demultiplex_min_reads
parameter. We're trying to keep that try/except block for required manifest fields only.
Let me know if you have any questions! Looking forward to your contribution 😃
It would be great to add a threading option to the wrapper and the individual steps for the bwa mem alignment step. Happy to set up a pull request.