aryeelab / guideseq

Analysis pipeline for the GUIDE-seq assay.
GNU Affero General Public License v3.0
75 stars 53 forks source link

Threading option for bwa mem step #34

Open biobenkj opened 7 years ago

biobenkj commented 7 years ago

It would be great to add a threading option to the wrapper and the individual steps for the bwa mem alignment step. Happy to set up a pull request.

vedtopkar commented 7 years ago

Hi Ben,

We're always open to integrating improvements! Could you clarify what you mean by threading option in this case?

Best, Ved

biobenkj commented 7 years ago

Something to the effect of leveraging the threading capability of bwa. For instance, I have hard coded the threading option (-t 25) into the source for my own benefit currently in the alignReads.py:

    # Run paired end alignment against the genome
    logger.info('Running paired end mapping for {0}'.format(sample_name))
    bwa_alignment_command = '{0} mem -t 25 {1} {2} {3}'.format(BWA_path,
                                                         HG19_path,
                                                         read1,
                                                         read2)

You could add an option in the wrapper to for bwa threading to speed up alignment on multiple processor machines. Maybe something like:

align_parser = subparsers.add_parser('align', help='Paired end read mapping to genome')
    align_parser.add_argument('--bwa', required=True)
    align_parser.add_argument('--genome', required=True)
    align_parser.add_argument('--read1', required=True)
    align_parser.add_argument('--read2', required=True)
    align_parser.add_argument('--outfolder', required=True)
    align_parser.add_argument('--threads', required=False)

Also, thanks for writing this piece of software. It's quite wonderful. Trying out a rat Guide-seq experiment.

michael-weinstein commented 7 years ago

I was actually just modifying it to do this. Do you want my code if I get it working?

martinaryee commented 7 years ago

@michael-weinstein Yes - That would be great!

michael-weinstein commented 7 years ago

About to give it a test run now. I'll let you know if I get something good.

vedtopkar commented 7 years ago

@biobenkj Thanks for the clarification, didn't realize that bwa had a threading flag!

michael-weinstein commented 7 years ago

If this works, I'll have a version that can take "bwa_threads" in the manifest or as an argument for the align function (with the appropriate input validations). If nothing is supplied, it should default to single-thread.

michael-weinstein commented 7 years ago

So far it looks like it's working when I launch it directly into alignment mode. I have a test going with the full pipeline waiting for cluster resources as I write this. If you go through these two files, you can see my changes by doing a find for bwa_threads

guideseq.py.txt alignReads.py.txt

vedtopkar commented 7 years ago

@michael-weinstein Sounds good! Do you mind submitting a pull request with your changes? I only have a few comments here for changes:

I fixed the spacing here: align_parser.add_argument('--bwa_threads', type=int, default=1)

Also, in the parseManifest method, do you mind moving the bwa_thread parameter extraction to a separate if-else statement outside of that try/except block? (For example, see the lines addressing the optional demultiplex_min_reads parameter. We're trying to keep that try/except block for required manifest fields only.

Let me know if you have any questions! Looking forward to your contribution 😃