MarWoes / wg-blimp

wg-blimp: an end-to-end analysis pipeline for whole genome bisulfite sequencing data
GNU Affero General Public License v3.0

wg-blimp v0.10.0 #22

Closed JakeLehle closed 2 years ago

JakeLehle commented 2 years ago

Hello Dr. Wöste,

This is Jake Lehle. We spoke here on GitHub late last year, and I've been using your pipeline in our lab at The University of Texas at San Antonio to analyze WGBS data. I've been very happy with the pipeline, and I think it is a great tool that will help many in the field of epigenetic profiling of DNA methylation patterns. It has already helped us with a few projects currently going on in the lab.

However, I saw that the alignment step of the pipeline uses bwa-meth, which I think is starting to become outdated compared to some of the newer aligners being developed. To improve the speed of the pipeline, I rewrote a number of rules in the Snakemake workflow and replaced the bwa-meth alignment step with steps from the gemBS pipeline, which uses the GEM3 aligner produced by Simon Heath's group. This modification significantly increased alignment speed when scaled to larger sequencing datasets, while maintaining the overall accuracy of aligned reads compared to the previous pipeline.

I benchmarked the original and modified pipelines and included the results in a manuscript that I recently submitted to BMC Bioinformatics, since that is the journal where the original pipeline was published. The preprint of the submitted manuscript can be found here: https://www.researchsquare.com/article/rs-1666741/v1

While the manuscript is under review, I wanted to submit my modified code as a pull request so we can discuss it further and, if you are interested, include my changes as a permanent part of the wg-blimp pipeline.

Best, Jake Lehle

MarWoes commented 2 years ago

Hi Jake,

Thanks so much for the great work, very interesting findings and implementation! :) I recently changed workplaces, so I'll probably no longer be able to maintain wg-blimp, and we are currently discussing options for how to continue with it. This might take a while, but I'll keep you updated on the process! For the PR: I have not yet had the time to look into all changes in detail, but I think providing gemBS for use in wg-blimp would indeed be a good idea. Maybe we can think about making the pipeline configurable so users can choose which alignment tool to use? That way we would be able to provide some backwards compatibility for use cases where bwa-meth was used for alignment (although the downstream results should be quite similar with either tool).

Kind regards, Marius

JakeLehle commented 2 years ago

Hi Dr. Wöste,

Thanks for getting back to me. I absolutely agree with maintaining compatibility with bwa-meth for those who have already been using the pipeline on previous datasets and don't want to worry about having to re-analyze old data if the pipeline changes. That's a great idea, and I should have thought about including that with my submitted changes. Let me look into updating the cli.py file and adding some click options so we can toggle which aligner is used when running the pipeline. I'll also update my changes to the Snakemake rules to bring back the bwa-meth aligner option.

As for going forward with the future of wg-blimp, I absolutely understand these things take time especially while changing jobs and taking on new projects, and yes please keep me up to date on any decisions you make. I think it's a great tool and something we need more of in the field to help with easy efficient end-to-end processing of WGBS data.

Talk soon, Jake Lehle

JakeLehle commented 2 years ago

Hi Dr. Wöste,

Okay, I just updated the wg-blimp CLI and added a new click option allowing the user to choose either gemBS or bwa-meth as the aligner used by the pipeline. This maintains compatibility for users who want to select bwa-meth because they have used it previously to analyze datasets. The new --aligner choice option is used by the Snakemake workflow to run the proper rule set within an if statement.
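For readers following along, here is a minimal sketch of how an aligner choice option can drive the selection of a rule set. It is not the code from the PR: it uses the standard library's argparse as a stand-in for click, and the default value, the function names, and the rule file paths are all assumptions made for illustration.

```python
import argparse

# Aligner names as mentioned in this thread.
ALIGNERS = ("bwa-meth", "gemBS")


def build_parser():
    """Build a parser whose --aligner option only accepts known aligners."""
    parser = argparse.ArgumentParser(prog="aligner-sketch")
    parser.add_argument(
        "--aligner",
        choices=ALIGNERS,
        default="bwa-meth",  # assumption: default to the original aligner
        help="alignment tool to use",
    )
    return parser


def select_rule_file(aligner):
    """Return the rule file for the chosen aligner (paths are hypothetical)."""
    if aligner == "gemBS":
        return "rules/gembs.smk"  # hypothetical path
    elif aligner == "bwa-meth":
        return "rules/bwameth.smk"  # hypothetical path
    raise ValueError(f"unknown aligner: {aligner}")


# Parse an explicit argument list and pick the matching rule set.
args = build_parser().parse_args(["--aligner", "gemBS"])
print(select_rule_file(args.aligner))
```

Restricting the option to a fixed set of choices means a typo in the aligner name is rejected at the command line rather than partway through the workflow.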

Let me know if you have any questions or edits you would like to discuss. Best, Jake Lehle

MarWoes commented 2 years ago

Hi Jake,

Sorry once again for the very long wait... Very cool work I'd say! :) Next week I have scheduled a time slot to test the PR on my local machine, but from what I can tell that is a very nice addition to the feature set. If nothing breaks during the tests I'd like to merge the PR and request a conda update.

Kind regards, Marius

JakeLehle commented 2 years ago

Hi Dr. Wöste,

Great! I'm excited to see what you think next week.

So just a heads up, I don't know if you are planning to use the blood and sperm sample data sets that you provide with the pipeline. If you are, you will need to trim them. gemBS will throw an error if you give it paired FASTQ files that don't contain the same number of reads.

Here are the files that you will need to trim, each down to the given number of lines: blood1_2 to 4219668 lines, blood2_2 to 3398016 lines, sperm1_1 to 3956272 lines, and sperm2_2 to 3489516 lines.

I did this with vim by just opening those files up, moving to that line, and deleting all the extra lines from there to the end of the file.
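The same trimming can be scripted instead of done by hand. The sketch below is only an illustration, assuming the files are uncompressed FASTQ with four lines per read; the function name `truncate_to_lines` is hypothetical.

```python
import os
import tempfile
from itertools import islice


def truncate_to_lines(path, n_lines):
    """Keep only the first n_lines lines of a plain-text file, in place."""
    # Assumption: standard 4-line FASTQ records, so a valid cut point
    # must fall on a multiple of 4 lines.
    if n_lines % 4 != 0:
        raise ValueError("n_lines must be a multiple of 4 for FASTQ input")
    directory = os.path.dirname(os.path.abspath(path))
    # Write the kept lines to a temporary file in the same directory,
    # then swap it in so the original is never left half-written.
    with open(path, "rb") as src, tempfile.NamedTemporaryFile(
        "wb", dir=directory, delete=False
    ) as dst:
        for line in islice(src, n_lines):
            dst.write(line)
        tmp_name = dst.name
    os.replace(tmp_name, path)
```

Calling it on the blood1_2 file with `n_lines=4219668`, for example, would reproduce the manual edit described above. All four line counts listed are multiples of 4, so each cut falls on a read boundary.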

If you do that, everything should run to completion. Alternatively, if you want to use some larger data sets, you can get an idea of the improved speed; full-sized paired sequencing files should already have equal numbers of reads, so they should run without throwing any errors.
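A quick way to confirm that two mate files have equal read counts before starting a run is sketched below. This is not part of wg-blimp or gemBS; it assumes uncompressed, 4-line-per-read FASTQ files, and the function names are hypothetical.

```python
def count_fastq_records(path):
    """Count reads in a plain-text FASTQ file, assuming 4 lines per read."""
    with open(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def mates_match(path_r1, path_r2):
    """Return True if both mate files contain the same number of reads."""
    return count_fastq_records(path_r1) == count_fastq_records(path_r2)
```

Note that this only compares counts; it does not check that read names are in the same order in both files.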

Let me know how it goes! Best, Jake Lehle

MarWoes commented 2 years ago

Hi Jake,

Thank you for the instructions on the test dataset, that certainly helped! I ran the pipeline using bwa-meth and also the new gemBS implementation, very cool work! :) I added some comments for necessary code changes, but those were only minor issues. Actually, I guess only the one requiring the gemBS environment to be added needs to be addressed before merging.

So once again sorry for the long wait and thanks so much for your work!

Kind regards, Marius

MarWoes commented 2 years ago

Thanks for your changes, all looking good now I think :) I'll merge this PR and push the changes to Bioconda. Thanks again for contributing to wg-blimp!

JakeLehle commented 2 years ago

Thank you so much. I'm honored to be able to contribute!