FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0
469 stars 151 forks source link

Multi-Core support #39

Closed fjames003 closed 5 years ago

fjames003 commented 6 years ago

I have added options to trim-galore that take advantage of cutadapts multicore capabilities.

FelixKrueger commented 6 years ago

Hi Frankie,

This is to already say thanks for taking a look at this and submitting a pull request. I was running a few tests today (which I didn't quite finish), and am going to comment in more detail tomorrow.

FelixKrueger commented 6 years ago

Right, here are some more details. Again many thanks for giving this a whirl. So far I have refrained from adding the multi-core option mainly because I thought that it might break the paired-end option completely and therefore require a complete rewrite of Trim Galore.

From my tests it appears that giving Cutadapt several cores with your new option -j does not disrupt the order the reads were in in the original FastQ file. This read-for-read order is an essential requirement for Trim Galore since it initially processes both R1 and R2 in single-end mode, and then performs a validation step on both trimmed files that handles the length threshold.

From all that I saw the new option -j seems to work correctly for both single-end and paired-end files, however I was a little underwhelmed with the actual difference it made. For this test I used example 75bp paired-end files, and trimmed them with either 1, 2, 4, 8, 12 or 16 cores on our cluster:

multicore_trimming

The attached graph shows that using 2 cores for parallel trimming decreased the processing time by ~30% for single-end, and ~20% for paired-end trimming. Anything beyond 2 cores does not make any difference time-wise, but eats up additional resources. I suppose a bottle neck here could be either that Cutadapt is trying to keep the reads in the same order in which they were fed in (which takes time), or that the additional checks (e.g. 5' or 3' trimming) and the writing out within Trim Galore limit the speed at which reads are trimmed. The Cutadapt documentation also mentions that one need to have pigz (parallel gzip) installed to benefit from parallel trimming (which is not running our our cluster).

Apart from the limitations described in the Cutadapt documentation, there is the additional issue with the multi-core option that the parallel trimming mode requires Python 3.3 or later. Since were running Cutadapt with Python 2.7 the whole Trim Galore process died (not very gracefully) because Cutadapt died internally. If we wanted to implement this option we would want a check in Trim Galore that would test first whether Python 2 or 3 was being used, and would terminate multi-core runs if the used Cutadapt or Python versions wouldn't support it.

Given into how much trouble one can run with Python2 and Python3 installations and only one PYTHONPATH for both, I really don't want to open myself up to resolving Python2/3 incompatibility issues - in addition to the fact using several cores doesn't seem all that worthwhile, at least in the current setting.

What would be useful in the long run would be to change the paired-end processing mode completely to using the Cutadapt option of handling paired-end reads together (which didn't exist when we wrote Trim Galore back in 2011 or so...). This would avoid the sequential SE trimming and validation step, and would therefore certainly decrease PE processing time substantially - probably way more than the multi-core option. This has been discussed previously (e.g. here), but so far I haven't found the time to look at it further.

Going forward with this pull request I would say that we should at least add checks for the version of Python (2/3.3+) and Cutadapt so that Trim Galore can fail more gracefully if someone attempts to use it. In addition the option should probably limited to 2 cores (as more than 2 doesn't appear to make sense), and add some help and warning text about this option. What are your thoughts about it?

Best, Felix

fjames003 commented 5 years ago

Hello @FelixKrueger,

I have updated the Trim-Galore code, to check for the version of python found in PATH, as this should be the python that cutadapt is using as well. I ran this in two separate conda environments one with python 2.7.12 and the other with python 3.6.3. In the case of python 2.x.x, the number of cores will be 1, where as with python 3.x.x, it will be what ever is supplied with -j or --cores, with the default being 1. I also implemented a check to see if pigz is installed on the machine. If it is, pigz will be used for all (de)compression with the number of cores supplied with -j or --cores. If pigz is not installed, gzip will be used instead.

After running some basic tests, it is clear that when pigz is installed and cutadapt is being run with python 3.x.x, that the speed is significant and will increase as the number of cores increases, it will eventually get bottlenecked writing to the disk (with a HDD anyway), but this depends on your drive. One thing to note: the percentage cpu is on the order of double what the cores are set to and this is a result of when the validated reads are being written to files, read 1 and read 2 are being written at the same time, this was the case in your version as well as you can see yours got up to 197% when writing. This could be fixed to do one after the other or do both with half the cores, although it isn't a priority for me as its very brief that this happens. (Also note I only ran each test once, so they are ROUGH estimates)


Previous Version: trim_galore.bak --illumina --phred33 --paired 1041.65s user 23.29s system 197% cpu 8:58.65 total 1 core: ./trim_galore --illumina --phred33 --paired 1131.33s user 29.28s system 199% cpu 9:41.36 total ./trim_galore -j 1 --illumina --phred33 --paired 1292.92s user 27.44s system 189% cpu 11:35.80 total 2 cores: ./trim_galore -j 2 --illumina --phred33 --paired 1151.00s user 25.60s system 394% cpu 4:58.07 total 4 cores: ./trim_galore -j 4 --illumina --phred33 --paired 1148.12s user 24.58s system 783% cpu 2:29.74 total 8 cores: ./trim_galore -j 8 --illumina --phred33 --paired 1349.19s user 32.62s system 1225% cpu 1:52.79 total

FelixKrueger commented 5 years ago

Hi @fjames003,

Thanks for your work on this yesterday. I have cloned your version and ran a few tests with it. Using pigz one can indeed achieve a nice level of speed-up, so it is probably really a worthwhile exercise.

As it stands, your version is not working on our side because both tests for the version of Python and pigz are not doing what they need to be doing (e.g.: on our cluster, the command python will always run Python 2, and python3 will run Python 3).

I will nevertheless merge your version, and attempt to solve the remaining issues this afternoon. Thanks again! Felix

FelixKrueger commented 5 years ago

Hi Frankie,

I have tried to get the multi-core support so that it also works on our our cluster, and also for single-end files. I still need to do some testing but overall I am quite happy with its performance.

Would you mind cloning the latest dev version and letting me know if it also works in your hands?

Thanks, Felix

fjames003 commented 5 years ago

Hi @FelixKrueger,

I like your idea of grabbing the shebang from the top of the cutadapt executable and using that to get the version! I will go ahead and clone the dev version and give it a test on our machine and let you know how it goes.

Thanks Frankie

radlinsky commented 5 years ago

@FelixKrueger @fjames003 what was the verdict? (scoots to edge of my seat)

FelixKrueger commented 5 years ago

Hmm let me see. It works? It is fast(er)?

If you want it to also deal well with Cutadapt versions from 2011, I would suggest you get the current dev version (v0.6.1_dev).

fjames003 commented 5 years ago

@FelixKrueger, don't you love having to handle people who are using 8 year old versions of cutadapt?

@radlinsky, The verdict was that the multicore support increased the performance quite a bit. If you are running on a machine with multiple cores or a cluster, I would definitely take advantage of this. Just make sure you have pigz installed otherwise gzip will be a huge bottleneck, and I would update to the latest cutadapt as they have made some substantial improvements over the years.

FelixKrueger commented 5 years ago

absolutely :P