FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

Testing best practices #23

Closed: ionox0 closed this issue 6 years ago

ionox0 commented 6 years ago

Thank you for your work on this project,

I was wondering if you had any suggestions for testing TrimGalore in a pipeline. For example, with a very small pair of FastQ files containing only a few reads, are there any parameters or tricks that would make a TrimGalore run take only a few seconds? Thank you

FelixKrueger commented 6 years ago

Hi there, I just tested the command:

time trim_galore illumina_10K.fastq.gz

on 10,000 sequences, and it completed in 8.017 seconds. You could probably down-sample that file (it is supplied in the folder test_files) even further, or take out the sleep statements, which are there to give you a moment to review what is actually going to be run. Is this acceptably quick? Cheers, Felix
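One way to down-sample is to keep only the first N records, since every FastQ record is exactly four lines (header, sequence, `+`, qualities). The sketch below is an illustration, not part of Trim Galore; the function name `head_fastq` and the output file name are hypothetical, and it takes the *first* N reads rather than a random sample.

```python
import gzip
import itertools


def _open_text(path, mode):
    """Open a plain or gzip-compressed file in text mode ('.gz' suffix => gzip)."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def head_fastq(src, dst, n_reads):
    """Copy the first n_reads FastQ records (4 lines each) from src to dst.

    Hypothetical helper: it keeps the first reads, it does not sample randomly.
    """
    with _open_text(src, "r") as fin, _open_text(dst, "w") as fout:
        # islice stops after 4 * n_reads lines, or earlier if the file is shorter.
        for line in itertools.islice(fin, 4 * n_reads):
            fout.write(line)
```

For example, `head_fastq("illumina_10K.fastq.gz", "illumina_100.fastq.gz", 100)` would write a 100-read test file. For a paired-end test, applying the same `n_reads` to both R1 and R2 keeps the pairs in sync, assuming the two files list their reads in the same order.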

ionox0 commented 6 years ago

Yes, thank you. The reason I ask is that we are using this as part of a larger pipeline, so any reduction in running time adds up across all the changes and testing we do and cuts overall wait times. It would be nice to have a test flag that turns off these wait times, but either way I appreciate the help and feedback.
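Without a dedicated test flag, a pipeline can still get a quick smoke test by running the tool on a tiny input under a wall-clock budget. The following is a minimal sketch using only Python's standard library; `timed_run` is a hypothetical helper, and the command it runs is whatever the pipeline passes in.

```python
import subprocess
import time


def timed_run(cmd, max_seconds):
    """Run cmd; fail on a non-zero exit status or if it exceeds max_seconds.

    Hypothetical smoke-test helper. Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    # check=True raises CalledProcessError on a non-zero exit status.
    # timeout kills the child and raises TimeoutExpired if the budget is exceeded.
    subprocess.run(cmd, check=True, timeout=max_seconds)
    return time.perf_counter() - start
```

A call might look like `timed_run(["trim_galore", "--paired", "tiny_R1.fastq.gz", "tiny_R2.fastq.gz"], max_seconds=30)`, where the file names are placeholders. Note that this only enforces a time limit; it does not remove the sleep statements mentioned above.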

FelixKrueger commented 6 years ago

Hi Ian,

I just applied a time-related reporting optimisation (a351581ab39cae817619d86903a584efc9b972b0) 😺, which means you should see a speed-up of up to 70% (for very small datasets like the 10,000-sequence file mentioned above...). Just grab the latest development version to get it. All the best, Felix

ionox0 commented 6 years ago

Thank you for that change, @FelixKrueger! I am working with an older version of your tool (v0.2.5), but I will let you know when we update; it is certainly acceptably fast 👍. I would also like to mention this error, which occurs on test datasets when zero reads are trimmed:

Illegal division by zero at /opt/common/CentOS_6/trim_galore/Trim_Galore_v0.2.5/trim_galore line 523.

Perhaps adding a small epsilon to the denominator of the fraction would help. Then again, if you haven't run into this issue, I wonder whether I am somehow supplying the wrong adapter sequences: my FastQ file (~30,000 reads) is large enough that I would expect at least a few reads to be trimmed. Is it common to see no trimming at this read count?
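To check whether "no reads trimmed" is plausible, one rough diagnostic is to count reads that contain the adapter outright and report that as a percentage, guarding the division so an empty file gives 0% instead of an error. This is a Python illustration, not Trim Galore's own code, and it does not reproduce whatever line 523 computes. It assumes `AGATCGGAAGAGC` (the first 13 bp of the Illumina universal adapter) as the default adapter, and it counts only exact, full-length matches, so it will undercount compared with Cutadapt, which also trims partial adapters at the 3' end and tolerates mismatches.

```python
import gzip

# Assumption: the 13 bp Illumina adapter sequence Trim Galore uses by default.
ILLUMINA_ADAPTER = "AGATCGGAAGAGC"


def safe_percent(part, total):
    """Return 100 * part / total, or 0.0 when total is 0 (no ZeroDivisionError)."""
    return 100.0 * part / total if total else 0.0


def adapter_stats(path, adapter=ILLUMINA_ADAPTER):
    """Count reads whose sequence contains `adapter` as an exact substring.

    Hypothetical diagnostic. Returns (reads_with_adapter, total_reads, percent).
    """
    opener = gzip.open if path.endswith(".gz") else open
    hits = total = 0
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # the 2nd line of each 4-line record is the sequence
                total += 1
                if adapter in line:
                    hits += 1
    return hits, total, safe_percent(hits, total)
```

The explicit zero check is used instead of an epsilon so that non-empty inputs still get exact percentages; with an epsilon, every result would be very slightly off.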

FelixKrueger commented 6 years ago

You should definitely upgrade to the latest version, as you are nearly 6 years and 20 releases behind! There is also a very high chance that the error you are seeing has been fixed in the meantime...