MikkelSchubert / adapterremoval

AdapterRemoval v2 - rapid adapter trimming, identification, and read merging
http://adapterremoval.readthedocs.io/
GNU General Public License v3.0

Trimming reads on 3' and 5' ends #19

Closed: apeltzer closed this issue 4 years ago

apeltzer commented 7 years ago

Hi Mikkel!

Do you think it would be possible to add an option to trim [n] bases off the reads after clipping the adapters? In the case of DNA damage, lots of people are doing this manually, for example by removing a single base from each end of the read, or by clipping off internal barcodes.

It shouldn't be too difficult to add this. What do you think?

MikkelSchubert commented 7 years ago

Hi Alexander, I agree that it shouldn't be too hard to add that functionality, though there are some potential complications with merged reads.

How does something like this look:

--trim5p N --trim3p N

and if you want to specify different N for mate 1 and mate 2:

--trim5p N1,N2 --trim3p N1,N2

I would suggest the following order of operations:

  1. Reads are aligned
  2. Adapters are removed and reads are merged
  3. New: Fixed number of bases are trimmed from either end
  4. Reads are trimmed by quality and for Ns
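
The order of operations above can be sketched for a single read as follows. This is a hypothetical Python illustration, not AdapterRemoval's code: the adapter position is simply passed in (standing in for the alignment in step 1), and the quality rule (strip Ns and bases with Phred quality at or below a threshold, Phred+33 encoding) is a simplified assumption.

```python
from typing import NamedTuple


class Read(NamedTuple):
    seq: str
    qual: str  # assumed Phred+33 encoded, same length as seq


def remove_adapter(read: Read, adapter_start: int) -> Read:
    # Step 2 (simplified): drop everything from the adapter onwards.
    # adapter_start is a stand-in for the result of step 1 (alignment).
    return Read(read.seq[:adapter_start], read.qual[:adapter_start])


def trim_fixed(read: Read, trim5p: int, trim3p: int) -> Read:
    # Step 3: remove a fixed number of bases from each end; trimming more
    # bases than the read contains yields an empty read.
    end = max(trim5p, len(read.seq) - trim3p)
    return Read(read.seq[trim5p:end], read.qual[trim5p:end])


def trim_quality(read: Read, min_quality: int = 2) -> Read:
    # Step 4: strip Ns and bases with quality <= min_quality from both ends.
    def is_bad(i: int) -> bool:
        return read.seq[i] == "N" or ord(read.qual[i]) - 33 <= min_quality

    start, end = 0, len(read.seq)
    while start < end and is_bad(start):
        start += 1
    while end > start and is_bad(end - 1):
        end -= 1
    return Read(read.seq[start:end], read.qual[start:end])


def process(read: Read, adapter_start: int, trim5p: int, trim3p: int) -> Read:
    # Steps 2-4 in the proposed order: adapters, fixed trim, quality trim.
    read = remove_adapter(read, adapter_start)
    read = trim_fixed(read, trim5p, trim3p)
    return trim_quality(read)


if __name__ == "__main__":
    read = Read("NACGTACGTTAGATCGGAAG", "I" * 20)
    print(process(read, adapter_start=10, trim5p=1, trim3p=1).seq)  # ACGTACGT
```

Doing the fixed trim after adapter removal means the 3' count is applied to the end of the insert rather than to the end of the raw read, which is what matters for damage at the molecule ends.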

In the case of merged reads, the new 5' end would be trimmed using the mate 1 --trim5p value, and the new 3' end would be trimmed using the mate 2 --trim5p value, since the 3' end of a merged read corresponds to the 5' end of mate 2. I think that collapsed reads trimmed in this manner should still be put in the .collapsed output file, and not .collapsed.truncated, since you can still trust both coordinates as long as you use the same setting for all files from a given library.
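
A small hypothetical Python sketch of why the mate 2 --trim5p value applies to the 3' end of a merged read. It assumes the simplest case: a fragment sequenced completely and without errors by both mates, so the merged read equals mate 1 and mate 2 is its reverse complement. The function names are illustrative only.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def trim_ends(seq: str, n5p: int, n3p: int) -> str:
    # Remove n5p bases from the 5' end and n3p bases from the 3' end.
    end = max(n5p, len(seq) - n3p)
    return seq[n5p:end]


def trim_merged(merged: str, trim5p: Tuple[int, int]) -> str:
    # A merged read runs from mate 1's 5' end to mate 2's 5' end (mate 2 is
    # reverse-complemented during merging), so both ends use --trim5p values.
    mate1_5p, mate2_5p = trim5p
    return trim_ends(merged, mate1_5p, mate2_5p)


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGT"  # hypothetical fully overlapping fragment
    mate2 = reverse_complement(fragment)

    # Trimming 2 bases off mate 2's 5' end removes the same bases as
    # trimming 2 bases off the 3' end of the merged read.
    assert reverse_complement(trim_ends(mate2, 2, 0)) == trim_ends(fragment, 0, 2)
    print(trim_merged(fragment, (1, 2)))  # CGTTGCAAG
```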

Any thoughts?

Cheers

apeltzer commented 7 years ago

Hi Mikkel! I agree on all points and think that this is the best way to resolve it. I can't think of a situation where this would fail.

MikkelSchubert commented 7 years ago

Hi Alexander, sorry for taking so long, but I just started a new job this month and my mind has been elsewhere.

I've added support for trimming fixed numbers of bases to the master branch. It works as discussed previously, except that I used the same format for specifying multiple inputs as with --file1 and --file2 (so two values separated by spaces instead of commas). I want to do some more work before pushing the next release, but hope to get it out soon.
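
To illustrate the space-separated format, here is a hypothetical Python argparse sketch (not AdapterRemoval's own option parser) in which --trim5p and --trim3p accept either one value, shared by both mates, or two values given as mate 1 then mate 2. The default of 0 and the non-negativity check are assumptions.

```python
import argparse
from typing import List, Optional, Sequence, Tuple


def per_mate(values: List[int]) -> Tuple[int, int]:
    # One value applies to both mates; two values are (mate 1, mate 2).
    if any(v < 0 for v in values):
        raise ValueError("trim values must be non-negative")
    if len(values) == 1:
        return values[0], values[0]
    if len(values) == 2:
        return values[0], values[1]
    raise ValueError("expected one or two values, got %d" % len(values))


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="fixed-length trimming (sketch)")
    # nargs="+" accepts space-separated values, e.g. "--trim5p 1 2".
    parser.add_argument("--trim5p", type=int, nargs="+", default=[0])
    parser.add_argument("--trim3p", type=int, nargs="+", default=[0])
    args = parser.parse_args(argv)
    try:
        args.trim5p = per_mate(args.trim5p)
        args.trim3p = per_mate(args.trim3p)
    except ValueError as err:
        parser.error(str(err))  # prints usage and exits with status 2
    return args


if __name__ == "__main__":
    args = parse_args(["--trim5p", "1", "--trim3p", "2", "3"])
    print(args.trim5p, args.trim3p)  # (1, 1) (2, 3)
```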

Let me know if you find any issues.

Cheers

apeltzer commented 7 years ago

Hi Mikkel! I will test this soon and get back to you. I was on vacation for a couple of days, so I couldn't proceed either. Hope your new job works out well for you! All the best!