MikkelSchubert / paleomix

Pipelines and tools for the processing of ancient and modern HTS data.
https://paleomix.readthedocs.io/en/stable/
MIT License

Trimming first 6-8 bp of read 1 #18

Status: Open. nwales opened this issue 6 years ago.

nwales commented 6 years ago

Hi Mikkel, I hope you are doing well!

I'm processing some new data and I wondered if there is a way to remove the first N bp of read 1. In this experiment the adapters have an index directly next to the insert, so the first 6 or 8 bp of read 1 are library specific. Ideally I'd like to check that the index matches the expected sequence as an extra precaution on patterned flowcells, but simply trimming the first N bp would work for now. AdapterRemoval's demultiplexing option comes close to this, but it seems it would not work with the current implementation in Paleomix. If it's not possible or not worth including, I could use the pre-trimming option.

Thanks!

Nathan

P.S. I saw Zonkey is listed as published in 2007...

MikkelSchubert commented 5 years ago

Hi Nathan,

Are these SE or PE reads? If they are SE reads, checking and trimming the first N bases should suffice. If they are PE reads, the demultiplexing feature in AdapterRemoval would probably be the simplest solution, since it takes care of both identifying matching sequences and correctly trimming the barcode from both reads in the pair.
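For SE reads, the check-and-trim step for a single read might look like the minimal Python sketch below. It assumes the index sits at the very start of read 1 and that one expected index is known per library. The function names and the one-mismatch default are hypothetical choices for illustration, not part of PALEOMIX or AdapterRemoval.

```python
from typing import Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def check_and_trim(
    seq: str, qual: str, index: str, max_mismatches: int = 1
) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: strip a leading library index from one SE read.

    Returns (seq, qual) with the first len(index) bases removed, or None if
    those bases differ from `index` at more than `max_mismatches` positions.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    n = len(index)
    if len(seq) < n:
        return None  # read too short to contain the full index
    if hamming(seq[:n].upper(), index.upper()) > max_mismatches:
        return None  # index does not match the expected sequence
    # Trim the sequence and quality strings together to keep them in sync.
    return seq[n:], qual[n:]
```

For example, `check_and_trim("ACGAACTTGCA", "IIIIIIIIIII", "ACGTAC")` tolerates the single mismatch and returns `("TTGCA", "IIIII")`. Reads that return `None` could then be written to a separate file for inspection rather than silently discarded.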

There is currently no way to do what you describe via PALEOMIX, but adding support for demultiplexing is something I intend to do at some point.

And thank you for pointing out the typo!

Best, Mikkel

nwales commented 5 years ago

Hi Mikkel,

Thanks for the suggestions. I'm using SE reads, and indeed I've just been trimming the first N bases in advance. If it's not too hard to implement someday, it would be helpful to make other AdapterRemoval options available within PALEOMIX (e.g. `--trim5p`).
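Pre-trimming a fixed number of bases from SE FASTQ files before running PALEOMIX could be done with a small streaming script like the sketch below. It assumes plain 4-line FASTQ records and optional gzip compression detected from a `.gz` suffix. The script, its function names, and the `min_length` cut-off (which drops reads left empty after trimming) are hypothetical and not part of PALEOMIX.

```python
import gzip
import sys
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, plus line, qualities)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records from any iterable of text lines."""
    it = iter(lines)
    while True:
        header = next(it, "").rstrip("\n")
        if not header:
            return  # end of input
        seq = next(it, "").rstrip("\n")
        plus = next(it, "").rstrip("\n")
        qual = next(it, "").rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header, seq, plus, qual


def trim_5p(records: Iterable[Record], n: int, min_length: int = 1) -> Iterator[Record]:
    """Remove the first n bases (and qualities) from each read.

    Reads shorter than `min_length` after trimming are dropped, since empty
    reads can break downstream tools (hypothetical default of 1).
    """
    for header, seq, plus, qual in records:
        seq, qual = seq[n:], qual[n:]
        if len(seq) >= min_length:
            yield header, seq, plus, qual


def open_text(path: str, mode: str):
    """Open a text file, transparently handling gzip by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def main(argv: list) -> None:
    # Usage: python trim5p.py input.fastq.gz output.fastq.gz N
    src, dst, n = argv[1], argv[2], int(argv[3])
    with open_text(src, "r") as fin, open_text(dst, "w") as fout:
        for record in trim_5p(read_fastq(fin), n):
            fout.write("\n".join(record) + "\n")


if __name__ == "__main__":
    main(sys.argv)
```

Streaming record by record keeps memory use constant regardless of file size. The trimmed files can then be given to PALEOMIX as ordinary SE input.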

Cheers, Nathan