dstreett / Super-Deduper

An application to remove PCR duplicates from high throughput sequencing runs.

support for latest illumina fastq formats #14

Closed waalkes closed 9 years ago

waalkes commented 9 years ago

David,

Great tool. Any chance you will be updating it to support the latest illumina formats?

Here is the latest format (which includes spaces and encodes the read number differently): http://support.illumina.com/content/dam/illumina-support/help/BaseSpaceHelp_v2/Content/Vault/Informatics/Sequencing_Analysis/BS/swSEQ_mBS_FASTQFiles.htm

Here is an example read name in their latest format: @M00745:186:000000000-AH7LV:1:1101:14346:1426 1:N:0:15
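For reference, here is a minimal sketch of how a header like the one above splits into fields. It assumes the Casava 1.8+ layout `@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<is filtered>:<control number>:<sample>` with exactly seven colon-separated fields before the space; `parse_illumina_header` is a hypothetical helper, not part of Super-Deduper.

```python
def parse_illumina_header(header: str) -> dict:
    """Split a Casava 1.8+ style Illumina read name into named fields (hypothetical helper)."""
    if not header.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    # The read ID and the comment are separated by a single space.
    read_id, comment = header[1:].rstrip("\n").split(" ", 1)
    # Assumes exactly seven fields; some newer headers append an eighth (e.g. a UMI).
    instrument, run, flowcell, lane, tile, x, y = read_id.split(":")
    read, filtered, control, sample = comment.split(":")
    return {
        "instrument": instrument,
        "run_number": int(run),
        "flowcell_id": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "read": int(read),               # 1 or 2 for paired-end reads
        "is_filtered": filtered == "Y",  # Y means the read failed the filter
        "control_number": int(control),
        "sample": sample,                # kept as a string: may be a number or an index sequence
    }
```

With the example read above, `parse_illumina_header(...)["read"]` gives `1`, which is the mate information that older formats carried as a `/1` suffix.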

I have written a sed script to convert the read names into a format that works with superdeduper, but it might be nice for superdeduper to handle the format natively.
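The sed script itself isn't shown in the thread. As an illustration only, here is a Python sketch of that kind of conversion, assuming the older `@<read id>/<read number>` convention was what the tool expected; `to_legacy_header` and `convert_fastq` are hypothetical names.

```python
import re

# Matches "@<read id> <read>:<Y|N>:<control>:<sample>" (Casava 1.8+ style header).
_NEW_STYLE = re.compile(r"^@(\S+) (\d+):[YN]:\d+:\S+$")


def to_legacy_header(header: str) -> str:
    """Rewrite a new-style header as '@<read id>/<read number>'; leave other lines unchanged."""
    line = header.rstrip("\n")
    m = _NEW_STYLE.match(line)
    if m is None:
        return line
    return f"@{m.group(1)}/{m.group(2)}"


def convert_fastq(lines):
    """Yield FASTQ lines with only the header line of each 4-line record rewritten."""
    for i, line in enumerate(lines):
        # Quality strings can also start with '@', so only touch every 4th line.
        yield to_legacy_header(line) if i % 4 == 0 else line.rstrip("\n")
```

For the example read, `to_legacy_header` returns `@M00745:186:000000000-AH7LV:1:1101:14346:1426/1`. Note that this drops the filter flag, control number, and sample field, which is one reason native support is preferable to a rewrite step.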

Adam

dstreett commented 9 years ago

Hey Adam,

Glad you like the application, and thank you for the feedback! I just fixed it and pushed it to master.

Let me know if there are any issues.

Regards, David