cancerit / PCAP-core

NGS reference implementations and helper code for mapping (originally part of ICGC-TCGA-PanCancer)
GNU General Public License v2.0

Support for Illumina's FASTQ file format #44

Closed ckandoth closed 4 years ago

ckandoth commented 5 years ago

Hello! Thanks so much for this tool. Before we do a pull-request, I wanted to ask if you are open to merging in code that adds support for Illumina's FASTQ file format described here. If you would prefer to implement this, that's fine too. But we will be happy to do a PR if you confirm that you can merge it in. We'd prefer not to fork.

keiranmraine commented 5 years ago

Hi,

Can you clarify: do you mean the file naming convention specifically, or the CASAVA-style record headers?

@SIM:1:FCX:1:15:6329:1045:GATTACT+GTCTTAAC 1:N:0:ATCCGA

We're certainly willing to accept PRs. If you could add a very small example file (10 records) to t/data, along with tests confirming that it behaves as expected, that would help speed acceptance. IIRC bwa can interpret the header elements and add them to the resulting SAM reads (but it may need a flag).
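
For readers unfamiliar with the record header format mentioned above, here is a minimal Python sketch that splits such a header into named fields. It is not PCAP-core code. The field names (`instrument`, `flowcell`, `umi`, and so on) and the `parse_header` function are illustrative assumptions based on the commonly documented Illumina layout, and the eighth colon-separated field is treated as an optional UMI.

```python
from typing import NamedTuple, Optional


class IlluminaHeader(NamedTuple):
    """Fields of a CASAVA-style FASTQ header (names are illustrative)."""
    instrument: str
    run: str
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    umi: Optional[str]   # assumed optional 8th field of the read name
    read: int            # 1 or 2 for paired-end data
    filtered: bool       # True when the flag is "Y"
    control: int
    index: str


def parse_header(line: str) -> IlluminaHeader:
    """Split a header line of the form '@a:b:c:d:e:f:g[:umi] r:f:c:index'."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    # The read name and the comment are separated by a single space.
    name, _, comment = line[1:].strip().partition(" ")
    name_fields = name.split(":")
    if len(name_fields) not in (7, 8):
        raise ValueError(f"unexpected read name: {name!r}")
    comment_fields = comment.split(":")
    if len(comment_fields) != 4:
        raise ValueError(f"unexpected comment: {comment!r}")
    umi = name_fields[7] if len(name_fields) == 8 else None
    read, filtered, control, index = comment_fields
    return IlluminaHeader(
        instrument=name_fields[0],
        run=name_fields[1],
        flowcell=name_fields[2],
        lane=int(name_fields[3]),
        tile=int(name_fields[4]),
        x=int(name_fields[5]),
        y=int(name_fields[6]),
        umi=umi,
        read=int(read),
        filtered=(filtered == "Y"),
        control=int(control),
        index=index,
    )


header = parse_header("@SIM:1:FCX:1:15:6329:1045:GATTACT+GTCTTAAC 1:N:0:ATCCGA")
print(header.lane, header.read, header.index)
```

The part after the space (`1:N:0:ATCCGA`) is the FASTQ comment, which is the piece an aligner would need to carry through to the SAM output.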

ckandoth commented 5 years ago

Thanks! Yes, the file naming convention. Currently, there is a requirement here that the paired FASTQs end with _1 and _2. However, all our FASTQs follow the Illumina naming scheme that looks like this:

SampleName_S1_L001_R1_001.fastq.gz
SampleName_S1_L001_R2_001.fastq.gz

# For example:
T2-N_IGO_07008_BU_3_S19_R1_001.fastq.gz
T2-N_IGO_07008_BU_3_S19_R2_001.fastq.gz
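
As an illustration of what matching this naming scheme could look like, here is a minimal Python sketch that groups such files into R1/R2 pairs. It is not the code from PCAP-core or from any PR; the `ILLUMINA_NAME` pattern and the `pair_fastqs` helper are hypothetical, and the lane field is treated as optional only because the second example above omits it.

```python
import re
from collections import defaultdict

# Assumed pattern: <sample>_S<n>[_L<lane>]_R<1|2>_001.fastq.gz
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_num>\d+)"
    r"(?:_L(?P<lane>\d{3}))?"
    r"_R(?P<read>[12])_001\.fastq\.gz$"
)


def pair_fastqs(filenames):
    """Return a sorted list of (R1, R2) tuples for Illumina-named FASTQs."""
    groups = defaultdict(dict)
    for name in filenames:
        match = ILLUMINA_NAME.match(name)
        if match is None:
            raise ValueError(f"not an Illumina-style FASTQ name: {name!r}")
        # Files that differ only in the read number belong to the same pair.
        key = (match["sample"], int(match["sample_num"]), match["lane"] or "")
        groups[key][match["read"]] = name

    pairs = []
    for key in sorted(groups):
        reads = groups[key]
        if set(reads) != {"1", "2"}:
            raise ValueError(f"incomplete pair for {key}")
        pairs.append((reads["1"], reads["2"]))
    return pairs


files = [
    "T2-N_IGO_07008_BU_3_S19_R2_001.fastq.gz",
    "SampleName_S1_L001_R1_001.fastq.gz",
    "T2-N_IGO_07008_BU_3_S19_R1_001.fastq.gz",
    "SampleName_S1_L001_R2_001.fastq.gz",
]
for r1, r2 in pair_fastqs(files):
    print(r1, r2)
```

Keying on everything except the read number means an unpaired file is reported as an error rather than silently mapped as single-end.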

ckandoth commented 4 years ago

A PR is now at https://github.com/cancerit/PCAP-core/pull/56. I'll close this issue.