Clinical-Genomics / cg

Glue between Clinical Genomics apps

Generate samplesheets for raredisease with both lane number and flowcell #3565

Closed ramprasadn closed 3 days ago

ramprasadn commented 1 month ago

Current samplesheets for raredisease look like the example below. As you can see, the lane field indicates the lane number in which the sample was run on the flowcell.

sample,lane,fastq_1,fastq_2,sex,phenotype,paternal_id,maternal_id,case_id
1234N,1,tiny_n_L001_R1_xxx.fastq.gz,tiny_n_L001_R2_xxx.fastq.gz,2,1,caseyupper,caseyupperlamb,caseydonkey
1234N,4,tiny_n_L004_R1_xxx.fastq.gz,tiny_n_L004_R2_xxx.fastq.gz,2,1,caseyupper,caseyupperlamb,caseydonkey
1234N,2,tiny_n_L002_R1_xxx.fastq.gz,tiny_n_L002_R2_xxx.fastq.gz,2,1,caseyupper,caseyupperlamb,caseydonkey
1234N,7,tiny_n_L007_R1_xxx.fastq.gz,tiny_n_L007_R2_xxx.fastq.gz,2,1,caseyupper,caseyupperlamb,caseydonkey
1234N,8,tiny_n_L008_R1_xxx.fastq.gz,tiny_n_L008_R2_xxx.fastq.gz,2,1,caseyupper,caseyupperlamb,caseydonkey

We use the lane number to create read groups in the pipeline. A sample might be run in the same lane on multiple flowcells, and this might interfere with how we generate read groups. To avoid such scenarios, it would be great if we could report both the flowcell ID and the lane number instead of just the lane number. I am thinking of something like this:

sample,lane,fastq_1,fastq_2,sex,phenotype,paternal_id,maternal_id,case_id
1234N,FLOWCELLID_1,tiny_n_L001_R1_xxx.fastq.gz,tiny_n_L001_R2_xxx.fastq.gz,2,1,caseyupper,caseyupperlamb,caseydonkey
1234N,FLOWCELLID_4,tiny_n_L004_R1_xxx.fastq.gz,tiny_n_L004_R2_xxx.fastq.gz,2,1,caseyupper,caseyupperlamb,caseydonkey
1234N,FLOWCELLID_2,tiny_n_L002_R1_xxx.fastq.gz,tiny_n_L002_R2_xxx.fastq.gz,2,1,caseyupper,caseyupperlamb,caseydonkey
1234N,FLOWCELLID_7,tiny_n_L007_R1_xxx.fastq.gz,tiny_n_L007_R2_xxx.fastq.gz,2,1,caseyupper,caseyupperlamb,caseydonkey
1234N,FLOWCELLID_8,tiny_n_L008_R1_xxx.fastq.gz,tiny_n_L008_R2_xxx.fastq.gz,2,1,caseyupper,caseyupperlamb,caseydonkey
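
One possible way to derive such a value, sketched here under assumptions: the FASTQ files carry standard Illumina (Casava 1.8+) read headers of the form `@instrument:run:flowcell:lane:tile:x:y ...`, and the first header line of each file has already been read. The function name `flowcell_lane_from_header` is hypothetical and is not part of cg or raredisease.

```python
def flowcell_lane_from_header(header: str) -> str:
    """Build a '<flowcell>_<lane>' string from one Illumina FASTQ header line.

    Assumes the Casava 1.8+ layout:
    @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
    """
    # Drop the leading '@' and the part after the first space (read/index info).
    identifier = header.strip().lstrip("@").split(" ")[0]
    fields = identifier.split(":")
    if len(fields) < 4:
        raise ValueError(f"Unexpected FASTQ header format: {header!r}")
    flowcell, lane = fields[2], fields[3]
    return f"{flowcell}_{lane}"


# Illustrative header only; the flowcell ID here is made up.
example = "@A00001:10:HFLOWCELL1:4:1101:1000:1000 1:N:0:ACGT"
print(flowcell_lane_from_header(example))
```

Two files from the same lane number on different flowcells would then get different values, so the read groups would stay distinct.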

ramprasadn commented 1 month ago

@peterpru I am tagging you here since we discussed this in person 😛

rannick commented 3 days ago

The lane in the samplesheet does not correspond to the real sequencing lane. It is just an iterator over the fastq files for a sample, so a sample should never end up with the same lane value twice.
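
As a rough illustration of that point (not the actual cg code), the sketch below assigns the lane column by counting a sample's fastq pairs. The function name `samplesheet_rows` and its arguments are hypothetical.

```python
from typing import List, Sequence, Tuple


def samplesheet_rows(
    sample: str,
    fastq_pairs: Sequence[Tuple[str, str]],
    extra_fields: Sequence[str],
) -> List[list]:
    """Return one samplesheet row per fastq pair.

    The 'lane' value is just a running counter over the pairs, so it is
    unique within a sample regardless of the physical lane or flowcell.
    """
    return [
        [sample, lane, forward, reverse, *extra_fields]
        for lane, (forward, reverse) in enumerate(fastq_pairs, start=1)
    ]


# Two pairs that both come from physical lane 1, on different flowcells
# (file names are made up for the example).
pairs = [
    ("fc1_L001_R1.fastq.gz", "fc1_L001_R2.fastq.gz"),
    ("fc2_L001_R1.fastq.gz", "fc2_L001_R2.fastq.gz"),
]
for row in samplesheet_rows("1234N", pairs, ["2", "1"]):
    print(",".join(str(value) for value in row))
```

With this kind of counter, two files from the same physical lane still receive different lane values in the samplesheet.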