NationalGenomicsInfrastructure / ngi_pipeline

Code driving the production pipeline at SciLifeLab

parse sample name from fastq file instead of sample directory #405

Closed b97pla closed 2 years ago

b97pla commented 2 years ago

When organizing a flow cell, ngi_pipeline deduced the sample_name from the name of the directory containing the fastq files (e.g. FC/Unaligned/AB-1234/AB-1234-101). For bcl2fastq output, this directory name corresponds to the SampleID field in the samplesheet, whereas the fastq file name prefix (everything up to the _S1 part) corresponds to the SampleName field in the samplesheet.

This meant that a sample with multiple library preps sequenced in the same lane was not organized together: the preps share the same SampleName but have different SampleIDs, so their fastq files ended up under different sample names.

This PR changes ngi_pipeline so that it deduces the sample_name from the fastq file prefix and the sample_id from the sample directory. This way, samplesheet entries with different SampleIDs but the same SampleName are organized together.
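To illustrate the idea, here is a minimal sketch of the parsing, not the actual code in this PR. It assumes the usual bcl2fastq file naming (`<SampleName>_S<n>_L<lane>_<R|I><read>_001.fastq.gz`); the function names and the `_prepA`/`_prepB` directory names are hypothetical and only serve the example.

```python
import os
import re
from collections import defaultdict

# Assumed bcl2fastq naming: <SampleName>_S<n>_L<lane>_<R|I><read>_001.fastq[.gz]
FASTQ_RE = re.compile(
    r"^(?P<sample_name>.+)_S\d+_L\d{3}_[RI]\d_\d{3}\.fastq(\.gz)?$"
)


def parse_sample_identifiers(fastq_path):
    """Return (sample_name, sample_id) for a fastq file path.

    sample_name comes from the fastq file name prefix (SampleName),
    sample_id from the directory containing the file (SampleID).
    """
    sample_id = os.path.basename(os.path.dirname(fastq_path))
    match = FASTQ_RE.match(os.path.basename(fastq_path))
    if match is None:
        raise ValueError("Unexpected fastq file name: {}".format(fastq_path))
    return match.group("sample_name"), sample_id


def group_by_sample_name(fastq_paths):
    """Group fastq files by sample_name, regardless of sample_id."""
    grouped = defaultdict(list)
    for path in fastq_paths:
        sample_name, _sample_id = parse_sample_identifiers(path)
        grouped[sample_name].append(path)
    return dict(grouped)


# Hypothetical layout: two library preps of the same sample, each with
# its own SampleID directory but the same SampleName file prefix.
paths = [
    "FC/Unaligned/AB-1234/AB-1234-101_prepA/AB-1234-101_S1_L001_R1_001.fastq.gz",
    "FC/Unaligned/AB-1234/AB-1234-101_prepB/AB-1234-101_S2_L001_R1_001.fastq.gz",
]
grouped = group_by_sample_name(paths)
```

With this layout, both files fall under the single key `AB-1234-101` in `grouped`, while `parse_sample_identifiers` still reports the distinct sample ids taken from the directory names.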