epam / fonda

Fonda is a framework which offers scalable and automatic analysis of multiple NGS sequencing data types
Apache License 2.0

scRnaExpression_CellRanger_Fastq - sample manifest should allow duplicate mapping for shared libraries #185

Closed syansanofi closed 3 years ago

syansanofi commented 3 years ago

Problem
Currently, duplicate library names are silently dropped from the library CSV files created for each sample in this workflow. This prevents us from using a many-to-one mapping of GEX to ADT libraries, as in the example manifest below, where the TestNode01ADT library is shared by TestNode01 and TestNode02:

parameterType shortName libtype master Parameter1 Parameter2
fastqFile TestNode01GE GEX TestNode01 /ngs/data/demo/test/fastq_data/TestNode01GE/TestNode01GE_S33_L001_R1_001.fastq.gz /ngs/data/demo/test/fastq_data/TestNode01GE/TestNode01GE_S33_L001_R2_001.fastq.gz
fastqFile TestNode02GE GEX TestNode02 /ngs/data/demo/test/fastq_data/TestNode02GE/TestNode02GE_S34_L001_R1_001.fastq.gz /ngs/data/demo/test/fastq_data/TestNode02GE/TestNode02GE_S34_L001_R2_001.fastq.gz
fastqFile TestNode01ADT Antibody TestNode01 /ngs/data/demo/test/fastq_data/TestNode01ADT/TestNode01ADT_S1_L001_R1_001.fastq.gz /ngs/data/demo/test/fastq_data/TestNode01ADT/TestNode01ADT_S1_L001_R2_001.fastq.gz
fastqFile TestNode01ADT Antibody TestNode02 /ngs/data/demo/test/fastq_data/TestNode01ADT/TestNode01ADT_S1_L001_R1_001.fastq.gz /ngs/data/demo/test/fastq_data/TestNode01ADT/TestNode01ADT_S1_L001_R2_001.fastq.gz

One way to get around this problem is to rename the samples for the shared libraries.
However, I strongly suspect this will cause an error on the Cell Ranger side, because the prefix of the FASTQ file names must match the sample name. For example, sample TestNode01ADT matches TestNode01ADT_S1_L001_R1_001.fastq.gz, and a renamed sample would no longer match.
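To illustrate why renaming is risky, here is a minimal sketch of the prefix check, assuming the usual bcl2fastq-style naming `<sample>_S<n>_L<lane>_<read>_001.fastq.gz`. The helper names `fastq_sample_prefix` and `matches_sample` are hypothetical and are not part of Fonda or Cell Ranger.

```python
import re
from pathlib import PurePosixPath

# Assumed naming convention: <sample>_S<n>_L<lane>_<R|I><n>_001.fastq(.gz)
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d{3}_[RI]\d_001\.fastq(\.gz)?$"
)


def fastq_sample_prefix(path):
    """Return the sample prefix encoded in a FASTQ file name, or None."""
    match = FASTQ_NAME.match(PurePosixPath(path).name)
    return match.group("sample") if match else None


def matches_sample(path, sample_name):
    """True if the FASTQ file name starts with the given sample name."""
    return fastq_sample_prefix(path) == sample_name
```

With the manifest above, the shared ADT FASTQ files match the name TestNode01ADT, but would not match a renamed sample such as TestNode02ADT unless the files themselves were renamed or linked under the new prefix.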

Proposal
Remove the enforcement of unique names for each library type.
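A rough sketch of the intended behavior, not Fonda's actual implementation: group manifest rows by master sample and de-duplicate only within a sample, so a shared library can appear in more than one per-sample library CSV. The function names and the `LIBRARY_TYPES` mapping from the manifest's libtype column are assumptions for illustration; the `fastqs,sample,library_type` header follows the Cell Ranger libraries CSV layout.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed mapping from manifest libtype to Cell Ranger library_type.
LIBRARY_TYPES = {"GEX": "Gene Expression", "Antibody": "Antibody Capture"}


def libraries_by_master(manifest_rows):
    """Group manifest rows by master sample, keeping shared libraries."""
    grouped = defaultdict(list)
    seen = set()
    for row in manifest_rows:
        fastq_dir = str(PurePosixPath(row["Parameter1"]).parent)
        # The key includes the master sample, so the same library name
        # is dropped only when repeated within one sample.
        key = (row["master"], row["shortName"], row["libtype"], fastq_dir)
        if key in seen:
            continue
        seen.add(key)
        grouped[row["master"]].append({
            "fastqs": fastq_dir,
            "sample": row["shortName"],
            "library_type": LIBRARY_TYPES[row["libtype"]],
        })
    return dict(grouped)


def to_csv(libraries):
    """Render one sample's libraries as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["fastqs", "sample", "library_type"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(libraries)
    return buffer.getvalue()
```

Applied to the example manifest, both TestNode01 and TestNode02 would get a library CSV that lists their own GEX library plus the shared TestNode01ADT library.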