bihealth / digestiflow-demux

:spaghetti: Digestiflow Demultiplexing Tool

Add support for index pools, e.g. from 10x. #8

Status: Closed (messersc closed this issue 5 years ago)

messersc commented 5 years ago

Use case

10x runs are indexed with pools instead of just one index per sample. In these cases, the web API returns the indices as a single comma-separated string rather than a real list.

To use bcl2fastq, each pooled sample needs to be split across multiple samplesheet lines, one per index sequence in the pool. For this, we need to either add another samplesheet writer or extend the current one for bcl2fastq2.

For a webtool to create samplesheets for 10x data, see https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/bcl2fastq-direct

Example of a non-spec sheet (all four pool indices on a single line, so the row has more fields than the header)

[Data]
Lane,Sample_ID,Sample_Name,Sample_Plate,Sample_Well,i7_Index_ID,index,Sample_Project,Description
1,MySample,,,,SI-GA-C7,GTCTCTCG,AATCTCTC,CGGAGGGA,TCAGAAAT,Project,

Example of a good sheet

[Data]
Lane,Sample_ID,Sample_Name,index,Sample_Project
1,SI-GA-C7_1,MySample,GTCTCTCG,Chromium_20190201
1,SI-GA-C7_2,MySample,AATCTCTC,Chromium_20190201
1,SI-GA-C7_3,MySample,CGGAGGGA,Chromium_20190201
1,SI-GA-C7_4,MySample,TCAGAAAT,Chromium_20190201
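
The split described above can be sketched in Python roughly as follows. This is only an illustration, not the digestiflow-demux implementation: the names `expand_index_pool` and `write_data_section` are hypothetical, and it assumes the web API hands over the pool as one comma-separated string and that numbering the Sample_ID with the pool name plus a suffix (as in the good sheet above) is the desired layout.

```python
import csv
import io

# Column order taken from the "good sheet" example in this issue.
FIELDS = ["Lane", "Sample_ID", "Sample_Name", "index", "Sample_Project"]


def expand_index_pool(lane, sample_name, pool_id, indices, project):
    """Yield one samplesheet row per index sequence in a pool.

    `indices` is assumed to be a comma-separated string such as
    "GTCTCTCG,AATCTCTC,CGGAGGGA,TCAGAAAT" (hypothetical API format).
    """
    # Split the string and drop empty entries caused by stray commas/spaces.
    seqs = [s.strip() for s in indices.split(",") if s.strip()]
    for i, seq in enumerate(seqs, start=1):
        yield {
            "Lane": lane,
            "Sample_ID": f"{pool_id}_{i}",  # e.g. SI-GA-C7_1 ... SI-GA-C7_4
            "Sample_Name": sample_name,
            "index": seq,
            "Sample_Project": project,
        }


def write_data_section(rows):
    """Render the rows as the [Data] section of a samplesheet."""
    out = io.StringIO()
    out.write("[Data]\n")
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    rows = expand_index_pool(
        lane=1,
        sample_name="MySample",
        pool_id="SI-GA-C7",
        indices="GTCTCTCG,AATCTCTC,CGGAGGGA,TCAGAAAT",
        project="Chromium_20190201",
    )
    print(write_data_section(rows), end="")
```

Run on the pool from the non-spec example, this should reproduce the four-line [Data] section shown above. Keeping the expansion separate from the writer would allow either a new writer or an extension of the existing bcl2fastq2 one to reuse it.
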
messersc commented 5 years ago

Another example of a good samplesheet

[Data]
lane,sample_id,index,sample_project
1,SP007,AATCTCTC,Project
1,SP007,CGGAGGGA,Project
1,SP007,GTCTCTCG,Project
1,SP007,TCAGAAAT,Project

The sample_ids do not need to be unique for bcl2fastq2 to work. However, with this layout the sequences for all 4 barcodes end up in the same FASTQ file.