Hoohm / dropSeqPipe

A SingleCell RNASeq pre-processing snakemake workflow
Creative Commons Attribution Share Alike 4.0 International

10X split by oligos #105

Closed: klevdiamanti closed this issue 4 years ago

klevdiamanti commented 4 years ago

Hi, I have 10X data that was sequenced on 2 lanes, and the FASTQ files are split by oligos (explained here: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/fastq-input). The directory structure is essentially as follows:

|-- sample1_1
|--|-- sample1_S1_L001_I1.fastq.gz
|--|-- sample1_S1_L001_R1.fastq.gz
|--|-- sample1_S1_L001_R2.fastq.gz
|--|-- sample1_S1_L002_I1.fastq.gz
|--|-- sample1_S1_L002_R1.fastq.gz
|--|-- sample1_S1_L002_R2.fastq.gz
|-- sample1_2
|--|-- sample1_S2_L001_I1.fastq.gz
|--|-- sample1_S2_L001_R1.fastq.gz
|--|-- sample1_S2_L001_R2.fastq.gz
|--|-- sample1_S2_L002_I1.fastq.gz
|--|-- sample1_S2_L002_R1.fastq.gz
|--|-- sample1_S2_L002_R2.fastq.gz
|-- sample1_3
|--|-- sample1_S3_L001_I1.fastq.gz
|--|-- sample1_S3_L001_R1.fastq.gz
|--|-- sample1_S3_L001_R2.fastq.gz
|--|-- sample1_S3_L002_I1.fastq.gz
|--|-- sample1_S3_L002_R1.fastq.gz
|--|-- sample1_S3_L002_R2.fastq.gz
|-- sample1_4
|--|-- sample1_S4_L001_I1.fastq.gz
|--|-- sample1_S4_L001_R1.fastq.gz
|--|-- sample1_S4_L001_R2.fastq.gz
|--|-- sample1_S4_L002_I1.fastq.gz
|--|-- sample1_S4_L002_R1.fastq.gz
|--|-- sample1_S4_L002_R2.fastq.gz
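To make the layout above concrete, here is a minimal sketch of how the file names could be grouped so that each (sample, S number, read type) maps to its per-lane files in lane order. The function name `group_by_read` is hypothetical and not part of dropSeqPipe; the regex assumes the `{sample}_S{n}_L{lane}_{read}.fastq.gz` pattern shown in the listing and also accepts the optional `_001` suffix that bcl2fastq usually appends.

```python
import re
from collections import defaultdict

# Hypothetical pattern for names like sample1_S1_L001_R1.fastq.gz.
# The trailing _001 (common in bcl2fastq output) is accepted but optional.
FASTQ_RE = re.compile(
    r"^(?P<sample>.+)_S(?P<snum>\d+)_L(?P<lane>\d{3})_(?P<read>[IR]\d)(?:_001)?\.fastq\.gz$"
)


def group_by_read(names):
    """Group FASTQ names by (sample, S number, read type), sorted by lane."""
    groups = defaultdict(list)
    for name in names:
        m = FASTQ_RE.match(name)
        if m is None:
            continue  # ignore files that do not follow the naming scheme
        key = (m["sample"], int(m["snum"]), m["read"])
        groups[key].append((int(m["lane"]), name))
    # Drop the lane number after sorting so each value is an ordered file list.
    return {key: [name for _, name in sorted(files)] for key, files in groups.items()}
```

For example, `group_by_read(["sample1_S1_L002_R1.fastq.gz", "sample1_S1_L001_R1.fastq.gz"])` maps `("sample1", 1, "R1")` to the L001 file followed by the L002 file, which is the order you would want when concatenating lanes.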

I am thinking of using prepare.smk to concatenate the lanes within each directory and then treat each directory as a separate sample. As the final output for each sample, I would use the UMI table from the results/summary directory.
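The lane-merging step described above could look roughly like the sketch below. It is not how prepare.smk is implemented; `merge_lanes` and the output name `{directory}_{read}.fastq.gz` are hypothetical choices for illustration, and you would need to adapt the output names to whatever dropSeqPipe expects for its samples. It relies on the fact that gzip files can be concatenated byte for byte into a valid multi-member gzip file, so no decompression is needed.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Hypothetical helper (not part of dropSeqPipe): merge the per-lane FASTQs
# in one directory into a single file per read type (I1, R1, R2, ...).
LANE_RE = re.compile(r"_L(\d{3})_([IR]\d)(?:_001)?\.fastq\.gz$")


def merge_lanes(sample_dir, out_dir):
    sample_dir, out_dir = Path(sample_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Collect (lane, path) pairs per read type.
    by_read = defaultdict(list)
    for path in sample_dir.glob("*.fastq.gz"):
        m = LANE_RE.search(path.name)
        if m:
            by_read[m.group(2)].append((int(m.group(1)), path))

    outputs = {}
    for read, files in by_read.items():
        # The directory name (e.g. sample1_1) becomes the new sample name.
        target = out_dir / f"{sample_dir.name}_{read}.fastq.gz"
        with open(target, "wb") as dst:
            # Append lanes in order; concatenated gzip members stay valid.
            for _, src in sorted(files):
                with open(src, "rb") as fh:
                    shutil.copyfileobj(fh, dst)
        outputs[read] = target
    return outputs
```

Running it once per directory (sample1_1 through sample1_4) would give four independent samples, each with one R1, R2 and I1 file, which matches the "one directory = one sample" approach.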

Does my approach sound OK, or does it look questionable?