MGI-tech-bioinformatics / DNBelab_C_Series_HT_scRNA-analysis-software

An open-source, flexible pipeline for analyzing high-throughput DNBelab C Series single-cell RNA datasets
MIT License

A question about multiple files of one sample #87

Open NJU-Bio-Info opened 1 month ago

NJU-Bio-Info commented 1 month ago

Hi, we had scRNA-seq for 5 samples sequenced at your company. When I received the data, I found multiple directories under each sample, and each sub-directory contains two pairs of cDNA reads. I'm a little confused by this layout. How should I use dnbc4tools to perform demultiplexing and gene expression profiling?

Or can I just run it like this:

dnbc4tools rna run --cDNAfastq1 subdirectory1/cDNA/1_1.fq,subdirectory1/cDNA/2_1.fq,subdirectory2/cDNA/1_1.fq,subdirectory2/cDNA/2_1.fq...?

lishuangshuang0616 commented 1 month ago

dnbc4tools rna run \
--cDNAfastq1 8142-1-211221/cDNA-8142-1-211221/DP8400026224BL_L01_6_1.fq.gz,8142-1-211221/cDNA-8142-1-211221/DP8400026225BL_L01_6_1.fq.gz  \
--cDNAfastq2 8142-1-211221/cDNA-8142-1-211221/DP8400026224BL_L01_6_2.fq.gz,8142-1-211221/cDNA-8142-1-211221/DP8400026225BL_L01_6_2.fq.gz  \
--oligofastq1 8142-1-211221/oligo-8142-1-211221/DP8400026442TR_L01_6_1.fq.gz \
--oligofastq2 8142-1-211221/oligo-8142-1-211221/DP8400026442TR_L01_6_2.fq.gz \
--name 8142-1-211221 \
.......
......
.....

Like this.
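
A minimal sketch of how the comma-separated lists in the command above could be assembled, assuming every read-1 file in a directory ends in `_1.fq.gz` and its mate sits next to it as `_2.fq.gz`. The function name `build_pair_lists` and the variables `FASTQ1` and `FASTQ2` are hypothetical helpers for illustration, not part of dnbc4tools. It only makes sense for files already confirmed to come from the same library.

```shell
# Hypothetical helper (not part of dnbc4tools): collect paired FASTQ files
# from one directory into two comma-separated lists with matching order.
# Sets FASTQ1 and FASTQ2; returns non-zero if a mate is missing or nothing is found.
build_pair_lists() {
    dir=$1
    FASTQ1=
    FASTQ2=
    for r1 in "$dir"/*_1.fq.gz; do
        [ -e "$r1" ] || continue              # glob matched nothing
        r2=${r1%_1.fq.gz}_2.fq.gz             # expected mate of this read-1 file
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1" >&2
            return 1
        fi
        FASTQ1=${FASTQ1:+$FASTQ1,}$r1         # append with a comma after the first entry
        FASTQ2=${FASTQ2:+$FASTQ2,}$r2
    done
    if [ -z "$FASTQ1" ]; then
        echo "no *_1.fq.gz files in $dir" >&2
        return 1
    fi
}

# Example use (directory name taken from the command above):
#   build_pair_lists 8142-1-211221/cDNA-8142-1-211221 &&
#       echo "--cDNAfastq1 $FASTQ1 --cDNAfastq2 $FASTQ2"
```

Because both lists are built in the same loop, the read-1 and read-2 entries stay in the same order, which is the pattern the command above follows.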

NJU-Bio-Info commented 1 month ago

Do you mean that I would produce 3 results for each sample and then integrate them in downstream analysis, for example with Seurat?

lishuangshuang0616 commented 1 month ago

What I mean is that the data under each sample can be analyzed together: multiple fastq.gz files can be passed in one run if they come from the same library. Each sample then yields one expression matrix. Downstream, Seurat can merge the matrices of multiple samples or remove batch effects. See https://github.com/MGI-tech-bioinformatics/DNBelab_C_Series_HT_scRNA-analysis-software/blob/version2.0/doc/scRNA_para.md

NJU-Bio-Info commented 1 month ago

Yes, I have read your .md file. So you mean that the 3 sub-directories under each sample come from different experiments? In that case I cannot simply pool them when I run dnbc4tools; I should analyze them individually and then use something like Seurat or Harmony to integrate them into one result per sample?

lishuangshuang0616 commented 1 month ago

You can confirm with the service provider whether the multiple directories under each sample are the results of multiple sequencing runs from the same library of the same sample. I'm not sure if you should separate them or merge them. Only multiple sequencing data from the same library of the same sample can be merged.

NJU-Bio-Info commented 1 month ago

OK, I will check it myself. Thanks for your quick reply! If the 3 sub-directories came from the same sequencing library, then I can merge them, right?

lishuangshuang0616 commented 1 month ago

Yes.

NJU-Bio-Info commented 1 month ago

Maybe I can just use zcat in the shell to merge data from different runs of the same library? Thank you again; the results were very confusing after I downloaded the data.

TroyLi99 commented 3 weeks ago

Maybe I can just use zcat in the shell to merge data from different runs of the same library? Thank you again; the results were very confusing after I downloaded the data.

Hi! Did you try using 'zcat' to merge the data? Did it work?
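
The thread does not record whether the zcat approach was tried. As a hedged sketch of what such a merge could look like: gzip files can be concatenated with plain `cat`, and the result is still a valid gzip stream, so decompressing and recompressing with `zcat ... | gzip` is not required. The function names `merge_gz` and `count_reads` and the file names in the usage comment are hypothetical. Since the command earlier in the thread already passes several files as a comma-separated list, a physical merge may not be needed at all.

```shell
# Hypothetical helper: merge_gz OUT IN1 IN2 ...
# Concatenates gzip files as-is; the output is a multi-member gzip stream.
merge_gz() {
    out=$1
    shift
    cat "$@" > "$out"
}

# Count the reads in a gzip-compressed FASTQ file (4 lines per record).
count_reads() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Example use (file names are placeholders). List the inputs in the same
# order for read 1 and read 2 so that mates stay aligned:
#   merge_gz merged_1.fq.gz run1_1.fq.gz run2_1.fq.gz
#   merge_gz merged_2.fq.gz run1_2.fq.gz run2_2.fq.gz
#   [ "$(count_reads merged_1.fq.gz)" -eq "$(count_reads merged_2.fq.gz)" ] ||
#       echo "read counts differ between R1 and R2" >&2
```

As stated earlier in the thread, this only applies to multiple sequencing runs of the same library of the same sample.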