YiPeng-Gao / scDaPars

Dynamic Analysis of Alternative Polyadenylation from single-cell RNA-seq (scDaPars)

problem #9

Closed ghost closed 3 years ago

ghost commented 3 years ago
  1. The BAM file was split into one BAM file per cell, which produced about 700,000 files. However, barcodes.tsv.gz lists only 9403 barcodes. Do I need to restrict the per-cell files to the barcodes in barcodes.tsv.gz?
  2. DaPars produces one result file per chromosome. Is it correct to then use cat to merge the result files of all chromosomes into a single file?
  3. What are the thresholds of scDaPars?
ghost commented 3 years ago

Thanks for your help before. I wish you all the best in the future. Best regards

YiPeng-Gao commented 3 years ago
  1. You need to refer to the cell barcodes provided in your data. In your case, there should be 9403 BAM files for 9403 single cells.
  2. Yes, you need to merge all chromosomes together.
  3. What do you mean by thresholds?
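
Points 1 and 2 above can be sketched as follows. This is a minimal illustration in Python, not part of scDaPars or DaPars2. It assumes the per-cell BAM files are named `<barcode>.bam` (a hypothetical naming scheme) and that every per-chromosome result file starts with the same header line, in which case a plain cat would repeat the header once per chromosome. The function names are made up for this example.

```python
import gzip
from pathlib import Path


def load_barcodes(path):
    """Read one barcode per line from a gzip-compressed barcodes file."""
    with gzip.open(path, "rt") as handle:
        return {line.strip() for line in handle if line.strip()}


def select_cell_bams(bam_dir, barcodes):
    """Keep only per-cell BAM files whose file stem is a listed barcode.

    Assumes the hypothetical naming scheme <barcode>.bam.
    """
    return sorted(p for p in Path(bam_dir).glob("*.bam") if p.stem in barcodes)


def merge_chromosome_results(paths, out_path):
    """Concatenate per-chromosome tables, writing the header line only once.

    Assumes every input file starts with the same header line.
    """
    header = None
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                first = handle.readline().rstrip("\n")
                if header is None:
                    header = first
                    out.write(first + "\n")
                elif first != header:
                    raise ValueError(f"unexpected header in {path}")
                for line in handle:
                    # Guard against a missing newline at the end of a file.
                    out.write(line if line.endswith("\n") else line + "\n")
```

If the result files turn out to have no header line, plain concatenation is enough and only the barcode filtering part applies.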
ghost commented 3 years ago
  1. If there are multiple samples, multiple barcode.csv and BAM files are generated. At which step of the analysis should the data from the different samples be merged? After cellranger count generates the BAM files, do the BAM files of these different samples need to be merged?
ghost commented 3 years ago

I used 400 cell files for the subsequent DaPars2 analysis. After running DaPars2, I ran scDaPars on the result file and it showed this error:

image

YiPeng-Gao commented 3 years ago

There is nothing I can do for you if you just throw questions at me like this. The code itself should run smoothly (you can test it using the example files). If you are experiencing errors, you can run each step separately using the R scripts I provided on GitHub to debug your analysis and figure out what the problem is. If the problem you identify is indeed related to my algorithm, I will help you with it. Otherwise, I cannot help you.

ghost commented 3 years ago

In npc = which.max(var_cum > var_thre), what is the range of npc?

YiPeng-Gao commented 3 years ago

It depends on your data, but npc > 0 and npc < # of PCs.
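
For readers wondering where that range comes from: in R, which.max applied to a logical vector returns the index of the first TRUE, so npc is the first position at which var_cum exceeds var_thre. Below is a minimal Python sketch of the same logic (Python is used only for illustration; scDaPars itself is R). It assumes var_cum is the cumulative fraction of variance explained across PCs, which this thread does not show. The function name n_components and the toy variances are hypothetical.

```python
from itertools import accumulate


def n_components(variances, var_thre):
    """Return the 1-based index of the first PC at which the cumulative
    variance fraction is strictly greater than var_thre.

    Mirrors R's which.max(var_cum > var_thre), including its behaviour of
    returning 1 when no element exceeds the threshold.
    """
    total = sum(variances)
    var_cum = [running / total for running in accumulate(variances)]
    for index, fraction in enumerate(var_cum, start=1):
        if fraction > var_thre:
            return index
    return 1  # which.max on an all-FALSE vector also returns 1


# Toy per-PC variances: the first PC alone explains half of the total.
toy = [5, 3, 1, 1]
print(n_components(toy, 0.4))
print(n_components(toy, 0.5))
```

With these toy values the first PC already explains half of the variance, so a threshold of 0.4 gives 1, while a threshold of 0.5 gives 2 because the comparison is strict. This only shows one way a small npc can arise and says nothing about any particular dataset.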

ghost commented 3 years ago

I don’t understand what you mean by "npc < # of PCs". What does # stand for? Using the example files, npc = which.max(var_cum > var_thre) = 86; using my data, npc = 1.

YiPeng-Gao commented 3 years ago

"#" means "number of". So the error you hit earlier is caused by your data. npc is the number of PCs needed to explain at least 40% of the variance in your data. In the example file that number is 86, which means the first 86 PCs are needed to explain 40% of the variance in the data.

ghost commented 3 years ago

Why was var_thre set to 0.4? Can it be adjusted? Is any value with 0.4 <= var_thre < 1 acceptable? If I set var_thre = 0.5, my data can be analyzed using scDaPars.

image image