LuyiTian / scPipe

a pipeline for single cell RNA-seq data analysis

Error in FUN(...) while doing “create_processed_report” #124

Closed zoe106 closed 4 years ago

zoe106 commented 4 years ago

When I was trying to use the create_processed_report function on CEL-Seq data, I got this error:

Quitting from lines 153-166 (report.Rmd)
Error in FUN(...) : cells should have non-zero library sizes
In addition: Warning message: Duplicated aesthetics after name standardisation: hjust

My code is as follows:

create_processed_report( outdir = out_dir, organism = "hsapiens_gene_ensembl", gene_id_type = "ensembl_gene_id" )

I checked that the count folder and gene_count.csv are both there. Could you please give some advice on how to fix it?

Also, the plotQC function is not in the newly released scater, which could cause an error in the create_processed_report step.

Shians commented 4 years ago

It sounds like there is a column in your gene_count that is all 0's, causing the non-zero library sizes error. I'll look into updating the plotQC function, thanks for letting us know.

zoe106 commented 4 years ago

Hi Shians, I am not sure I understand correctly. It seems that after the sce_qc step, another round of zero-count cell detection is needed, because that step removes some low-quality rows, and a column may then be all 0's across the remaining rows. The Rmd file is missing this step, which I think causes the non-zero library sizes error.

zoe106 commented 4 years ago

Dear Dr. Tian, I have some new findings about this issue, but the issue has been closed. Could you please reopen it or give me some advice?

Thank you in advance.


At 2019-12-23 11:47:12, "Luyi Tian" notifications@github.com wrote:

Closed #124 via c8bbc82.


Shians commented 4 years ago

Sorry, the issue was closed by the automated GitHub workflow because I fixed one of the problems in this issue but not the other. It's a bit strange for there to be zero library sizes, because the report should have filtered those out at https://github.com/LuyiTian/scPipe/blob/master/inst/extdata/report_template_slim.Rmd#L62-L69. It's hard to say what's going on without the actual data. I'm away for the next week but will try to diagnose it again when I get back.

zoe106 commented 4 years ago

Hi Shians, I am aware that report_template_slim.Rmd#L62-L69 filters out the columns that are all 0's. But at #L123-L140 the script removes the rows for low-abundance genes. After that, some columns may become all 0's, which is what happened in my data. So I think another round of zero-count cell detection and removal is needed. The attachment is human CEL-Seq data (PRJNA473536) processed following http://bioinf.wehi.edu.au/scPipe/star/CelSeq-1G/process-star.R (except that the reference was changed to hg38). Looking forward to your reply. Thank you in advance.

Yuansheng
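The effect described above can be reproduced on a toy matrix. This is a minimal base R sketch, not the report's actual code: the `counts` matrix and its values are made up for illustration, and only the two filter expressions mirror the `keep1`/`keep2` form quoted later in this thread.

```r
# Toy count matrix: rows are genes, columns are cells (hypothetical values).
counts <- matrix(
  c(5, 0, 3, 4,
    2, 0, 6, 1,
    0, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4))
)

# Before gene filtering, every cell has a non-zero library size.
lib_before <- colSums(counts)

# Gene filter in the same form as keep1 and keep2.
keep1 <- rowMeans(counts) > 1       # average count larger than one
keep2 <- rowSums(counts > 0) > 2    # expressed in at least three cells
filtered <- counts[keep1 & keep2, , drop = FALSE]

# cell2 only had counts in geneC, which was dropped, so it is now empty.
lib_after <- colSums(filtered)
empty_cells <- names(lib_after)[lib_after == 0]

# Second pass: drop cells whose library size became zero.
filtered <- filtered[, lib_after > 0, drop = FALSE]
```

Here `empty_cells` names the cells that would trigger the "non-zero library sizes" error if they were passed on unchanged.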


Shians commented 4 years ago

@zoe106 Thanks for looking into it further. You are right that some cell counts go to zero when gene filtering is done. In the case of the example data we provide, because it's a tiny subset, most genes do not reach the expression threshold.

I don't want to just do another round of filtering, because in the example data we go from ~16000 genes down to 15 after filtering, and I want to signal an issue when such a loss of data happens. I will need to think about how to handle this situation.

Kind regards, Shian
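One possible way to signal this kind of data loss is sketched below. `check_gene_loss` and its `max_loss` default are hypothetical names and values chosen for illustration. They are not part of scPipe and not necessarily how the maintainers would handle it.

```r
# Hypothetical helper (not part of scPipe): report when a gene filter
# removes all genes, or more than a chosen fraction of them.
check_gene_loss <- function(n_before, n_after, max_loss = 0.9) {
  stopifnot(n_before > 0, n_after >= 0, n_after <= n_before)
  if (n_after == 0) {
    stop("no genes left after filtering; consider relaxing the thresholds")
  }
  loss <- 1 - n_after / n_before
  if (loss > max_loss) {
    warning(sprintf("gene filtering removed %.1f%% of genes (%d -> %d)",
                    100 * loss, as.integer(n_before), as.integer(n_after)))
  }
  invisible(loss)  # fraction of genes removed
}
```

With the numbers from this thread, `check_gene_loss(16000, 15)` would raise a warning and `check_gene_loss(2397, 0)` would stop with an error, so the report would fail with a message about the filter instead of a downstream library-size error.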

zoe106 commented 4 years ago

Hi Shian, many thanks for your reply. I realize there is no point in doing another round of filtering, because in my data we go from ~2397 genes down to zero after filtering. Perhaps I need to relax the cutoffs for keep1 and keep2 a little:

keep1 <- rowMeans(counts(sce_qc)) > 1 # average count larger than one
keep2 <- rowSums(counts(sce_qc) > 0) > 2 # expressed in at least three cells

Thank you again for your time and help. I will continue to look for updates.

Best, Zoe


LuyiTian commented 4 years ago

keep1 <- rowMeans(counts(sce_qc)) > 1 is the strict condition. You can replace it with keep1 = (apply(counts(sce), 1, function(x) mean(x[x>0])) > 1), which averages over only the cells where the gene is detected.

I would recommend checking your data. It is strange that no genes are left after filtering.
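To see how the two conditions differ, here is a small base R comparison on a made-up matrix (the gene names and values are assumptions for illustration). One caveat: for a gene with no counts at all, `mean(x[x > 0])` is `NaN`, so the comparison yields `NA` rather than `FALSE`. The sketch adds an explicit `is.na` guard for that case, which is not in the suggestion above.

```r
# Toy count matrix: rows are genes, columns are cells (hypothetical values).
counts <- matrix(
  c(0, 0, 0, 4, 0, 0,   # geneA: detected in one cell only
    2, 3, 0, 0, 0, 0,   # geneB: detected in two cells
    0, 0, 0, 0, 0, 0,   # geneC: never detected
    3, 2, 4, 1, 2, 3),  # geneD: detected in every cell
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"), NULL)
)

# Strict condition: mean over all cells, zeros included.
strict <- rowMeans(counts) > 1

# Relaxed condition: mean over only the cells where the gene is detected.
mean_nonzero <- apply(counts, 1, function(x) mean(x[x > 0]))

# Guard against NaN for genes with no non-zero counts.
relaxed <- !is.na(mean_nonzero) & mean_nonzero > 1
```

In this toy data the sparse genes geneA and geneB fail the strict condition but pass the relaxed one, while geneC fails both.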