dengchunyu / scPagwas_reproduce

reproduce scPagwas code

scGet_PCC error #2

Open rdf1993 opened 11 months ago

rdf1993 commented 11 months ago

------ Tue Aug 29 01:24:56 2023 ------## *** 9th: scGet_PCC function start! ****

done!

In addition: There were 49 warnings (use warnings() to see them)
Timing stopped at: 1.307e+04 1.06e+04 1.114e+04

dengchunyu commented 11 months ago

Hello, the error occurred while running the Get_CorrectBg_p function, and we have not seen it before. Judging from the error message, I suspect that n_topgenes was set too high, exceeding the number of genes in your input single-cell data. In practice, n_topgenes >= 100 is enough. Could you check whether your single-cell data contains very few genes?

If you are worried about wasting time while testing the error, you can try the following steps:

1. Set the iters_singlecell parameter of the main function to 0. This skips the Get_CorrectBg_p step and gives you files and output data that do not include the Random_Correct_BG_p result, as in the following example:

Pagwas <- scPagwas_main(
  Pagwas = NULL,
  gwas_data = system.file("extdata", "GWAS_summ_example.txt", package = "scPagwas"),
  Single_data = system.file("extdata", "scRNAexample.rds", package = "scPagwas"),
  output.prefix = "test",
  Pathway_list = Genes_by_pathway_kegg,
  iters_singlecell = 0
)

2. Next, find out why Get_CorrectBg_p fails. Make sure that n_topgenes is not greater than the total number of genes in your single-cell data, i.e., n_topgenes < length(rownames(Pagwas)).

`n_topgenes = 100
iters_singlecell = 100

# Get top PCC gene
scPagwas_topgenes <- rownames(Pagwas@misc$PCC)[order(Pagwas@misc$PCC, decreasing = TRUE)[1:n_topgenes]]

correct_pdf <- Get_CorrectBg_p(
  Single_data = Pagwas,  # output of the scPagwas_main function, which is data in Seurat format
  scPagwas.TRS.Score = Pagwas$scPagwas.TRS.Score1,
  iters_singlecell = iters_singlecell,  # you can choose a smaller value when testing
  n_topgenes = n_topgenes,
  scPagwas_topgenes = scPagwas_topgenes
)`
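The n_topgenes check from step 2 can be sketched in base R as follows. This is only an illustration: `clamp_n_topgenes` is a hypothetical helper, not part of scPagwas, and the toy `pcc` matrix stands in for `Pagwas@misc$PCC`. The point is that indexing `order(...)` past the number of available genes yields `NA` gene names, so the requested count should be capped first.

```r
# Hypothetical helper (not part of scPagwas): cap n_topgenes at the number
# of genes that are actually available.
clamp_n_topgenes <- function(n_topgenes, n_genes) {
  stopifnot(n_topgenes >= 1, n_genes >= 1)
  if (n_topgenes > n_genes) {
    message("n_topgenes (", n_topgenes, ") exceeds the ", n_genes,
            " available genes; using ", n_genes, " instead")
  }
  min(n_topgenes, n_genes)
}

# Toy stand-in for Pagwas@misc$PCC: a one-column matrix with gene row names.
set.seed(1)
pcc <- matrix(rnorm(50), ncol = 1,
              dimnames = list(paste0("gene", 1:50), "PCC"))

# Without the cap, asking for 1000 genes out of 50 produces NA gene names.
unchecked <- rownames(pcc)[order(pcc[, 1], decreasing = TRUE)[1:1000]]

# With the cap, only valid gene names are returned.
n_topgenes <- clamp_n_topgenes(1000, nrow(pcc))
top_genes <- rownames(pcc)[order(pcc[, 1], decreasing = TRUE)[seq_len(n_topgenes)]]
```

With a real result object, the second argument would presumably be `nrow(Pagwas@misc$PCC)`.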

3. If the steps above still do not solve the problem, please keep sending the error messages and I will keep looking for the cause.

That said, there is another way to do the calculation. Computing the background-corrected p-value is essentially a background-corrected enrichment analysis, and the key input is scPagwas_topgenes. You can use any established method that computes a gene-set enrichment score and p-value. For example, our p-value calculation works on a similar principle to scDRS, so you can run scDRS directly on scPagwas_topgenes and the single-cell data:

1) Convert the single-cell data to h5ad format.
2) Export the scPagwas_topgenes genes with their PCC correlation scores as weights (leaving out the weights also works; in my calculations they made little difference).
3) Compute the scDRS enrichment scores in a Python environment.
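Step 2 of the scDRS route (exporting the top genes with their PCC scores as weights) might look like the sketch below. It assumes the scDRS gene-set file is a tab-separated table with a `TRAIT` column and a `GENESET` column of comma-separated `gene:weight` pairs; please verify this against the scDRS documentation before relying on it. `write_gs`, the gene names, and the weights are all hypothetical placeholders, and the h5ad conversion and the Python run are not shown.

```r
# Hypothetical helper: write one weighted gene set as a two-column,
# tab-separated file (assumed scDRS .gs layout: TRAIT, GENESET).
write_gs <- function(genes, weights, trait, path) {
  stopifnot(length(genes) == length(weights), length(genes) > 0)
  # Build "gene:weight" pairs and join them with commas.
  geneset <- paste(paste0(genes, ":", signif(weights, 6)), collapse = ",")
  out <- data.frame(TRAIT = trait, GENESET = geneset)
  write.table(out, file = path, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(out)
}

# Placeholder values standing in for scPagwas_topgenes and their PCC scores.
genes <- c("GENE_A", "GENE_B", "GENE_C")
weights <- c(0.42, 0.31, 0.27)

gs_path <- tempfile(fileext = ".gs")
gs <- write_gs(genes, weights, trait = "example_trait", path = gs_path)
```

For an unweighted gene set, the `:weight` suffix would simply be dropped.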

rdf1993 commented 11 months ago

Thanks. After setting the iters_singlecell parameter to 0, I got a new error. The output reads:

##------ Fri Sep 1 00:34:52 2023 ------## *** 9th: scGet_PCC function start! **** done!

dengchunyu commented 11 months ago

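Step 1: update scPagwas. Please do update! The error will probably disappear after the update.

Step 2: run the calculation step by step, following the vignette at https://dengchunyu.github.io/routineuse/2023/05/30/Conventional-Parameters-and-Usage-Instructions-with-Demo-Example-Data.html. That should reveal the cause of the error. The error occurs very late in the run: the most important earlier parts have all finished and the PCC genes have already been obtained, so the calculation is essentially complete and only one small step failed. The advantage of running step by step is that you do not have to rerun the time-consuming earlier parts.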

rdf1993 commented 10 months ago

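Thanks for the reply. To follow up: after running step by step, I tracked down the cause of the Get_CorrectBg_p error. My Seurat object is a multi-omics object and its default assay was not set to the RNA assay, so the genes could not be matched at this step. You could force the default assay in the code to the assay passed as a parameter to the main function.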
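As a workaround on the user side, the default assay can be switched before calling scPagwas. The sketch below uses the Seurat functions `DefaultAssay`, `Assays`, `CreateSeuratObject`, and `CreateAssayObject`; `ensure_default_assay` is a hypothetical helper, and the toy two-assay object (with an assumed second assay named "ADT") only stands in for a real multi-omics object.

```r
# Hypothetical helper: make sure the requested assay exists and is the default.
ensure_default_assay <- function(obj, assay = "RNA") {
  if (!assay %in% Seurat::Assays(obj)) {
    stop("Assay '", assay, "' not found in the object")
  }
  if (Seurat::DefaultAssay(obj) != assay) {
    Seurat::DefaultAssay(obj) <- assay
  }
  obj
}

# The demonstration only runs when Seurat is installed.
if (requireNamespace("Seurat", quietly = TRUE)) {
  counts <- matrix(as.numeric(1:20), nrow = 4,
                   dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5)))
  obj <- Seurat::CreateSeuratObject(counts = counts)  # default assay is "RNA"
  obj[["ADT"]] <- Seurat::CreateAssayObject(counts = counts[1:2, ])
  Seurat::DefaultAssay(obj) <- "ADT"                  # mimic the reported setup
  obj <- ensure_default_assay(obj, "RNA")             # switch back before scPagwas
}
```

The corrected object would then be passed as Single_data to scPagwas_main.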

dengchunyu commented 10 months ago

Thanks for the feedback. I will fix this bug right away.
