rdf1993 commented 11 months ago

------ Tue Aug 29 01:24:56 2023 ------## * 9th: scGet_PCC function start! ******

done!

Get Random Correct background pvalue for each single cell! | | 0%
Error in sample.int(length(x), size, replace, prob) : cannot take a sample larger than the population when 'replace = FALSE'

In addition: There were 49 warnings (use warnings() to see them)
Timing stopped at: 1.307e+04 1.06e+04 1.114e+04

dengchunyu commented 11 months ago

Hello. This error is raised while running the Get_CorrectBg_p function, and we have not encountered it before. Judging from the error message, I suspect the value chosen for n_topgenes is too large and exceeds the number of genes in your input single-cell data. In practice, n_topgenes >= 100 is enough. Could you check whether your single-cell data contains very few genes?
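For reference, the same class of error can be reproduced in base R without scPagwas. The snippet below is only a minimal sketch: the gene names and sizes are made up, and it does not show how Get_CorrectBg_p samples internally. It shows that sampling without replacement fails as soon as the requested size exceeds the pool, and that capping the request avoids the failure.

```r
# Hypothetical gene pool that is smaller than the requested sample size.
genes <- paste0("gene", 1:50)
n_topgenes <- 100

# sample() without replacement fails when size exceeds the population,
# so the error is caught here and its message is kept as a string.
res <- tryCatch(
  sample(genes, n_topgenes),
  error = function(e) conditionMessage(e)
)
print(res)

# A defensive guard: cap the request at the number of available genes.
n_safe <- min(n_topgenes, length(genes))
picked <- sample(genes, n_safe)
length(picked)
```

A guard like `min(n_topgenes, length(genes))` is one simple option; stopping with an explicit message would be another.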

If you're concerned about wasting time reproducing the error, you can try the following steps:

1. Set the iters_singlecell parameter of the main function to 0. This skips the Get_CorrectBg_p step and produces files and output data that do not include the Random_Correct_BG_p result. For example:

    Pagwas <- scPagwas_main(
      Pagwas = NULL,
      gwas_data = system.file("extdata", "GWAS_summ_example.txt", package = "scPagwas"),
      Single_data = system.file("extdata", "scRNAexample.rds", package = "scPagwas"),
      output.prefix = "test",
      Pathway_list = Genes_by_pathway_kegg,
      iters_singlecell = 0
    )

2. Next, troubleshoot the Get_CorrectBg_p error. Make sure n_topgenes is not larger than the total number of genes in your single-cell data, i.e., n_topgenes < length(rownames(Pagwas)).

    n_topgenes <- 100
    iters_singlecell <- 100

    # Get top PCC genes
    scPagwas_topgenes <- rownames(Pagwas@misc$PCC)[order(Pagwas@misc$PCC, decreasing = TRUE)[1:n_topgenes]]

    correct_pdf <- Get_CorrectBg_p(
      Single_data = Pagwas,  # output of the scPagwas_main function, a Seurat-format object
      scPagwas.TRS.Score = Pagwas$scPagwas.TRS.Score1,
      iters_singlecell = iters_singlecell,  # you can choose a smaller value when testing
      n_topgenes = n_topgenes,
      scPagwas_topgenes = scPagwas_topgenes
    )

3. If the issue persists after the steps above, please keep sending the error output and I will continue investigating.

You can also compute this differently. Calculating the background-corrected p-value is essentially a background-corrected enrichment analysis, and the key input is scPagwas_topgenes. You can use any established method that computes gene-set enrichment scores and p-values. For instance, our p-value calculation follows a principle similar to scDRS, so you can compute directly from scPagwas_topgenes and the single-cell data:

1) Convert the single-cell data to h5ad format.
2) Export the genes in scPagwas_topgenes together with their PCC correlation scores as weights (running without weights also works; in my calculations the weighting made little difference).
3) Compute the scDRS enrichment scores in a Python environment.
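To illustrate why a background-corrected p-value amounts to an enrichment test, here is a minimal base-R sketch. It is a simplification under stated assumptions, not the scDRS algorithm and not the actual Get_CorrectBg_p implementation: the expression matrix, gene names, weights, and iteration count are all hypothetical. Each cell's weighted score for the top genes is compared with the same statistic on random gene sets of equal size.

```r
set.seed(1)

# Toy expression matrix: 200 hypothetical genes x 5 cells.
expr <- matrix(
  rexp(200 * 5), nrow = 200,
  dimnames = list(paste0("g", 1:200), paste0("cell", 1:5))
)
top_genes <- paste0("g", 1:20)            # stand-in for scPagwas_topgenes
weights <- seq(1, 0.05, length.out = 20)  # stand-in for PCC weights
iters <- 100                              # number of random control sets

# Observed score: weighted mean expression of the top genes in each cell.
obs <- colSums(expr[top_genes, ] * weights) / sum(weights)

# Null scores: the same statistic on random gene sets of the same size.
# The result is a cells x iters matrix.
null <- replicate(iters, {
  ctrl <- sample(rownames(expr), length(top_genes))
  colSums(expr[ctrl, ] * weights) / sum(weights)
})

# Empirical one-sided p-value per cell, with the usual +1 correction.
p <- (1 + rowSums(null >= obs)) / (1 + iters)
print(p)
```

Setting `weights` to a vector of ones gives the unweighted variant mentioned above.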

rdf1993 commented 11 months ago

Thank you. After setting the iters_singlecell parameter to 0, a new error appeared:

##------ Fri Sep 1 00:34:52 2023 ------## *** 9th: scGet_PCC function start! **** done!

Get Random Correct background pvalue for each single cell!
Error in x[[i, drop = TRUE]]:
! ‘scPagwas.upTRS.Score3’ not found in this Seurat object
Did you mean "scPagwas.upTRS.Score2"?
Run rlang::last_trace() to see where the error occurred.
There were 49 warnings (use warnings() to see them)
Timing stopped at: 1.247e+04 1.093e+04 1.143e+04

I checked my single-cell Seurat object: the data layer of the RNA assay is a 36601 x 31647 matrix, so the number of genes should not be unusually small.

dengchunyu commented 11 months ago

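First, update scPagwas. This is essential, and the error will probably disappear after updating. Second, run the computation step by step following the vignette at https://dengchunyu.github.io/routineuse/2023/05/30/Conventional-Parameters-and-Usage-Instructions-with-Demo-Example-Data.html, which should reveal the cause of the error. The failure occurs very late in the pipeline: the most important earlier parts have all finished and the PCC genes have already been obtained, so the computation is essentially complete and only a small final step failed. The advantage of running step by step is that you do not have to rerun the time-consuming earlier parts.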

rdf1993 commented 10 months ago

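Thanks for the reply. To follow up: after running step by step, I tracked down the cause of the Get_CorrectBg_p error. My Seurat object is a multi-omics object and its default assay was not set to the RNA assay, so the genes could not be matched at this step. The code could force the default assay to the assay passed to the main function as a parameter.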
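The mismatch can be illustrated with a small base-R sketch. It does not use Seurat or scPagwas: the assay contents and gene names below are invented, and a plain list stands in for the multi-omics object. The point is only that looking up gene symbols against a non-RNA feature set returns nothing, which then starves any later sampling step.

```r
# Hypothetical feature names for two assays of one multi-omics object.
assays <- list(
  RNA  = c("CD3E", "CD4", "MS4A1", "LYZ"),
  ATAC = c("chr1-100-200", "chr2-300-400")
)
top_genes <- c("CD3E", "LYZ")  # stand-in for scPagwas_topgenes

# Look up which top genes exist under a given default assay.
match_genes <- function(default_assay) {
  intersect(top_genes, assays[[default_assay]])
}

matched_atac <- match_genes("ATAC")  # wrong default: nothing matches
matched_rna  <- match_genes("RNA")   # RNA default: both genes match

length(matched_atac)
length(matched_rna)

# With a real Seurat object `obj` (an assumption, not run here),
# the analogous step before calling scPagwas would be:
# library(Seurat)
# DefaultAssay(obj) <- "RNA"
```

Until the package sets this itself, setting the default assay to RNA before calling the main function is a plausible workaround.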

dengchunyu commented 10 months ago

Thanks for the feedback. I will fix this bug right away.


dengchunyu / scPagwas_reproduce

scGet_PCC error #2
