ghm17 / LOGODetect

LOGODetect is a powerful tool to identify small segments that harbor local genetic correlation between two traits/diseases.
GNU General Public License v3.0

sfLapply issue, process killed #17

Closed Dolce-99 closed 1 year ago

Dolce-99 commented 1 year ago

Hi, thank you so much for providing such great software. However, I could not run it successfully. I ran LOGODetect on my GWAS summary statistics, but I got an error after sfLibrary(snowfall) was initiated:

    Error in checkForRemoteErrors(val) :
      one node produced an error: missing value where TRUE/FALSE needed
    Calls: sfLapply ... clusterApply -> staticClusterApply -> checkForRemoteErrors
    Execution halted

The process was killed. I set breakpoints to check which command caused the issue and found that it is probably the sfLapply call.

    sfInit(parallel = TRUE, cpus = ncore)
    sfLibrary(snowfall)
    if(npop == 1){
      sfExport('block', 'n_ref', 'thre', 'n_montecarlo', 'n1', 'n2', 'h2_snp_1', 'h2_snp_2', 'M', 'theta', 'Cn', 'inter', 'out_dir', 'intercept')
      sfExport('scan', 'svd.try', 'simulate_zscore_helper', 'cal_qmax_1pop')
      Result = sfLapply(1:nrow(block), apply.fun.1pop)  # ** couldn't run this line
    }

I ran LOGODetect on a cluster server and only tried it on one chromosome. Does anyone have any idea about this issue?

Great thanks in advance!
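When a parallel call such as sfLapply reports only that "one node produced an error", running the same worker serially with per-index error capture can show which input fails. The following is a minimal sketch in base R, not LOGODetect's code: `worker` and `block_values` are hypothetical stand-ins for `apply.fun.1pop` and the per-block inputs.

```r
# Hypothetical per-block inputs; the NA is deliberate, to trigger the error.
block_values <- c(0.4, NA, 0.7)

# Hypothetical stand-in for the real worker (apply.fun.1pop in the thread).
worker <- function(i) {
  # Comparing NA with a number yields NA, which if() cannot branch on.
  if (block_values[i] > 0.5) "large" else "small"
}

# Run serially and record the error message per index instead of aborting.
results <- lapply(seq_along(block_values), function(i) {
  tryCatch(
    worker(i),
    error = function(e) paste("block", i, "failed:", conditionMessage(e))
  )
})

# Indices whose result is a captured failure message.
failed <- which(vapply(results, function(r) grepl("failed", r), logical(1)))
print(results[failed])
```

Once the failing index is known, the worker can be called on that single index in an interactive session to inspect the intermediate values.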

Dolce-99 commented 1 year ago

I tried the example input file and it ran successfully on my cluster, so I don't think it's a cluster issue. Here are my two input files. I converted them into the correct input format and can't see anything wrong with them. I hope this helps. Thanks!

Dolce-99 commented 1 year ago

I think I solved the issue. LOGODetect only supports inputting all chromosomes, not just one or two of them. That's why I got this error.

ghm17 commented 1 year ago

Hi, thank you for using our tool. I looked into your input GWAS summary statistics and found that only 3 of 199 SNPs remain after intersecting with the reference panel. Our reference panel is on hg19, so please make sure that your input GWAS uses the matching genome build. Separately, LOGODetect does support GWAS summary statistics for a single chromosome. I have added the '--chr' flag to explicitly support chromosome-specific analysis. I have also just fixed a bug in the 'cal_qmax_1pop' function. You should now be able to run the updated script successfully on your input GWAS summary statistics.
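A quick way to check how many input SNPs survive the intersection with a reference panel is to compare the ID columns directly before running the full pipeline. This is a rough sketch on toy data: the column name `SNP`, the rsIDs, and the 50% threshold are all assumptions for illustration, not LOGODetect's actual input schema or logic.

```r
# Toy GWAS table; the column name "SNP" and the IDs are made up.
gwas <- data.frame(SNP = c("rs1", "rs2", "rs3", "rs4", "rs5"),
                   stringsAsFactors = FALSE)

# Stand-in for the SNP IDs present in the reference panel.
ref_snps <- c("rs2", "rs5", "rs9")

# SNPs present in both sets, and the fraction of GWAS SNPs retained.
shared <- intersect(gwas$SNP, ref_snps)
overlap_rate <- length(shared) / nrow(gwas)

cat(sprintf("%d of %d GWAS SNPs found in the reference (%.0f%%)\n",
            length(shared), nrow(gwas), 100 * overlap_rate))

# Arbitrary threshold: a very low rate may point to a genome-build
# or SNP-ID mismatch rather than a genuinely small overlap.
if (overlap_rate < 0.5) {
  message("Low overlap with the reference panel; check genome build and SNP IDs.")
}
```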

Dolce-99 commented 1 year ago

Thank you so much for the modification! I only put a small subset of my GWAS summary statistics into phenotype1.txt and phenotype2.txt, just as an example, so I think such a low overlap rate is not surprising. However, after I downloaded the updated version of LOGODetect.R and added the '--chr' flag, the process still failed with the same issue.

Here's my code:

    Rscript LOGODetect.R \
      --sumstats phenotype1.txt,phenotype2.txt \
      --n_gwas 426831,426831 \
      --ref_dir ../LOGODetect_data/LOGODetect_1kg_ref \
      --pop EUR \
      --chr 22 \
      --ldsc_dir ./ldsc \
      --block_partition ./block_partition.txt \
      --out_dir ../results/test \
      --n_cores 20

And the error is the same as before:

    Error in checkForRemoteErrors(val) :
      one node produced an error: missing value where TRUE/FALSE needed
    Calls: sfLapply ... clusterApply -> staticClusterApply -> checkForRemoteErrors
    Execution halted

I still need help and really appreciate your kindness and patience!

ghm17 commented 1 year ago

Could you provide the file named 'ldsc_rg.log' under the directory '/out_dir/tmp_files/ldsc'? I think LDSC cannot provide heritability estimates when so few SNPs are provided, which results in NA input to our algorithm.

Dolce-99 commented 1 year ago

Sure, here are the logs. I also attached the log for the whole process, named chr22_all.log.

Much appreciated!

ghm17 commented 1 year ago

I have altered the statement that reads the LDSC output so that it is parsed correctly. Please try the updated version.

Dolce-99 commented 1 year ago

Thank you so much for the work. But unfortunately, it failed again... Stopped at the same point with the same error 😢

Dolce-99 commented 1 year ago

chr22_all_1026.log ldsc_rg_1026.log

Attached new logs in case they might be helpful.

ghm17 commented 1 year ago

Could you also provide the GWAS summary statistics for chr22?

Dolce-99 commented 1 year ago

Can I have your email so that I can send them to you? Thanks!

ghm17 commented 1 year ago

hmguo@stanford.edu. Thanks!

ghm17 commented 1 year ago

Well, the heritability estimates from LDSC are 1.89 and 0.87 for the two traits (restricted to chr22 SNPs), which are unrealistically large for a single small chromosome (one even exceeds 1, the theoretical upper bound, though this can happen in numerical estimation). This makes sqrt(1-h2) NA, and the error pops up. I assume the summary statistics were generated by simulation, right? If so, I suggest specifying a much smaller heritability in the simulation setting and, if possible, performing the analysis on genome-wide data, which makes the LDSC estimates more stable.

Dolce-99 commented 1 year ago

Well, they weren't generated by simulation; they are real GWAS results. And I thought the two h2 estimates were 0.2149 and 0.214, according to the log. Did I misunderstand something?

ghm17 commented 1 year ago

Sorry, I made a mistake in the input sample sizes of the two GWASs. After correcting it, I get the same h2 estimates as yours. The reason the error pops up is that, in the z-score vector simulation step, the term sqrt(1 - h2 - absolute value of intercept from LDSC) may evaluate to NA. This may happen when the two traits share a substantial number of overlapping samples. I now take the maximum of 1 - h2 - absolute value of intercept from LDSC and 0, and the software should run well now. Please let me know if you have any other questions.
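The effect of clamping at zero can be seen in a few lines of R. This is only an illustration of the idea described above, not the actual change made in LOGODetect.R: the h2 value is the one quoted in the thread, while the intercept value and all variable names are made up.

```r
h2 <- 0.2149        # h2 estimate quoted in the thread
intercept <- 0.9    # made-up intercept value, chosen so the term goes negative

raw <- 1 - h2 - abs(intercept)              # negative here

# Without clamping: sqrt() of a negative number is NaN (with a warning),
# and comparing NaN with a number gives NA, which if() cannot branch on.
unclamped <- suppressWarnings(sqrt(raw))
cmp <- unclamped > 0

# With clamping: the argument of sqrt() is never below zero.
clamped <- sqrt(max(raw, 0))

print(c(raw = raw, unclamped = unclamped, clamped = clamped))
print(cmp)
```

For vector inputs, `pmax(raw, 0)` would be the element-wise equivalent of `max(raw, 0)`.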

Dolce-99 commented 1 year ago

Thank you very much! The process finally got past the sfLapply part but stopped at a later step. It looks like a follow-up issue. Here's the error message:

    Warning messages:
    1: In as.numeric(strsplit(line, " ")[[1]][3]) : NAs introduced by coercion
    2: In as.numeric(strsplit(line, " ")[[1]][3]) : NAs introduced by coercion
    3: In as.numeric(strsplit(line, " ")[[1]][3]) : NAs introduced by coercion
    4: In as.numeric(strsplit(line, " ")[[1]][3]) : NAs introduced by coercion
    5: In as.numeric(strsplit(line, " ")[[1]][3]) : NAs introduced by coercion
    Error in if (sum(gcov != 0) > 0) { : missing value where TRUE/FALSE needed
    Execution halted

Let me know if you need any log files. Thanks!

ghm17 commented 1 year ago

Just fixed a bug in reading stratified-LDSC output. Please have a try on the updated software.
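Warnings of the form "NAs introduced by coercion" from `as.numeric(strsplit(line, " ")[[1]][3])` typically mean that the third space-separated token was not a number for that particular line. The sketch below contrasts fixed-position parsing with a pattern-based alternative. It is not the fix applied in the repository, and the sample line is an illustrative stand-in rather than text copied from an actual log.

```r
# Illustrative stand-in for a log line; not taken from a real log file.
line <- "Total Observed scale gencov: 0.0312 (0.0071)"

# Fixed-position parsing: the third token here is "scale", so coercion gives NA.
by_position <- suppressWarnings(as.numeric(strsplit(line, " ")[[1]][3]))

# Pattern-based parsing: take the first number that follows ": ".
pattern <- "(?<=: )-?[0-9.]+(?:[eE][-+]?[0-9]+)?"
m <- regmatches(line, regexpr(pattern, line, perl = TRUE))
by_pattern <- if (length(m) == 1) as.numeric(m) else NA_real_

# Guard before any if() on the value, so a failed parse gives a readable message
# instead of "missing value where TRUE/FALSE needed" further downstream.
if (is.na(by_pattern)) {
  stop("could not parse a numeric value from: ", line)
}

print(c(by_position = by_position, by_pattern = by_pattern))
```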

Dolce-99 commented 1 year ago

It finally ran successfully! Thank you so much! I really appreciate it!

ghm17 commented 1 year ago

You are welcome~