Closed char4816 closed 5 years ago
I thought my above solution would work, however I am still getting 0 results for the output after converting pvalue & or to beta & SE... I think this suggests that there may be an error in the odds_to_beta() function.
Thanks Chris, I'll look into this today and let you know what I come up with.
Quick comment: N_cases + N_controls is the value you want for N, not Neff. As noted in the link to the LDSC issue you posted, the conversion to the liability scale handles the imbalance between cases and controls.
Thank you for your help and taking the time to collaborate on this. In the ldsc issue they describe how it is roughly equivalent to 1) use N=Cases+Controls and the appropriate --samp-prev or to 2) use Neff = 4/(1/nr.cases+1/nr.controls) with --samp-prev 0.5
I am interpreting that nr.cases+nr.controls is correct as long as the user specifies prevalence values for the disease so that conversion to liability scale is accurate. Am I correct in this interpretation of the fit variables? --K1 = "population prevelance of binary phenotype 1" (prevalence of phen1 in literature) --P1 = "study prevelance of binary phenotype 1" (nr.cases/(nr.cases+nr.controls) for sfile1) --K2 = "population prevelance of binary phenotype 2" (prevalence of phen2 in literature) --P2 = "study prevelance of binary phenotype 2" (nr.cases/(nr.cases+nr.controls) for sfile2)
Yes, exactly. I'm updating the readme in the next push to clarify. Still looking into the OR/beta issue.
Thank you! I am realizing I should have said --P1 = "study prevelance of binary phenotype 1" (nr.cases/(nr.controls + nr.controls)for sfile1) --P2 = "study prevelance of binary phenotype 1" (nr.cases/(nr.controls + nr.controls)for sfile2) that way --P1 = 0.5 when nr.cases and nr.controls are balanced I'll edit the above comment to correct this
I'm struggling to find any issue in the odds ratio conversion. When you say you get the beta and SE version to work, is this the same trait? Always getting 0 tells me it could be a few things:
Otherwise, if you don't mind sharing your sumstats file (you can send me an email if you like) I can try to debug more directly.
I have a good example of summary stats files and will follow up via email in a few minutes
Fixed in the latest update. Conversion was resulting in NaNs/infs that needed to be filtered.
Thank you for making this software. I have been organizing summary stats to run through popcorn and have noticed that when I use 'beta' & 'SE', the fit returns some results (yay!). However, when I try to use the 'OR' & 'p-value' options as described in your documentation (which are the options I would prefer to use, given my summary stats files), the h2 outputs are always zero. For now I can use the 'beta' & 'SE' option, but I hope I can help you improve upon this issue so the software is even more adaptable to various sum stat formats.
A make-shift solution I found is using the equations from your odds_to_beta() function in sumstats.py to preprocess my files in R suppose you have a sumstat file with a header that is limited with the following following columns rs ref alt chr pos or pval
sumstats <- read.table("orig_sumstats.txt", header=T, stringsAsFactor=F)
sumstats$N <- 4/(1/nr.cases+1/nr.controls)
sumstats$beta <- log(sumstats$or)
sumstats$Z <- sign(sumstats$beta)*qnorm(1 - sumstats$pval/2)
sumstats$SE <- sumstats$beta/sumstats$Z
sumstats <- sumstats[,c("rs", "ref", "alt", "N", "beta", "SE")]
colnames(asthmaAdult) <- c("SNP", "a1", "a2", "N", "beta", "SE")
write.table(sumstats,"sumstats.txt", sep = "\t", col.names=T, row.names=F, quote=F)
Since many summary stats files do not have an "N" column, it can be difficult for users to distinguish the correct N to use. It doesn't always make sense to just sum nr.cases+nr.controls. Many people use the above equation for N when controls are in imbalance, for example:
The math behind the Neff calculation stems from the variance inflation factor (https://journals.sagepub.com/doi/pdf/10.1177/0163278703255230) between balanced and unbalanced studies. If you agree with me, you may want to advise users in your README to use this formula for N when it isn't provided by the summary stats and nr.cases != nr.controls
I hope this helps!
Thanks, Chris