Closed zh-zhang1984 closed 2 years ago
Hey,
I believe that if you tried to calculate N for your sumstats using MSS, you would have seen the warning below:
WARNING: Neff column could not be calculated as the columns N_CAS & N_CON were not found in the dataset
This is because per-SNP case and control N values (N_CAS and N_CON) are needed to calculate N this way. Since those columns aren't in your data, N can't be derived from it. I think your best bet is to contact the authors of the GWAS to get, at the very least, a single study-wide N to add to the data so you can run ldsc (an N value per SNP would be better).
Thank you for the hints. Is there a way to incorporate this N into the pipeline? Suppose I get N from the literature or the authors and want to include it when calling MungeSumstats::import_sumstats.
Yep, you can use the compute_n parameter: set it to the integer value of N and the column will be created. As I mentioned, though, this should be a last resort, since it isn't necessarily the true N for each SNP and you will lose precision. See the parameter documentation:
@param compute_n Whether to impute N. Default of 0 won't impute, any other integer will be imputed as the N (sample size) for every SNP in the dataset. **Note** that imputing the sample size for every SNP is not correct and should only be done as a last resort. N can also be inputted with "ldsc", "sum", "giant" or "metal" by passing one of these for this field or a vector of multiple. Sum and an integer value creates an N column in the output whereas giant, metal or ldsc create an Neff or effective sample size. If multiples are passed, the formula used to derive it will be indicated.
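To make that concrete, here is a minimal sketch of passing a fixed N through import_sumstats. It assumes import_sumstats forwards extra arguments such as compute_n on to format_sumstats, as its documentation describes. The GWAS ID and the sample size are hypothetical placeholders, not values from this thread.

```r
library(MungeSumstats)

# Hypothetical value: the total sample size reported in the paper
# or supplied by the GWAS authors. Replace with the real number.
study_n <- 100000L

# Hypothetical placeholder: replace with your own OpenGWAS ID.
gwas_id <- "your-gwas-id"

# compute_n set to a non-zero integer asks MungeSumstats to write that
# value as N for every SNP. This is a last resort: it is not the true
# per-SNP N, so some precision is lost downstream (e.g. in ldsc).
reformatted <- MungeSumstats::import_sumstats(
  ids = gwas_id,
  compute_n = study_n
)
```

If you are formatting a local file rather than importing by ID, the same compute_n argument can be given directly to MungeSumstats::format_sumstats.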
Closing as I believe your question has been answered. Feel free to reopen if not.
Thanks, Alan.
Hi everyone, I use the following to format the VCF file for ldsc analysis; however, I found there is no N column. Can anyone help me?