Habush closed this issue 7 months ago.
Hi,
We changed the sample genotype data to ../01_Dataset/1KG.EAS.auto.snp.norm.nodup.split.rare002.common015.missing.zip
last year. You can unzip it and replace the old path with the new one for now. The workflow is still the same (though results might be slightly different).
I will also update the finemapping part soon with the latest files. Thanks for pointing it out.
Hi,
Thanks for the quick reply! I can now compute the LD matrix using the updated tutorial.
If you don't mind, I'd like to ask a question about the way the LD matrix is calculated.
My question is: say I have real GWAS summary statistics where I know the ancestry of the population on which the association test was done (e.g., EAS), and I want to use 1000 Genomes as the reference panel to calculate LD. How would I use PLINK in the same way as you did in the tutorial? Essentially, what I'm asking is: how can I generate files similar to those in ../01_Dataset/1KG.EAS.auto.snp.norm.nodup.split.rare002.common015.missing.zip and use them as input for PLINK?
Apologies if this is a somewhat dumb question; I'm a newbie to GWAS computations.
Thanks for your question. That's also one of the questions I had when I started learning GWAS a few years ago. You can refer to this tutorial for how to download the 1000 Genomes VCFs and convert them to plink format: https://www.biostars.org/p/335605/ (steps 1-5). Based on that, you can perform additional steps like customized QC (e.g., MAF/HWE filtering) or extracting the samples you need (e.g., EAS) using plink. Then you will have the plink files ready for your LD computations.
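For reference, the conversion and filtering steps might look roughly like this with plink 1.9. This is only a sketch, not the tutorial's exact commands; the VCF file name, output prefixes, and QC thresholds below are placeholders, and the panel file is the standard 1000 Genomes integrated sample panel.

```shell
# 1. Build a keep-list of EAS samples from the 1000 Genomes panel file
#    (panel columns: sample  pop  super_pop  gender; FID and IID are
#    written twice to match plink's --double-id convention)
awk '$3 == "EAS" {print $1, $1}' integrated_call_samples_v3.20130502.ALL.panel > eas_samples.txt

# 2. Convert the VCF to plink binary format, keeping only EAS samples
plink --vcf ALL.chr2.phase3.genotypes.vcf.gz \
      --double-id \
      --keep eas_samples.txt \
      --snps-only just-acgt \
      --make-bed \
      --out 1kg_eas_chr2

# 3. Basic QC: drop rare variants and HWE failures (thresholds are examples)
plink --bfile 1kg_eas_chr2 \
      --maf 0.01 \
      --hwe 1e-6 \
      --make-bed \
      --out 1kg_eas_chr2.qc
```

The resulting .bed/.bim/.fam set can then be used as the LD reference input for plink.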
Thanks for sharing the link. In a similar question on the same forum, I saw a comment saying that PLINK 1.9 can take VCF files as input. Hence, I'm thinking of using the population-specific files (such as 1kg_eas_hg19) listed in the gwaslab.check_available_ref() dictionary and passing those to PLINK. Do you think that's okay?
Interestingly, I can't reproduce the credible sets returned by susieR after following the tutorial. Please see the attached screenshot.
The only change from the tutorial is the harmonization step. In my notebook I do the harmonization as follows:
locus.harmonize(n_cores=8,
basic_check=False,
ref_seq=gl.get_path("ucsc_genome_hg19"),
ref_rsid_tsv=gl.get_path("1kg_dbsnp151_hg19_auto"),
ref_rsid_vcf=gl.get_path("dbsnp_v156_hg19"))
Whereas in the tutorial it's via
locus.harmonize(basic_check=False, ref_seq="/home/yunye/CommonData/Reference/genome/humanG1Kv37/human_g1k_v37.fasta")
I wanted to attach the Jupyter notebook here, but GitHub isn't allowing me to.
Usually fine-mapping requires an in-sample LD reference (computed from the samples you used to run the GWAS). But for other purposes, you can certainly use the prepared VCF files from gwaslab.
From the indices of the variants in your results, it seems that the sumstats are different (maybe a different locus?). I don't think locus.harmonize will change the results. Would you please check the logs and see at which step the results start to differ? Thanks.
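Once the plink reference files are ready, the locus LD matrix can be computed along these lines with plink 1.9. This is a sketch rather than the exact contents of calculate_ldr.sh; the output prefix is a placeholder, and the bfile prefix and coordinates are taken from the tutorial.

```shell
# Extract the tutorial locus and compute the signed r matrix with plink 1.9
plink --bfile 1KG.EAS.auto.snp.norm.nodup.split.maf005.thinp020 \
      --chr 2 --from-bp 55074452 --to-bp 56074452 \
      --r square \
      --out locus_ld
# locus_ld.ld is a whitespace-delimited square matrix of r values,
# which can be loaded (e.g., with numpy.loadtxt) and passed to susieR as R.
```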
I used the locus from the tutorial notebook: locus = sumstats.filter_value('CHR==2 & POS>55074452 & POS<56074452'). I uploaded the Jupyter notebook I used to follow along with the tutorial here.
I realized that I generated the summary statistics data 1kgeas.B1.glm.firth myself, using the shell scripts in the 06_Association_tests folder. I did this because I couldn't find the file in the repo, and perhaps I made a mistake while running the scripts. If you provide me with that file, I'm happy to re-run the notebook.
Oops, my fault. I guess I might have used an old version of the sumstats. I will double-check, fix the file, and let you know tomorrow. But from looking over your notebook, I think you have basically completed the workflow correctly. Sorry for my mistake.
Hi, I checked again, but the sumstats I used were the latest ones. I am wondering if you re-generated the phenotype file for the GWAS? Anyway, I uploaded the sumstats 1kgeas.B1.glm.firth.gz in 12_fine_mapping. You can use this one directly.
Hi,
(Sorry for the late reply)
I ran the fine-mapping analysis with the 1kgeas.B1.glm.firth.gz file you shared above, and I still get different indices than the ones in the tutorial notebook.
I am wondering if you re-generated the phenotype file for the GWAS?
No, I didn't. Should I do that?
Hi, sorry, I thought you might have re-generated the phenotype file, which would have explained the difference. To be honest, I have no clue now. Would you please share your notebook using 1kgeas.B1.glm.firth.gz again? I will look into it. Thanks.
I finally found the problem. There was an error in the default sanity-check range for OR in the old version. I think just updating gwaslab to the latest version will solve it:
pip install gwaslab==3.4.43
Or, if you are using the old version, add this argument to basic_check():
sumstats.basic_check(sanitycheckstats_args={"OR":(-10,10)})
Sorry for the inconvenience.
No worries. I'm glad you were able to find the issue.
I tried to install the latest gwaslab via pip install gwaslab==3.4.43, but pip is failing with No matching distribution found for gwaslab==3.4.43. Perhaps the PyPI index isn't updated yet.
Hence, I added the suggested argument and ran sumstats.basic_check(sanitycheckstats_args={"OR":(-10,10)}). Now I get one credible set with variables 182 200 203 206 in it. These aren't exactly the variables in the tutorial (200 218 221 224), but they're very close, and I think we can safely close this issue. What do you think?
Maybe your Python version is different, so the distribution could not be found; gwaslab currently supports Python 3.9 and Python 3.10. As for the fine-mapping results, I think the difference is simply due to the changed default behavior for variant checking between different versions of gwaslab. And sure, we can close it. Please let me know if you have any other questions.
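If pip still can't find the release, one thing to try is a clean virtual environment on one of the supported Python versions mentioned above (the environment name below is just an example):

```shell
# Create a fresh virtual environment on a supported Python version
# and install the pinned gwaslab release
python3.10 -m venv gwaslab-env
source gwaslab-env/bin/activate
pip install --upgrade pip
pip install gwaslab==3.4.43
pip show gwaslab   # confirm the installed version
```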
Thank you for your patience and responses. I don't have further questions at the moment, and I'll create a new issue if any come up.
Thanks for creating this amazing tutorial!
In the fine-mapping tutorial, the script calculate_ldr.sh calculates the LD matrix from the plink file ../01_Dataset/1KG.EAS.auto.snp.norm.nodup.split.maf005.thinp020. However, this file is not provided in the 01_Dataset folder, and a script to generate it doesn't exist. Where can I find this file?