Closed ShoaibBME closed 1 year ago
Hi-- It looks that you only have input data from one population? In this case PRS-CS might be the tool you'd like to use. Using PRS-CSx for one input population is also fine and should be equivalent to PRS-CS. I think the issue here was that the directory to the reference panel was not correctly specified -- you only need to specify the path to the directory that contains the SNP information file and LD reference panels and don't need to point to the reference of a specific population. This is slightly different from how reference is specified in PRS-CS.
Hi Tian, Thanks for prompt response, really appreciate it. PRS-CS seems working in my case as I have european GWAS summary stats and genotyping data of europeans. My group is wondering if we can apply this tool, in case we want to calculate PRS in south asian and east asian individuals as well. The important point here, we only have summary stats of EUR ancestry and we want to calculate PRS in South Asian ppl in addition to europeans. You think, this tool can be used as it is based in LD reference panels from different ancestries. If this can be done, with a single GWAS how accurate PRS in other ancestries would be, This was the point of our main discussion in our group meeting today. Can you please send paper, where someone used PRScsx with just one summary stats and calculated PRS in two or more population. Thanks so much for your amazing work
Hi - The main difference between PRS-CS and PRS-CSx is that PRS-CS can only be applied to GWAS summary statistics from one population while PRS-CSx can integrate summary statistics from multiple populations. If you only have European summary statistics and would like to evaluate the score in non-European populations, you can use PRS-CS. It's a valid analysis although the PRS is expected to be less accurate in non-European samples. Applying PRS-CSx to the European summary statistics in this case doesn't make the score more portable; in fact PRS-CSx reduces to PRS-CS when the input is from a single population.
Hi Tian, when I am using single GWAS summary stats, single LD ref panel, PRScs is working well, but I am continuously getting error when I am using PRScsx, please check what is the potential problem, I have correctly mentioned the ref directory where the ref panel and snp_info from 1000 genome is available, still unable to find the cause of error when using PRScsx
python PRScsx.py --ref_dir=/hpf/projects/arnold/users/Shoaib2/ldblk_1kg_eas --bim_prefix=/hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_EAS_recall/imputation/beagle/v0.1.4/plink_hard_calls/osc_eas_beagle.hardcalls --sst_file=/hpf/projects/arnold/users/Shoaib2/sumstatsn.txt --n_gwas=38691 --pop=EAS --chrom=21 --out_dir=/hpf/projects/arnold/users/Shoaib2/ --out_name=PRS
--ref_dir=/hpf/projects/arnold/users/Shoaib2/ldblk_1kg_eas --bim_prefix=/hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_EAS_recall/imputation/beagle/v0.1.4/plink_hard_calls/osc_eas_beagle.hardcalls --sst_file=['/hpf/projects/arnold/users/Shoaib2/sumstatsn.txt'] --a=1 --b=0.5 --phi=None --n_gwas=[38691] --pop=['EAS'] --n_iter=1000 --n_burnin=500 --thin=5 --out_dir=/hpf/projects/arnold/users/Shoaib2/ --out_name=PRS --chrom=['21'] --meta=FALSE --seed=None
only 1 discovery population detected
... parse bim file: /hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_EAS_recall/imputation/beagle/v0.1.4/plink_hard_calls/osc_eas_beagle.hardcalls.bim ...
... 103373 SNPs on chromosome 21 read from /hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_EAS_recall/imputation/beagle/v0.1.4/plink_hard_calls/osc_eas_beagle.hardcalls.bim ...
Traceback (most recent call last):
File "PRScsx.py", line 215, in
Hi - I think the problem is still the specification of the reference panel. In --ref_dir you only need to point to a directory that contains snpinfo_mult_1kg_hm3 and ldblk_1kg_eur, ldblk_1kg_eas, etc. You don't need to specify ldblk_1kg_eas in --ref_dir. The population of the input summary stats is specified using --pop in PRS-CSx. When you use --pop=EAS, the program will look for ldblk_1kg_eas within the directory you specified by --ref_dir. This is slightly different from how reference directory is specified in PRS-CS. Let me know if this resolves your issue.
You can try to use --ref_dir=/hpf/projects/arnold/users/Shoaib2 and put snpinfo_mult_ukbb_hm3 into Shoaib2 folder.
Hi, I am also experiencing a similar issue. I am trying to run the script on one chromosome to try it.
... parse bim file: /home/PRScx/Target/filtered_callRate_Chr21.bim ...
... 109196 SNPs on chromosome 21 read from /home/PRScx/Target/filtered_callRate_Chr21.bim ...
Traceback (most recent call last):
File "/home/PRScx/PRScsx-master/PRScsx.py", line 215, in
I specify
export MKL_NUM_THREADS=$N_THREADS export NUMEXPR_NUM_THREADS=$N_THREADS export OMP_NUM_THREADS=$N_THREADS
chr=21 python $codeDir/PRScsx.py --ref_dir=$refDir --bim_prefix=$targetDir/filtered_callRate_Chr${chr} --sst_file=$discoveryDir/sumstats1.csv,$discoveryDir/sumstats2.csv --n_gwas=8006,788989 --pop=EUR,AFR --out_dir=$outputDir/PRS --out_name=PRS_Chr${chr} --phi=1e-2 --n_iter=2000 --n_burnin=1000 --thin=5 --chrom=${chr} --seed=1234
I would appreciate your advice. Thanks
Hi - There seems to be a problem when specifying the directory of the reference panel. Could you double check that the directory for --ref_dir contains snpinfo_mult_1kg_hm3 and population-specific reference panels (i.e., ldblk_1kg_eur, ldblk_1kg_eas, ldblk_1kg_afr, etc)?
Hi, I am also experiencing same error while running PRScsx.
Traceback (most recent call last):
File "/bin2/PRScsx", line 215, in
Even I specified the directory of the reference panel (snpinfo_mult_1kg_hm3,ldblk_1kg_eur, ldblk_1kg_eas) correctly, I still got same error. Is there any other factor causing this error except wrong specification of the directory?
Hi - Could you send the commend line that you used to call PRS-CSx?
Hello @getian107, here is my command line:
python3 /home2/ldgbsr/UA_PRS_project/Code/Tools/PRScsx/PRScsx.py --ref_dir=/data6/PRScsx_data/PRScsx-tmp/1000G --bim_prefix=/home2/ldgbsr/UA_PRS_project/Analysis/BestFit/BFset_CAXA_allQCed --sst_file=/home2/ldgbsr/UA_PRS_project/Analysis/PRScsx/UKB_UA.001.sumstats,/home2/ldgbsr/UA_PRS_project/Analysis/PRScsx/JPB2021_UA.001.sumstats --pop=EUR,EAS --n_gwas=437354,129405 --phi=1e-2 --chrom=1 --out_dir=/home2/ldgbsr/UA_PRS_project/Analysis/PRScsx/ --out_name=eas_eur_ua_beta
Output: --ref_dir=/data6/PRScsx_data/PRScsx-tmp/1000G --bim_prefix=/home2/ldgbsr/UA_PRS_project/Analysis/BestFit/BFset_CAXA_allQCed --sst_file=['/home2/ldgbsr/UA_PRS_project/Analysis/PRScsx/UKB_UA.001.sumstats', '/home2/ldgbsr/UA_PRS_project/Analysis/PRScsx/JPB2021_UA.001.sumstats'] --a=1 --b=0.5 --phi=0.01 --n_gwas=[437354, 129405] --pop=['EUR', 'EAS'] --n_iter=2000 --n_burnin=1000 --thin=5 --out_dir=/home2/ldgbsr/UA_PRS_project/Analysis/PRScsx/ --out_name=eas_eur_ua_beta --chrom=['1'] --meta=FALSE --seed=None
2 discovery populations detected
... parse bim file: /home2/ldgbsr/UA_PRS_project/Analysis/BestFit/BFset_CAXA_allQCed.bim ...
... 318938 SNPs on chromosome 1 read from /home2/ldgbsr/UA_PRS_project/Analysis/BestFit/BFset_CAXA_allQCed.bim ...
Traceback (most recent call last):
File "/home2/ldgbsr/UA_PRS_project/Code/Tools/PRScsx/PRScsx.py", line 154, in
Hi- Do you mind showing a list of the files under /data6/PRScsx_data/PRScsx-tmp/1000G
1000G reference directory contains the following directories: ldblk_1kg_afr ldblk_1kg_amr ldblk_1kg_eas ldblk_1kg_eur ldblk_1kg_sas Then, each directories is consisted of following files: ldblk_1kg_chr10.hdf5 ldblk_1kg_chr16.hdf5 ldblk_1kg_chr21.hdf5 ldblk_1kg_chr6.hdf5 ldblk_1kg_chr11.hdf5 ldblk_1kg_chr17.hdf5 ldblk_1kg_chr22.hdf5 ldblk_1kg_chr7.hdf5 ldblk_1kg_chr12.hdf5 ldblk_1kg_chr18.hdf5 ldblk_1kg_chr2.hdf5 ldblk_1kg_chr8.hdf5 ldblk_1kg_chr13.hdf5 ldblk_1kg_chr19.hdf5 ldblk_1kg_chr3.hdf5 ldblk_1kg_chr9.hdf5 ldblk_1kg_chr14.hdf5 ldblk_1kg_chr1.hdf5 ldblk_1kg_chr4.hdf5 snpinfo_1kg_hm3 ldblk_1kg_chr15.hdf5 ldblk_1kg_chr20.hdf5 ldblk_1kg_chr5.hdf5
I think you need to download the SNP information file for PRS-CSx and put it in the 1000G directory.
Thank you for the solution, @getian107. It solved my error. I was using the file named snpinfo_1kg_hm3 instead of snpinfo_mult_1kg_hm3, so it seems like caused the error. Also, I am curious about whether I need to use only SNPs in sumstats that are higher than certain MAF and INFO thresholds. If so, what are MAF and INFO threshold requirements?
The reference panels have filtered to HapMap3 variants with MAF >1% (and INFO >0.8 for the UK Biobank reference). You can further apply MAF and INFO filtering in your target dataset follow standard GWAS QC procedures. Filtering of SNPs in the summary statistics is usually unnecessary.
Hi I am trying to apply PRScsx using my genotyping data, this is how my code looks like, please help me to solve this issue
python PRScsx/PRScsx.py --ref_dir=/hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_recall/imputation/beagle/current/Shoaib/test/ldblk_1kg_eur/ldblk_1kg_eur --bim_prefix=/hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_recall/imputation/beagle/current/Shoaib/test/osc.QC1 --sst_file=/hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_recall/imputation/beagle/current/Shoaib/test/sumstatsn.txt --n_gwas=38691 --pop=EUR --chrom=21 --out_dir=/hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_recall/imputation/beagle/current/Shoaib/test --out_name=PRS
--ref_dir=/hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_recall/imputation/beagle/current/Shoaib/test/ldblk_1kg_eur/ldblk_1kg_eur --bim_prefix=/hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_recall/imputation/beagle/current/Shoaib/test/osc.QC1 --sst_file=['/hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_recall/imputation/beagle/current/Shoaib/test/sumstatsn.txt'] --a=1 --b=0.5 --phi=None --n_gwas=[38691] --pop=['EUR'] --n_iter=1000 --n_burnin=500 --thin=5 --out_dir=/hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_recall/imputation/beagle/current/Shoaib/test --out_name=PRS --chrom=['21'] --meta=FALSE --seed=None
only 1 discovery population detected
process chromosome 21
... parse bim file: /hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_recall/imputation/beagle/current/Shoaib/test/osc.QC1.bim ... ... 118685 SNPs on chromosome 21 read from /hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_recall/imputation/beagle/current/Shoaib/test/osc.QC1.bim ... Traceback (most recent call last): File "/hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_recall/imputation/beagle/v0.1.3/Shoaib/test/PRScsx/PRScsx.py", line 215, in
main()
File "/hpf/projects/arnold/data/genotypes/OSC_HumanCoreExome_recall/imputation/beagle/v0.1.3/Shoaib/test/PRScsx/PRScsx.py", line 198, in main
sst_dict[pp] = parse_genet.parse_sumstats(ref_dict, vld_dict, param_dict['sst_file'][pp], param_dict['pop'][pp], param_dict['n_gwas'][pp])
^^^^^^^^
UnboundLocalError: cannot access local variable 'ref_dict' where it is not associated with a value