Open alesss78 opened 5 years ago
Hi, 1. 1000G_Phase3_baselineLD_v1.1_ldscores.tgz was the one used in my paper. The differences between each of this files is described in the README file in https://data.broadinstitute.org/alkesgroup/LDSCORE/readme_baseline_versions
Which model did you use? It is expected to see a moderate difference in prediction R2 between the models you use depending on the set of annotations that you are using (see Table S8 from my manuscript).
Yes, you can use h2 from the log file.
Hi Carla Thank you for you response. I am experiencing a memory error while trying to run LDpred-funct. I am using imputed data (about 90 milion variant, unfiltered) of about 400000 individuals and I run Ldpred-funct on a Google machine with 16 cores and 512 gigas or Ram.
Importantly: At the moment of the memory error, Ldpred-funct was using only 271 gigas out of the 512 gigas available. Additionally, the memory error occurs regardless of the training sample parameter (Ldpred-funct Input parameter N.) value used: I tried both N=400000 and N=10
The other inputs of LDpred-funct are:
Ldpred-funct generates a memory error problem while calculating the PRS for Chromosome 1.
We have installed numpy version 1.14.5 and scipy version 1.2.1**
I would kindly ask for your help to overcome this problem. The logs of Ldpred-funct are as following:
{'ld_radius': None, 'verbose': False, 'skip_coordination': False, 'K': None, 'posterior_means': '/opt/LDpred-funct/output/ldpredfunct_posterior_means', 'H2': 0.0686, 'coord': '/ldpred-morespace/coordinate/Coord_Final', 'N': 10, 'gmdir': None, 'gf': '/opt/LDpred-funct/ldpred-links/Final.chrom[1:22]', 'chisq': False, 'pf': '/opt/LDpred-funct/pheno', 'skip_ambiguous': False, 'maf': 0.01, 'ssf': '/opt/LDpred-funct/statfile', 'FUNCT_FILE': '/opt/LDpred-funct/funcfile', 'check_mafs': False, 'out': '/opt/LDpred-funct/output/ldpredfunct_prs'} Step 1: Coordinate summary statistics, genotype and functional enrichments files.
Parsing SNP list Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom1.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom2.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom3.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom4.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom5.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom6.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom7.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom8.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom9.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom10.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom11.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom12.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom13.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom14.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom15.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom16.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom17.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom18.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom19.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom20.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom21.bim Parsing bim file: /opt/LDpred-funct/ldpred-links/Final.chrom22.bim Total 92646145 SNPs in all genotypes files. Parsing the SNP-Heritability file: /opt/LDpred-funct/funcfile rs575272151 3.81811453166102e-06
Get intersection between SNPs in genotypes file and functional enrichments file.
9946704 found in both files
Parsing the summary statistics file: /opt/LDpred-funct/statfile
0 SNPs with p-value rounded to zero in summary statistics file
SS file loaded, now sorting and storing in HDF5 file.
505424 SNPs on chromosome 1
Still 505424 SNPs on chromosome 1
546373 SNPs on chromosome 2
Still 546373 SNPs on chromosome 2
467098 SNPs on chromosome 3
Still 467098 SNPs on chromosome 3
476461 SNPs on chromosome 4
Still 476461 SNPs on chromosome 4
416892 SNPs on chromosome 5
Still 416892 SNPs on chromosome 5
415593 SNPs on chromosome 6
Still 415593 SNPs on chromosome 6
381258 SNPs on chromosome 7
Still 381258 SNPs on chromosome 7
357813 SNPs on chromosome 8
Still 357813 SNPs on chromosome 8
276716 SNPs on chromosome 9
Still 276716 SNPs on chromosome 9
336863 SNPs on chromosome 10
Still 336863 SNPs on chromosome 10
325501 SNPs on chromosome 11
Still 325501 SNPs on chromosome 11
310625 SNPs on chromosome 12
Still 310625 SNPs on chromosome 12
245054 SNPs on chromosome 13
Still 245054 SNPs on chromosome 13
214610 SNPs on chromosome 14
Still 214610 SNPs on chromosome 14
184209 SNPs on chromosome 15
Still 184209 SNPs on chromosome 15
192858 SNPs on chromosome 16
Still 192858 SNPs on chromosome 16
169921 SNPs on chromosome 17
Still 169921 SNPs on chromosome 17
190234 SNPs on chromosome 18
Still 190234 SNPs on chromosome 18
146027 SNPs on chromosome 19
Still 146027 SNPs on chromosome 19
148753 SNPs on chromosome 20
Still 148753 SNPs on chromosome 20
91425 SNPs on chromosome 21
Still 91425 SNPs on chromosome 21
89999 SNPs on chromosome 22
Still 89999 SNPs on chromosome 22
6489707 SNPs parsed from summary statistics file.
Coordinating summary statistics and genotypes
Unable to find phenotype values.
Working on chromsome: chrom_1
Bim file information loaded
Found 505424 SNPs present in both genotype and summary statistics datasets
0 SNPs were excluded due to ambiguous nucleotides.
1 SNPs were excluded due to non-matching nucleotides.
Iterating over file to genotypes
13171 SNPs with MAF < 0.010 were filtered
492252 SNPs were retained on chromosome 1.
Traceback (most recent call last):
File "/opt/LDpred-funct/ldpredfunct.py", line 163, in
Thank you in advance for your time!
I am trying to use LDpred-funct in order to generate a polygenic risk score from a summary statistic. I am following the tutorial reported here but I have some questions:
At first, I used S-LDSC in order to calculate regression coefficients. The github tutorial page of S-LDSC suggests to use as baseline model LD scores the ones in the file: 1000G_Phase1_baseline_ldscores.tgz downloaded from https://data.broadinstitute.org/alkesgroup/LDSCORE/
I managed to run S-LDSC and to obtain the result files. However, at the point of calculating the per -SNP heritability, present tutorial says to use baselineLD.*.annot.gz files, in order to extract the so called X matrices.
1) From which compressed tgz file, baselineLD.*.annot.gz files have to be taken? I see different possibilities in the download page: 1000G_Phase3_baselineLD_ldscores.tgz 1000G_Phase3_baselineLD_v1.1_ldscores.tgz, 1000G_Phase3_baselineLD_v2.0_ldscores.tgz, 1000G_Phase3_baselineLD_v2.1_ldscores.tgz
2) In the S-LDSC step when calculating the regression coefficients, I used a different baseline model LD file (baselineLD.*.annot.gz) than those reported above. Is this correct or possibly I should have used one of the baselineLD file above? (in case, which one?)
3) The tutorial says to divide regression coefficients by the parameter h2g, the latter being taken from the .log file generated by S-LDSC. In the .log file I don't see any h2g parameter, but rather a h2 parameter. Is that the correct one to be taken?
Thank you in advance for the answers to these questions