Copying the comment I posted in issue #30: Could you include the full log of your process? If you are using --extract or --exclude, could you please make sure those files aren't empty? There is a similar discussion here, and you can find a hotfix version of PRSice here. This hotfix dealt with the problem of the .valid file.
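A quick way to confirm that an --extract or --exclude list is not empty is to count its usable lines before launching a run. The following is only a minimal sketch: the helper name `first_column_ids` is hypothetical, and it assumes a whitespace-delimited list whose first column holds the SNP ID (which is what the PRSice log reports assuming for multi-column lists).

```python
def first_column_ids(lines):
    """Return the first whitespace-delimited field of every non-blank line.

    `lines` can be any iterable of strings, e.g. an open file handle.
    """
    ids = []
    for line in lines:
        fields = line.split()
        if fields:  # skip blank or whitespace-only lines
            ids.append(fields[0])
    return ids


def is_usable_snp_list(lines):
    """True if the list contains at least one SNP ID."""
    return len(first_column_ids(lines)) > 0
```

For a real file, something like `with open("chr22.valid") as fh: print(len(first_column_ids(fh)))` would print how many IDs the list holds. Whether the file carries a header line is not checked here.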
Hi Sam,
It works now. Thank you!!! One suggestion: It would be useful to add the run name to the PRSice.valid file: This way PRSice doesn't over-write when running different/multiple jobs - it would make things a bit easier to track or debug.
Cheers, all the best.
Judit
You can use the --out parameter to specify the prefix of all output of PRSice. The default, when --out isn't given, is PRSice.
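To keep the outputs of several jobs apart, each job can be given its own --out prefix. The sketch below only assembles an argument list; the helper name `prsice_args` and the example file names are hypothetical, and only the --base, --target and --out flags seen in the logs of this thread are used.

```python
def prsice_args(binary, base, target, run_name):
    """Build a PRSice argument list whose outputs are all prefixed by run_name."""
    return [binary, "--base", base, "--target", target, "--out", run_name]


# One distinct prefix per job, so one run's files do not overwrite another's.
jobs = [prsice_args("PRSice_linux", "pheno.txt", f"chr{c}", f"runA_chr{c}")
        for c in (21, 22)]
```

Each list could then be passed to `subprocess.run` together with whatever other options the analysis needs.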
Would it make sense to keep this issue open until the hotfix is merged into a release? Or alternatively, if that hotfix already made it into a release, then I'm still having the same issues with bgen files.
Could you please show me the log file? Which version are you using? This specific problem should be fixed in the latest updates.
I'm running this now with the "hotfix" and it works fine. It does not work with the release that I downloaded from https://choishingwan.github.io/PRSice/
I am not sure how they differ, but running a binary diff, I can tell you that they do:
Binary files /home/unix/jamesp/bin/PRSice_linux and /home/unix/jamesp/lib/PRSice/PRSice_linux differ
The hotfix has the following at the start of its log:
PRSice 2.1.2.beta (31 May 2018)
https://github.com/choishingwan/PRSice
(C) 2016-2017 Shing Wan (Sam) Choi, Jack Euesden, Cathryn M. Lewis, Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Jack Euesden Cathryn M. Lewis Paul F. O'Reilly (2015)
PRSice: Polygenic Risk Score software.
Bioinformatics 31 (9): 1466-1468
The non-hotfix has the following at the start of its log:
PRSice 2.1.3.beta (21 August 2018)
https://github.com/choishingwan/PRSice
(C) 2016-2017 Shing Wan (Sam) Choi, Jack Euesden, Cathryn M. Lewis, Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Jack Euesden Cathryn M. Lewis Paul F. O'Reilly (2015)
PRSice: Polygenic Risk Score software.
Bioinformatics 31 (9): 1466-1468
And the full log from the 2.1.3beta that is not working for me with bgens. (Identifying paths and filenames have been modified; otherwise, this is intact.)
PRSice 2.1.3.beta (21 August 2018)
https://github.com/choishingwan/PRSice
(C) 2016-2017 Shing Wan (Sam) Choi, Jack Euesden, Cathryn M. Lewis, Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Jack Euesden Cathryn M. Lewis Paul F. O'Reilly (2015)
PRSice: Polygenic Risk Score software.
Bioinformatics 31 (9): 1466-1468
2018-08-21 19:08:53
PRSice_linux \
--A1 effect_allele \
--A2 noneffect_allele \
--all-score \
--bar-levels 1 \
--base pheno.txt \
--beta \
--binary-target T \
--bp bp_hg19 \
--chr chr \
--clump-kb 0 \
--clump-p 1.000000 \
--clump-r2 0.100000 \
--extract chr22.valid \
--hard-thres 0.900000 \
--info-base median_info,0.9 \
--interval 0.000050 \
--lower 0.000100 \
--model add \
--no-default \
--no-regress \
--out chr22 \
--pheno-file sample.sample \
--pvalue p_dgc \
--se se_dgc \
--seed 569377328 \
--snp markername \
--stat beta \
--target chr22 \
--thread 2 \
--type bgen \
--upper 0.500000
Loading Genotype file:
chr22
(bgen)
Detected bgen sample file format
487409 people (0 male(s), 0 female(s)) observed
487409 founder(s) included
SNP extraction/exclusion list contains 5 columns, will
assume first column contains the SNP ID
1255K SNPs processed in chr22.bgen
1576 variant(s) included
1 region included
Check Phenotype file:
sample.sample
Column Name of Sample ID: ID_1+ID_2
Note: If the phenotype file does not contain a header, the
column name will be displayed as the Sample ID which is ok.
Phenotype Name: missing
There are a total of 1 phenotype to process
Start processing pheno
==============================
Reading 100.00%
Base file: pheno.txt
9455778 variant(s) observed in base file, with:
9455778 variant(s) not found in target file
0 total variant(s) included from base file
Error: No valid variant remaining
Can you check if your base file actually contains those SNP IDs?
Yes - the May hotfix is running fine on the ~60,000 variants that overlap this GWAS data from chromosome 22 in my bgens. The August 21 version does not seem to work for me.
Do you still have the log file from May? Here, it suggests that there are only around 1500 SNPs left after filtering. The main difference between the May version and the August version is the info filtering and MAF filtering, so SNPs should be filtered out correctly.
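When comparing two builds, it can help to pull the variant counts out of each log and put them side by side. This is a rough sketch, not part of PRSice: the function name `variant_counts` is hypothetical, and the pattern simply assumes count lines look like the ones pasted in this thread (a number followed by text containing `variant(s)`).

```python
import re

# A leading integer followed by a description that mentions "variant(s)".
COUNT_LINE = re.compile(r"^\s*(\d+)\s+(.*variant\(s\).*)$")


def variant_counts(log_lines):
    """Map each 'variant(s)' description in a log to its count."""
    counts = {}
    for line in log_lines:
        match = COUNT_LINE.match(line.rstrip("\n"))
        if match:
            counts[match.group(2).strip()] = int(match.group(1))
    return counts
```

Running this on the log of each build and comparing the two dictionaries shows at which step the counts start to diverge.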
The May version (i.e., the hotfix which I just downloaded from your Dropbox) is still in the process of running:
PRSice 2.1.2.beta (31 May 2018)
https://github.com/choishingwan/PRSice
(C) 2016-2017 Shing Wan (Sam) Choi, Jack Euesden, Cathryn M. Lewis, Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Jack Euesden Cathryn M. Lewis Paul F. O'Reilly (2015)
PRSice: Polygenic Risk Score software.
Bioinformatics 31 (9): 1466-1468
2018-08-21 19:30:50
PRSice_linux \
--A1 effect_allele \
--A2 noneffect_allele \
--all-score \
--bar-levels 1 \
--base pheno.txt \
--beta \
--binary-target T \
--bp bp_hg19 \
--chr chr \
--info-base median_info,0.9 \
--interval 0.000050 \
--lower 0.000100 \
--model add \
--no-clump \
--no-default \
--no-regress \
--out chr22 \
--pheno-file sample.sample \
--pvalue p_dgc \
--se se_dgc \
--seed 3898643461 \
--snp markername \
--stat beta \
--target chr22 \
--thread 2 \
--type bgen \
--upper 0.500000
Loading Genotype file:
chr22
(bgen)
Detected bgen sample file format
487409 people (0 male(s), 0 female(s)) observed
487409 founder(s) included
1255K SNPs processed in chr22.bgen
Error: A total of 4263 duplicated SNP ID detected out of
1082409 input SNPs!. Valid SNP ID stored at chr22.valid.
You can avoid this error by using --extract chr22.valid
PRSice 2.1.2.beta (31 May 2018)
https://github.com/choishingwan/PRSice
(C) 2016-2017 Shing Wan (Sam) Choi, Jack Euesden, Cathryn M. Lewis, Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Jack Euesden Cathryn M. Lewis Paul F. O'Reilly (2015)
PRSice: Polygenic Risk Score software.
Bioinformatics 31 (9): 1466-1468
2018-08-21 19:31:12
PRSice_linux \
--A1 effect_allele \
--A2 noneffect_allele \
--all-score \
--bar-levels 1 \
--base pheno.txt \
--beta \
--binary-target T \
--bp bp_hg19 \
--chr chr \
--extract chr22.valid \
--info-base median_info,0.9 \
--interval 0.000050 \
--lower 0.000100 \
--model add \
--no-clump \
--no-default \
--no-regress \
--out chr22 \
--pheno-file sample.sample \
--pvalue p_dgc \
--se se_dgc \
--seed 3104147554 \
--snp markername \
--stat beta \
--target chr22 \
--thread 2 \
--type bgen \
--upper 0.500000
Loading Genotype file:
chr22
(bgen)
Detected bgen sample file format
487409 people (0 male(s), 0 female(s)) observed
487409 founder(s) included
1255K SNPs processed in chr22.bgen
1074860 variant(s) included
1 region included
Start processing pheno
==============================
Reading 100.00%
Base file: pheno.txt
9455778 variant(s) observed in base file, with:
3 ambiguous variant(s) excluded
9358122 variant(s) not found in target file
1115 mismatched variant(s) excluded
30641 variant(s) with INFO score less than 0.900000
66256 total variant(s) included from base file
Warning: Mismatched SNPs detected between base and
target!You should check the files are based on the same
genome build, or that can just be InDels
Check Phenotype file:
sample.sample
Column Name of Sample ID: ID_1+ID_2
Note: If the phenotype file does not contain a header, the
column name will be displayed as the Sample ID which is ok.
Phenotype Name: missing
There are a total of 1 phenotype to process
Processing the 1 th phenotype
Processing 0.23%
Strange, can you check that the valid file is the same for both versions? For the August version it seems to contain only about 1500 SNPs, whereas for the May version it contains many more.
(Won't be able to reply after this email until tomorrow morning.)
I can confirm that it is the same file, in particular because I am running this with a bash script and I just swapped out the PRSice binary. Otherwise, exact same files.
Edit: I might have pasted a second-run, after the first run did screening. So the files are the same, but the filter might be different. Will need to get back to you, as I’m no longer at the computer tonight.
If you don't mind, could you please post the number of lines in the valid file generated by each build? I am now writing unit tests for PRSice, and hopefully if there are any problems, I can capture them. Thanks
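The line counts of the two .valid files, and the IDs that appear in only one of them, could be gathered with something like the sketch below. The helper name `compare_id_lists` is hypothetical, and it assumes the SNP ID is the first whitespace-delimited column of each line.

```python
def compare_id_lists(lines_a, lines_b):
    """Summarise how two SNP ID lists (first column of each line) differ."""
    ids_a = {line.split()[0] for line in lines_a if line.strip()}
    ids_b = {line.split()[0] for line in lines_b if line.strip()}
    return {
        "n_a": len(ids_a),
        "n_b": len(ids_b),
        "shared": len(ids_a & ids_b),
        "only_a": sorted(ids_a - ids_b),
        "only_b": sorted(ids_b - ids_a),
    }
```

Passing two open file handles, one per build's .valid file, gives the counts to post along with a sample of the IDs that differ.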
Unfortunately, I didn't end up needing the tool to run to completion, so I don't have a full answer.
However, on a complete run, I got 0 lines of output from the August version. In contrast, the May hotfix version seemed to be producing valid output at every expected site. (I only let it get ~0.5% of the way complete because I ended up deferring this analysis for something else that came up.)
Hi Sam, Pls see the PRSice error below:
Error: No valid variant remaining
I checked that:
LOG FILE:
2019-03-13 22:11:11
Rscript PRSice.R
--prsice /usr/local/bin/PRSice \
--A1 Effect_allele \
--A2 Non_Effect_allele \
--bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
--base GWAS_summary_1.txt \
--beta \
--binary-target T \
--bp Position \
--chr Chromosome \
--clump-kb 250 \
--clump-p 1.000000 \
--clump-r2 0.100000 \
--interval 5e-05 \
--lower 0.0001 \
--model add \
--out my_results \
--pvalue Pvalue \
--se SE \
--seed 835715086 \
--snp MarkerName \
--stat Beta \
--target my_plink_input \
--thread 6 \
--upper 0.5
Loading Genotype file: my_plink_input (bed)
1201 people (693 male(s), 508 female(s)) observed
1201 founder(s) included
2835792 ambiguous variant(s) excluded
16115311 variant(s) included
1 region included
There are a total of 1 phenotype to process
Start processing GWAS_summary_1
==============================
Reading 100.00%
Base file: GWAS_summary_1.txt
7055881 variant(s) observed in base file, with:
7055881 variant(s) not found in target file
0 total variant(s) included from base file
Error: No valid variant remaining
Error: Execution halted
Thank you.
While the SNPs might have the same position, as long as their variant IDs don't match, they will be counted as missing. It is possible that your base and target use different naming systems for their SNPs.
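A direct way to test for a naming mismatch is to compare the SNP ID column of the base file with the variant IDs in the target's .bim file. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the base file is assumed to be whitespace-delimited with a header row, and the .bim file is assumed to follow the standard PLINK layout with the variant ID in the second column.

```python
def base_snp_ids(lines, snp_column):
    """Collect IDs from the named column of a whitespace-delimited base file."""
    rows = (line.split() for line in lines if line.strip())
    header = next(rows)              # first non-blank line is the header
    idx = header.index(snp_column)   # raises ValueError if the column is absent
    return {row[idx] for row in rows if len(row) > idx}


def bim_snp_ids(lines):
    """Collect variant IDs (second column) from PLINK .bim lines."""
    split = (line.split() for line in lines)
    return {fields[1] for fields in split if len(fields) >= 2}


def overlap_report(base, target):
    """Return (number of base IDs, number of target IDs, number shared)."""
    return len(base), len(target), len(base & target)
```

If the shared count is zero, printing a handful of IDs from each set usually shows the difference at a glance, for example rs numbers in one file and chr:pos style names in the other.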
Hi Sam, I am facing the same issue of no variants remaining. Please find the log below:
PRSice 2.1.2.beta (31 May 2018)
https://github.com/choishingwan/PRSice
(C) 2016-2017 Shing Wan (Sam) Choi, Jack Euesden, Cathryn M. Lewis, Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Jack Euesden Cathryn M. Lewis Paul F. O'Reilly (2015)
PRSice: Polygenic Risk Score software.
Bioinformatics 31 (9): 1466-1468
2019-03-25 16:26:40
./PRSice_linux \
--A1 A1 \
--A2 A2 \
--all-score \
--bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
--base /home/cnap_lab/Mehul_prs_25032019/PRSice_linux/glgc_25032019.assoc \
--beta \
--binary-target F \
--bp BP \
--chr CHR \
--extract /home/cnap_lab/Mehul_prs_25032019/prs_illumina_25032019.valid \
--info-base INFO,0.9 \
--interval 0.000050 \
--keep-ambig \
--lower 0.000100 \
--model add \
--no-clump \
--no-regress \
--out /home/cnap_lab/Mehul_prs_25032019/prs_illumina_25032019 \
--perm 10000 \
--print-snp \
--pvalue P \
--seed 2628881950 \
--snp SNP \
--stat BETA \
--target /mnt/Data/Genotype_DATA_cnap_lab/hrc_GODARTS/affy_hrc/affy6b37_GD20062016forimp \
--thread 1 \
--type bed \
--upper 0.500000
Loading Genotype file: /mnt/Data/Genotype_DATA_cnap_lab/hrc_GODARTS/affy_hrc/affy6b37_GD20062016forimp (bed)
3884 people (0 male(s), 0 female(s)) observed
3884 founder(s) included
8212 ambiguous variant(s) kept
64845 variant(s) included
1 region included
Start processing glgc_25032019
==============================
Base file: /home/cnap_lab/Mehul_prs_25032019/PRSice_linux/glgc_25032019.assoc
44 variant(s) observed in base file, with:
44 variant(s) not found in target file
0 total variant(s) included from base file
Error: No valid variant remaining
I look forward to your reply. Thanks
Could you please try using 2.1.9? Also, with only 44 variants in the base file, it is quite possible that none of the SNPs are found in the target dataset.
Hi Sam, Thanks for the reply. I have tried with 2.1.9. It worked when I used individual files, and I am able to get scores chromosome by chromosome. But when I gave the command to run all files with #, it said 'Killed' Error: Execution halted. Please find the log file and suggest. Thanks
prs_illumina_26032019.log https://github.com/choishingwan/PRSice/files/3009358/prs_illumina_26032019.log
That is most likely due to a lack of memory. For example, with UKBB data, you will need around 40Gb of memory to process the files.
Following previous issue #30
I am using BGEN and I am having the same issue:
6330995 variant(s) observed in base file, with:
6330995 variant(s) not found in target file
0 total variant(s) included from base file
Error: No valid variant remaining
I've checked internally and I have matching rs numbers and positions. I downloaded PRSice on 21 May, so I am not sure whether you had fixed this before that date.
Thanks!