Open aheritas opened 2 years ago
You will also want to add --no-regress, as you don't need to run the regression to optimize the parameters.
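As a rough sketch, this is the command from the question with `--no-regress` appended and nothing else changed. The paths and file names are copied from the original post and are assumptions about your setup. The script only prints the command unless `PRSice.R` actually exists at that location.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Directory taken from the command in the question; adjust to your setup.
prsice_dir=/home/user/data

prsice_args=(
  --prsice "$prsice_dir/PRSice_linux"
  --base "$prsice_dir/PGS000004_withpval.txt"
  --a1 effect_allele --a2 other_allele
  --stat effect_weight --beta
  --pvalue p_value
  --bp chr_position --chr chr_name
  --chr-id c:l-ab
  --target NORM_PLINK_VCF
  --no-clump
  --no-regress   # skip the regression step, so no phenotype is needed
  --out Output_NORM_PLINK_VCF_PRSice
)

# Print the full command so it can be inspected before running.
printf 'Rscript %s/PRSice.R %s\n' "$prsice_dir" "${prsice_args[*]}"

# Run it only if the wrapper script is present at that path.
if [[ -f "$prsice_dir/PRSice.R" ]]; then
  Rscript "$prsice_dir/PRSice.R" "${prsice_args[@]}"
fi
```

Keeping the arguments in an array makes it easy to add or drop a single flag without retyping the whole line.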
On Mon, Sep 12, 2022, 11:05 AM aheritas @.***> wrote:
Hi! I have been reading the documentation and some of your answers in forums. I understand that it is possible to calculate the PRS for a single individual using the PGS Catalog files. I have tried to do so using PRSice-2 but I have been unsuccessful. I am sharing my detailed steps here and would be grateful if you could guide me on how to troubleshoot this.
I want to calculate the PRS for breast cancer (PGS000004 https://www.pgscatalog.org/score/PGS000004/). I know that the scoring file for this particular PRS does not include rsIDs, only genomic positions (I am using this harmonized file https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000004/ScoringFiles/Harmonized/PGS000004_hmPOS_GRCh37.txt.gz for GRCh37). According to some of your answers in a forum https://www.biostars.org/p/9463113/, when using PGS Catalog files we should add an additional column containing all 1s or all 0s as p-values. I modified this file to include a new column (named p_value) that contains all 1s, resulting in PGS000004_withpval.txt https://github.com/choishingwan/PRSice/files/9549230/PGS000004_withpval.txt .
My input file is a VCF obtained from imputation software, containing approx. 80M variants. The first thing I do is normalize this VCF using bcftools, so that there is a single row per genomic position.
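For reference, one way the p-value column could be added is sketched below. It is not necessarily how PGS000004_withpval.txt was produced. It assumes the scoring file is tab-separated, that metadata lines start with `#`, and that dropping those lines is acceptable. The helper name `add_pvalue_column` and the toy rows are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Append a constant p_value column to a tab-separated scoring file read from stdin.
# Lines starting with '#' (metadata comments) are dropped; the first remaining
# line is treated as the header row.
add_pvalue_column() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/  { next }
       !seen { print $0, "p_value"; seen = 1; next }
             { print $0, 1 }'
}

# Toy input with made-up values, only to show the transformation.
printf '#comment line\nchr_name\tchr_position\teffect_allele\tother_allele\teffect_weight\n1\t100\tA\tG\t0.5\n' \
  | add_pvalue_column
```

On the real file this would be used along the lines of `gunzip -c PGS000004_hmPOS_GRCh37.txt.gz | add_pvalue_column > PGS000004_withpval.txt`.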
bcftools norm -m +any -O z -o NORMALIZED_VCF /home/user/data/ORIGINALFILEVCF_imputed.vcf.gz
Then, I transform this file into the necessary input files for PRSice (.bed, .bim, .fam) using PLINK v1.9.
plink --vcf /home/user/data/NORMALIZED_VCF.vcf.gz --snps-only --make-bed --out NORM_PLINK_VCF
Finally, I run PRSice, with the following parameters:
Rscript /home/user/data/PRSice.R --prsice /home/user/data/PRSice_linux --base /home/user/data/PGS000004_withpval.txt --a1 effect_allele --a2 other_allele --stat effect_weight --pvalue p_value --beta --bp chr_position --chr chr_name --chr-id c:l-ab --target NORM_PLINK_VCF --no-clump --out Output_NORM_PLINK_VCF_PRSice
The script runs, but I get the following error:
81192144 variant(s) not found in previous data
237 variant(s) included
There are a total of 1 phenotype to process
Processing the 1 th phenotype
Phenotype is a continuous phenotype
Only one phenotype value detected and they are all -9. Not enough valid phenotype
So, I understand there is a problem with the phenotype file. The phenotype in this file is unknown, which is why I want to calculate the PRS, but perhaps I am adding some extra parameters that are not necessary. Would you mind guiding me through this calculation? Thank you very much!
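The -9 in the log is PLINK's missing-phenotype code, which is what ends up in the sixth column of the .fam file when a VCF is converted without phenotype information. A small sketch for checking that column is below. The helper name `count_fam_phenotypes` and the sample ID are hypothetical, and the toy file stands in for NORM_PLINK_VCF.fam.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count the distinct values in the phenotype column (6th field) of a PLINK .fam file.
count_fam_phenotypes() {
  awk '{ count[$6]++ } END { for (v in count) print v, count[v] }' "$1"
}

# Toy .fam with a single sample (hypothetical ID), as a one-sample VCF would give.
fam_file=$(mktemp)
printf 'SAMPLE1 SAMPLE1 0 0 0 -9\n' > "$fam_file"
count_fam_phenotypes "$fam_file"
rm -f "$fam_file"
```

If every sample shows -9, there is nothing for PRSice to regress against, which matches the "Not enough valid phenotype" message.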