choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores
http://prsice.info
GNU General Public License v3.0

Trying to run PRSice with Age and Sex as covariates but I get the error "ERROR: All samples removed due to missingness in covariates file" #325

Closed varsh19 closed 1 year ago

varsh19 commented 1 year ago

Describe the bug

I am running PRSice 2 with a phenotype file and a covariate file that has age and sex as covariates. My files have no FID column, so I am using the --ignore-fid flag.

Error Log

PRSice 2.3.5 (2021-09-20)
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2023-06-26 13:45:33
./PRSice/PRSice_linux \
    --a1 A1 \
    --a2 A2 \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base PGS000785.txt \
    --beta \
    --binary-target T \
    --bp BP \
    --chr CHR \
    --cov cases_controls_new/covariates_white.txt \
    --ignore-fid \
    --interval 5e-05 \
    --lower 5e-08 \
    --no-clump \
    --num-auto 22 \
    --out PGS000785_results \
    --pheno /home/vsrinivasan75/cases_controls_new/pheno_white.txt \
    --pvalue P \
    --seed 464394053 \
    --snp SNP \
    --stat BETA \
    --target /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr#_v3,/home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr1_v3.sample \
    --thread 1 \
    --type bgen \
    --upper 0.5

Initializing Genotype file: /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr#_v3 (bgen)
With external fam file: /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr1_v3.sample

Start processing PGS000785
==================================================

Base file: PGS000785.txt
Header of file is:
SNP CHR BP A1 A2 BETA allelefrequency_effect OR P

Reading 100.00%
103 variant(s) observed in base file, with:
9 ambiguous variant(s) excluded
94 total variant(s) included from base file

Loading Genotype info from target
==================================================

487409 people (222965 male(s), 264262 female(s)) observed
487409 founder(s) included

7402K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr1_v3.bgen
8129K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr2_v3.bgen
6696K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr3_v3.bgen
6555K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr4_v3.bgen
6070K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr5_v3.bgen
5751K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr6_v3.bgen
5405K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr7_v3.bgen
5282K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr8_v3.bgen
4066K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr9_v3.bgen
4562K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr10_v3.bgen
4628K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr11_v3.bgen
4431K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr12_v3.bgen
3270K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr13_v3.bgen
3037K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr14_v3.bgen
2767K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr15_v3.bgen
3089K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr16_v3.bgen
2660K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr17_v3.bgen
2599K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr18_v3.bgen
2087K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr19_v3.bgen
2082K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr20_v3.bgen
1261K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr21_v3.bgen
1255K SNPs processed in /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr22_v3.bgen
93095529 variant(s) not found in previous data
94 variant(s) included

Phenotype file: /home/vsrinivasan75/cases_controls_new/pheno_white.txt
Column Name of Sample ID: IID
Note: If the phenotype file does not contain a header, the column name will be displayed as the Sample ID which is expected.

There are a total of 1 phenotype to process

Processing the 1 th phenotype

PHENO is a binary phenotype
73947 sample(s) without phenotype
407733 control(s)
5729 case(s)

Processing the covariate file: cases_controls_new/covariates_white.txt
==============================

Error: All samples removed due to missingness in covariate file!

Error: Execution halted

To Reproduce

This is the command I used:

Rscript PRSice/PRSice.R --base PGS000785.txt --target /home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr#_v3,/home/sharedFolder/referenceData/ukb/imputed_genotypes/ukb_imp_chr1_v3.sample --type bgen --stat BETA --binary-target T --no-clump --ignore-fid --pheno /home/vsrinivasan75/cases_controls_new/pheno_white.txt --quantile 100 --quant-break 10,20,30,40,50,60,70,80,90,100 --out PGS000785_results --prsice PRSice/PRSice_linux --cov cases_controls_new/covariates_white.txt

Additional context

I am performing the task with only cases of "white" ethnic background, so my phenotype file contains information only for white people with colorectal cancer. I tried two covariate files, one for white individuals only and one for all individuals in the UK Biobank, but both yield the same error. When I run the tool without any covariate file, it works perfectly. Does the covariate file have to include PCs? Mine has just age and sex.

choishingwan commented 1 year ago

Check whether sex is encoded as a numeric variable. If it is not, make sure it is included in --cov-factor to tell PRSice that it is not numeric. Otherwise, PRSice treats it as missing.

Sam
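For anyone hitting the same error, here is a minimal sketch of how one might check a covariate file for non-numeric columns before running PRSice. It is not part of PRSice. The column names (IID, Age, Sex), the whitespace-delimited layout, and the set of missing-value tokens are all assumptions, since the actual covariate file was not posted in this issue.

```python
# Hypothetical helper, not part of PRSice: flags covariate columns that
# contain values which cannot be parsed as numbers.

# Assumption: these tokens mark a missing value in the covariate file.
MISSING = {"", "NA", "NaN", "nan", "."}


def non_numeric_columns(lines, id_columns=("FID", "IID")):
    """Return names of columns holding at least one non-missing, non-numeric value.

    `lines` is an iterable of whitespace-delimited text lines; the first
    non-empty line is treated as the header.
    """
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    flagged = []
    for idx, name in enumerate(header):
        if name in id_columns:
            continue  # sample ID columns are not covariates
        for row in body:
            value = row[idx] if idx < len(row) else ""
            if value in MISSING:
                continue
            try:
                float(value)
            except ValueError:
                flagged.append(name)
                break  # one bad value is enough to flag the column
    return flagged


# Made-up example rows, for illustration only.
example = [
    "IID Age Sex",
    "1001 54 Male",
    "1002 61 Female",
    "1003 NA Female",
]
print(non_numeric_columns(example))
```

Any column this reports (here, the assumed Sex column) would either need to be listed in --cov-factor, as suggested above, or be recoded to numbers such as 0/1 before being passed to --cov. To check a real file, pass an open file handle, for example `non_numeric_columns(open("covariates_white.txt"))`.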


varsh19 commented 1 year ago

Thank you so much; it worked!