[BUG] No phenotype presented - causing issues with or without clumping #297

DiegoMac17 commented 2 years ago

Describe the bug
When I run PRSice, it halts after loading the files with the error below, which seems to point to a problem reading the phenotype file. However, I cannot tell from the error message what is causing execution to halt.

pheno_out.txt is a tab-separated file with a header row (FID, IID, PHENO), and the data look as follows:

    FID IID PHENO
    1 1 1.583902
    2 2 -0.218651
    3 3 1.285503
    4 4 2.017829
    5 5 0.695918
    6 6 1.329615
    7 7 1.402525
    8 8 0.656032
    9 9 2.019129
    10 10 1.983777

The FID and IID values follow the same format as in the base and target files (1, 2, 3, 4, 5, ...).

Error Log

PRSice 2.3.5 (2021-09-20)
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2022-06-29 08:04:38
/home/reyes/PRSice/PRSice_linux \
    --a1 A1 \
    --a2 A2 \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base GLM_exp.txt \
    --beta \
    --binary-target F \
    --bp BP \
    --chr CHR \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --interval 5e-05 \
    --keep-ambig \
    --lower 5e-08 \
    --num-auto 22 \
    --out PRS_prsice \
    --pheno pheno_out.txt \
    --pvalue P \
    --seed 3306800842 \
    --snp SNP \
    --stat BETA \
    --target simfile_BN_50_20_30_continuous_PRS_train \
    --thread 1 \
    --upper 0.5

Warning: By selecting --keep-ambig, PRSice assume the base and target are reporting alleles on the same strand and will therefore only perform dosage flip for the ambiguous SNPs. If you are unsure of what the strand is, then you should not select the --keep-ambig option

Initializing Genotype file: simfile_BN_50_20_30_continuous_PRS_train (bed)

Start processing GLM_exp %==================================================

Base file: GLM_exp.txt
Header of file is:
CHR BP SNP A2 ALT A1 TEST OBS_CT BETA SE T_STAT P

10000 variant(s) observed in base file, with:
10000 ambiguous variant(s)
10000 total variant(s) included from base file

Loading Genotype info from target %==================================================

1000 people (0 male(s), 0 female(s)) observed
0 founder(s) included

10000 ambiguous variant(s) kept
10000 variant(s) included

Phenotype file: data/pheno_out.txt
Column Name of Sample ID: FID+IID
Note: If the phenotype file does not contain a header, the column name will be displayed as the Sample ID which is expected.

There are a total of 1 phenotype to process

Processing the 1 th phenotype

No phenotype presented

Error: Execution halted
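For context on the --keep-ambig warning and the "10000 ambiguous variant(s)" lines in the log: a SNP is usually called strand-ambiguous when its two alleles are complementary bases (A/T or C/G), because flipping strand yields the same allele pair, so the strand cannot be inferred from the alleles alone. Below is a minimal Python sketch of that check; the function name `is_ambiguous` is hypothetical, and PRSice's own implementation may differ in details such as how it handles multi-base or non-ACGT alleles.

```python
# Complementary base for each nucleotide.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(a1: str, a2: str) -> bool:
    """Return True for A/T, T/A, C/G and G/C allele pairs (case-insensitive).

    Alleles that are not single A/C/G/T bases are reported as not ambiguous.
    """
    return COMPLEMENT.get(a1.upper()) == a2.upper()


if __name__ == "__main__":
    for pair in [("A", "T"), ("C", "G"), ("A", "G"), ("T", "C")]:
        print(pair, is_ambiguous(*pair))
```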

To Reproduce
If possible, provide a minimum working example for me to reproduce the problem. This can usually speed up the debugging process.

Additional context
Could this be some kind of formatting issue?

choishingwan commented 2 years ago

Can you check whether the FID and IID in the fam file match those in your pheno file?
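One quick way to do this check is to compare the (FID, IID) pairs in the two files, since the log reports that samples are matched on "FID+IID". The minimal Python sketch below assumes whitespace-delimited files, a header row in the phenotype file (as in pheno_out.txt above) and no header in the .fam file (the standard PLINK layout). IDs are compared as plain strings, so "01" and "1" count as different; I assume PRSice matches them the same way, but that is not confirmed here. The function names are hypothetical.

```python
import io
from typing import Dict, Iterable, Set, Tuple


def fam_ids(lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """(FID, IID) pairs from a .fam file: no header, whitespace-separated."""
    ids = set()
    for line in lines:
        fields = line.split()
        if fields:
            ids.add((fields[0], fields[1]))
    return ids


def pheno_ids(lines: Iterable[str], has_header: bool = True) -> Set[Tuple[str, str]]:
    """(FID, IID) pairs from a phenotype file whose first two columns are FID, IID."""
    ids = set()
    for i, line in enumerate(lines):
        if has_header and i == 0:
            continue  # skip the "FID IID PHENO" header row
        fields = line.split()
        if fields:
            ids.add((fields[0], fields[1]))
    return ids


def compare(fam_lines: Iterable[str], pheno_lines: Iterable[str]) -> Dict[str, object]:
    """Summarise how many samples overlap and which ones appear in only one file."""
    fam = fam_ids(fam_lines)
    pheno = pheno_ids(pheno_lines)
    return {
        "in_both": len(fam & pheno),
        "only_in_fam": sorted(fam - pheno),
        "only_in_pheno": sorted(pheno - fam),
    }


if __name__ == "__main__":
    # First two samples from the example files posted later in this thread.
    fam = io.StringIO("1 1 1 1 0 1.895000\n2 2 2 2 0 0.531202\n")
    pheno = io.StringIO("FID\tIID\tPHENO\n1\t1\t1.895\n2\t2\t0.531202\n")
    print(compare(fam, pheno))
```

For real files, pass open file handles (for example, `open(...)` on the target .fam and on pheno_out.txt) instead of the `io.StringIO` objects.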


DiegoMac17 commented 2 years ago

It does match; however, the fam file doesn't have a header. Does that matter?

This is what the fam file looks like (the phenotypes don't match because this is from a new simulated dataset I generated after posting the question above, although I am not sure whether PRSice checks for this as well):

    1 1 1 1 0 -0.707890
    2 2 2 2 0 -1.232549
    3 3 3 3 0 1.657482
    4 4 4 4 0 -0.631688
    5 5 5 5 0 2.371030
    6 6 6 6 0 2.482346
    7 7 7 7 0 0.651285
    8 8 8 8 0 -0.372685
    9 9 9 9 0 -0.997655
    10 10 10 10 0 0.235402
    11 11 11 11 0 1.781196
    12 12 12 12 0 1.919559
    13 13 13 13 0 2.219956
    14 14 14 14 0 0.604353
    15 15 15 15 0 1.102556
    16 16 16 16 0 -0.846755
    17 17 17 17 0 0.079595
    18 18 18 18 0 0.164416
    19 19 19 19 0 0.286265
    20 20 20 20 0 -0.063699
    21 21 21 21 0 0.484974
    22 22 22 22 0 1.609720
    23 23 23 23 0 1.780432
    24 24 24 24 0 0.189820
    25 25 25 25 0 0.047582
    26 26 26 26 0 1.767434
    27 27 27 27 0 1.835832
    28 28 28 28 0 2.621844
    29 29 29 29 0 0.885538
    30 30 30 30 0 0.537878
    31 31 31 31 0 1.445604
    32 32 32 32 0 1.380921
    33 33 33 33 0 2.402226
    34 34 34 34 0 1.754204

DiegoMac17 commented 2 years ago

I just ran a new example; this is what the files look like in each case:

Phenotype file:

    FID IID PHENO
    1 1 1.895
    2 2 0.531202
    3 3 1.269095
    4 4 1.15944
    5 5 1.898842

.fam file of target:

    1 1 1 1 0 1.895000
    2 2 2 2 0 0.531202
    3 3 3 3 0 1.269095
    4 4 4 4 0 1.159440
    5 5 5 5 0 1.898842

choishingwan commented 2 years ago

Your fam file is ill-formed. The third and fourth columns should be the IDs of the father and mother, respectively. The problem here is that you have stated that each individual is their own parent, so PRSice treats all samples as non-founders and excludes them from the analysis.

Sam
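This diagnosis lines up with the "0 founder(s) included" line in the original log. The minimal Python sketch below counts founders in a .fam file under an assumed PLINK-style rule: a sample is a non-founder if its father or mother ID names a sample with the same FID that is present in the file, and "0" means the parent is missing. That rule is an assumption for illustration, not taken from the PRSice source; the function name `classify_founders` is hypothetical. Under it, listing each sample as its own parent makes everyone a non-founder, while setting both parent columns to 0 makes everyone a founder.

```python
from typing import Iterable, List, Tuple


def classify_founders(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split .fam samples into (founders, non_founders) as "FID_IID" strings.

    Assumed rule: a sample is a non-founder if its father or mother ID is not
    "0" and refers to a sample with the same FID that exists in the file.
    """
    rows = [line.split() for line in lines if line.strip()]
    present = {(row[0], row[1]) for row in rows}
    founders: List[str] = []
    non_founders: List[str] = []
    for fid, iid, father, mother, *_ in rows:
        has_parent = any(
            pid != "0" and (fid, pid) in present for pid in (father, mother)
        )
        (non_founders if has_parent else founders).append(f"{fid}_{iid}")
    return founders, non_founders


if __name__ == "__main__":
    bad = "1 1 1 1 0 1.895000\n2 2 2 2 0 0.531202\n"   # each sample is its own parent
    good = "1 1 0 0 0 1.895000\n2 2 0 0 0 0.531202\n"  # parent columns set to 0
    for name, text in [("bad", bad), ("good", good)]:
        founders, non_founders = classify_founders(text.splitlines())
        print(name, len(founders), "founder(s),", len(non_founders), "non-founder(s)")
```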


DiegoMac17 commented 2 years ago

We are currently using simulated data, so I can easily change those numbers so that they no longer match the sample IDs. Just checking: would that cause any trouble for PRSice? Second question: is there a flag I can use so that those columns are ignored, or can I just drop them from the fam file?

Thank you!

choishingwan commented 2 years ago

Just set those columns to 0, as per the standard fam file format.
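If changing the simulation itself is inconvenient, the parent columns can also be patched after the fact. The minimal Python sketch below (the function name `zero_parents` is hypothetical) rewrites only columns 3 and 4 and keeps every row in its original order, which matters because the .fam rows must stay aligned with the samples stored in the .bed file.

```python
from typing import Iterable, Iterator


def zero_parents(lines: Iterable[str]) -> Iterator[str]:
    """Yield .fam rows with the father (column 3) and mother (column 4) IDs set to 0.

    Blank lines are skipped; all other rows keep their original order and
    their FID, IID, sex and phenotype values unchanged.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        fields[2] = "0"  # paternal ID
        fields[3] = "0"  # maternal ID
        yield " ".join(fields)


if __name__ == "__main__":
    original = ["1 1 1 1 0 1.895000", "2 2 2 2 0 0.531202"]
    for row in zero_parents(original):
        print(row)
```

To apply it to a real file, write the output to a new file, keep the original .fam as a backup, and then rename the fixed file so it shares the --target prefix with the .bed and .bim files. The output is space-delimited, which standard .fam readers accept.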


DiegoMac17 commented 2 years ago

Excellent, thank you Sam!
