choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores
http://prsice.info
GNU General Public License v3.0

ERROR: Sample mismatch between bgen and phenotype file #286

Closed: syuxuan closed this issue 2 years ago

syuxuan commented 2 years ago

Describe the bug
This might be the same issue that isabelleazimm encountered previously; I just want to check whether there is a way to solve it.

Error Log
PRSice 2.3.5 (2021-09-20)
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF. PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data. GigaScience 8, no. 7 (July 1, 2019)
2022-01-27 15:58:19
./PRSice_linux \
    --a1 REF \
    --a2 ALT \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base a2.b37.txt \
    --beta \
    --binary-target T \
    --bp POS \
    --chr CHR \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --cov /home/songjie/work/Yuxuan/covariates.txt \
    --dose-thres 0.000000 \
    --extract PRSice.valid \
    --hard \
    --hard-thres 0.100000 \
    --ignore-fid \
    --interval 5e-05 \
    --lower 5e-08 \
    --num-auto 22 \
    --out PRSice \
    --pvalue all_inv_var_meta_p \
    --score sum \
    --seed 2089323591 \
    --snp rsid \
    --stat all_inv_var_meta_beta \
    --target a2_extraction_chr1,a2_extraction_chr1.sample \
    --thread 1 \
    --type bgen \
    --upper 0.5

Initializing Genotype file: a2_extraction_chr1 (bgen) With external fam file: a2_extraction_chr1.sample

Start processing a2.b37 ==================================================

SNP extraction/exclusion list contains 5 columns, will assume first column contains the SNP ID

Base file: a2.b37.txt Header of file is:

CHR POS REF ALT SNP all_meta_N all_inv_var_meta_beta all_inv_var_meta_sebeta all_inv_var_meta_p all_inv_var_het_p all_meta_sample_N all_meta_AF rsid

9856861 variant(s) observed in base file, with: 2944617 variant(s) excluded based on user input 6912244 total variant(s) included from base file

Loading Genotype info from target ==================================================

487409 people (222994 male(s), 264302 female(s)) observed 487409 founder(s) included

Error: Sample mismatch between bgen and phenotype file! Name in BGEN file is :1238122_1238122 and in phentoype file is: 1238122. Please note that PRSice require the bgen file and the .sample (or phenotype file if sample file is not provided) to have sample in the same order. (We might be able to losen this requirement in future when we have more time)

choishingwan commented 2 years ago

Your bgen file has samples named X_X, yet your phenotype file only has X. Either change the sample names in your phenotype file to X_X, or add an extra IID column at the front of the file; either should work.
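
For reference, the ID change suggested above could be scripted along these lines. This is a minimal sketch, not part of PRSice: the function name `expand_ids` is made up for illustration, and it assumes a whitespace-delimited phenotype file with a header row and the sample ID in the first column.

```python
def expand_ids(lines, has_header=True):
    """Rewrite the first column of each data row from X to X_X.

    Hypothetical helper: assumes whitespace-delimited rows with the
    sample ID in the first column. Rewritten rows are joined with a
    single space, so tab-delimited input comes out space-delimited.
    """
    out = []
    for i, line in enumerate(lines):
        fields = line.split()
        # Pass blank lines and the header row through unchanged.
        if not fields or (has_header and i == 0):
            out.append(line.rstrip("\n"))
            continue
        sample_id = fields[0]
        left, _, right = sample_id.partition("_")
        # Leave IDs that already have the X_X form alone.
        if left != right:
            fields[0] = f"{sample_id}_{sample_id}"
        out.append(" ".join(fields))
    return out


# Toy phenotype rows; the second ID and the "Pheno" column are invented.
pheno = ["IID Pheno", "1238122 1", "1000015 0"]
fixed = expand_ids(pheno)
print("\n".join(fixed))
```

Running the function a second time on its own output leaves the IDs unchanged, so it should be safe to re-run on a partly edited file.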

syuxuan commented 2 years ago

Hi, I have changed the sample names in the phenotype file to X_X, but now a new problem has appeared:

Error: Number of sample in phenotype file does not match number of samples specified in bgen file. Please check you have the correct phenotype file input. Note: Phenotype file should have the same number of samples as the bgen file and they should appear in the same order

It should be impossible for the bgen file and the sample file to have mismatched sample names, as both were extracted directly from the UKB data.
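
Both errors in this thread come down to the sample list and the phenotype file disagreeing in count or order, so a quick check before running PRSice may help. The sketch below is only an illustration: the helper names are hypothetical, it assumes an Oxford-style `.sample` file (two header lines, then `ID_1 ID_2 ...` rows), and it assumes the expected name is `ID_1` and `ID_2` joined by an underscore, which is what the error message above suggests rather than something confirmed from the PRSice source.

```python
def sample_ids(lines):
    """IDs from a .sample file: skip the two header lines, join ID_1 and ID_2."""
    ids = []
    for line in lines[2:]:
        fields = line.split()
        if fields:
            ids.append(f"{fields[0]}_{fields[1]}")
    return ids


def pheno_ids(lines, has_header=True):
    """First column of each non-empty row of a whitespace-delimited phenotype file."""
    rows = lines[1:] if has_header else lines
    return [line.split()[0] for line in rows if line.split()]


def first_mismatch(expected, observed):
    """Return None if both ID lists agree, else a short description of the first problem."""
    if len(expected) != len(observed):
        return f"count differs: {len(expected)} vs {len(observed)}"
    for i, (e, o) in enumerate(zip(expected, observed)):
        if e != o:
            return f"row {i}: {e} vs {o}"
    return None


# Toy data; only the ID 1238122 comes from the log above, the rest is invented.
sample = ["ID_1 ID_2 missing sex", "0 0 0 D", "1238122 1238122 0 1", "1000015 1000015 0 2"]
pheno = ["IID Pheno", "1238122 1", "1000015 0"]
result = first_mismatch(sample_ids(sample), pheno_ids(pheno))
print(result)
```

With real files, the lists would come from something like `open(path).read().splitlines()`. A stray character in a header or ID column would show up here as a row mismatch instead of a PRSice error.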

syuxuan commented 2 years ago

Hi, sorry to bother you, but I have fixed the problem. It turns out that when I was modifying the columns, I accidentally added a comma to one of the column names. However, I am still confused about how the sample name mismatch happened.

choishingwan commented 2 years ago

Honestly, without the data, it is rather difficult to tell. The best way to avoid that is to provide the sample file as an external fam and also provide a phenotype file.

-- Dr Shing Wan Choi Instructor Genetics and Genomic Sciences Icahn School of Medicine, Mount Sinai, NYC

syuxuan commented 2 years ago

Hi, there is another problem. During clumping, PRSice gives me the following:

Clumping Progress: 0.01%Killed
Error: Execution halted

I have set it to use 20 threads and 20 GB of memory.

choishingwan commented 2 years ago

If you are using bgen input, 20G is definitely not enough. Depending on your version, try not to use multi-threading (there is a memory bug with multi-threading) and give PRSice more than 100G of memory. Alternatively, you can use plink to do the clumping.
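
As a rough illustration of the plink route, the sketch below only builds a PLINK 1.9 `--clump` command line and prints it. Several things here are assumptions rather than anything stated in this thread: the target data would first have to be converted to a PLINK binary fileset (the prefix `target_chr1` is a placeholder), the output prefix is made up, and the thresholds are simply copied from the PRSice command in the original report. Results from plink clumping are not guaranteed to match PRSice's own clumping.

```python
import shlex


def plink_clump_command(bfile, sumstats, out,
                        snp_field="rsid", p_field="all_inv_var_meta_p",
                        p1=1.0, r2=0.1, kb=250):
    """Build (but do not run) a PLINK 1.9 clumping command as an argv list."""
    return [
        "plink",
        "--bfile", bfile,                # PLINK binary fileset prefix (placeholder)
        "--clump", sumstats,             # summary statistics file
        "--clump-snp-field", snp_field,  # column holding the variant ID
        "--clump-field", p_field,        # column holding the p-value
        "--clump-p1", str(p1),           # index variant p-value threshold
        "--clump-r2", str(r2),           # LD threshold
        "--clump-kb", str(kb),           # clumping window in kb
        "--out", out,
    ]


cmd = plink_clump_command("target_chr1", "a2.b37.txt", "clumped_chr1")
print(shlex.join(cmd))
```

The printed line can be inspected and then run by hand, or passed to `subprocess.run(cmd, check=True)` on a machine where plink is installed.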

Sam

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.