Issue with covariate-file

choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores

http://prsice.info

GNU General Public License v3.0

Issue with covariate-file #337

Open LarsOstman opened 1 year ago

LarsOstman commented 1 year ago

Hello, I am trying to calculate a PRS with PRSice-2 on a case-control cohort, based on summary statistics from a larger GWAS. I have calculated principal components and want to use the first 6 PCs as covariates in the analysis. However, when I run the analysis I get the following error message:

Error: All samples removed due to missingness in covariate file!

I have made sure there aren't any hidden spaces in the covariate file, I have tried both tabs and spaces as delimiters, and I have checked (and re-checked) that the path and the file name are correct. However, the same error message keeps showing up.

Any help would be greatly appreciated. I will paste the full log below.

Thanks for a great product, Lars

laros@maul:/fenix/users/laros/ALF/Genetics/scripts$ ./ALF_PRS_by_group.sh
PRSice 2.3.5 (2021-09-20)
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF. PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data. GigaScience 8, no. 7 (July 1, 2019)
2023-08-17 13:54:23
/home/laros/PRSice2/PRSice_linux \
    --a1 A1 \
    --a2 A2 \
    --bar-levels 1e-05,5e-05,0.0001,0.0005,0.001,0.005,0.01,0.05,1 \
    --base /fenix/users/laros/Elefanten_gene/summary_stat/PGC_UKB_depression_genome-wide.txt \
    --binary-target T \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --cov /fenix/users/laros/ALF/Genetics/data/ALF_gene.PCs \
    --ignore-fid \
    --interval 5e-05 \
    --keep-ambig \
    --ld /fenix/users/laros/Elefanten_gene/LD-data/1kg_phase3.AllChr \
    --ld-keep /fenix/users/laros/Elefanten_gene/LD-data/1000genomes/1000Genomes_EURListPhase3.txt \
    --lower 1e-11 \
    --num-auto 22 \
    --or \
    --out /fenix/users/laros/Elefanten_gene/results/ALF_gene_by_group \
    --pheno /fenix/users/laros/ALF/Genetics/data/ALF_gene.pheno \
    --pheno-col MDD \
    --pvalue P \
    --score std \
    --seed 3270214622 \
    --snp MarkerName \
    --stat LogOR \
    --target /fenix/users/laros/ALF/Genetics/data/ALF_gene.QC \
    --thread 1 \
    --upper 0.05

Warning: By selecting --keep-ambig, PRSice assume the base and target are reporting alleles on the same strand and will therefore only perform dosage flip for the ambiguous SNPs. If you are unsure of what the strand is, then you should not select the --keep-ambig option

Initializing Genotype file: /fenix/users/laros/ALF/Genetics/data/ALF_gene.QC (bed)

Start processing PGC_UKB_depression_genome-wide
==================================================

Base file: /fenix/users/laros/Elefanten_gene/summary_stat/PGC_UKB_depression_genome-wide.txt
Header of file is:
MarkerName A1 A2 Freq LogOR StdErrLogOR P

Reading 100.00%
8483301 variant(s) observed in base file, with:
39487 NA stat/p-value observed
4210543 negative statistic observed. Maybe you have forgotten the --beta flag?
646120 ambiguous variant(s)
4233271 total variant(s) included from base file

Loading Genotype info from target
==================================================

92 people (0 male(s), 0 female(s)) observed
92 founder(s) included

4112097 variant(s) not found in previous data
43 variant(s) with mismatch information
522636 ambiguous variant(s) kept
3460831 variant(s) included

Initializing Genotype file: /fenix/users/laros/Elefanten_gene/LD-data/1kg_phase3.AllChr (bed)

Loading Genotype info from reference
==================================================

2504 people (0 male(s), 0 female(s)) observed
503 founder(s) included

10540328 variant(s) not found in previous data
149 variant(s) with mismatch information
469778 ambiguous variant(s) kept
3104546 variant(s) included

Phenotype file: /fenix/users/laros/ALF/Genetics/data/ALF_gene.pheno
Column Name of Sample ID: FID
Note: If the phenotype file does not contain a header, the column name will be displayed as the Sample ID which is expected.

There are a total of 1 phenotype to process

Start performing clumping

Clumping Progress: 100.00%
Number of variant(s) after clumping : 188356

Processing the 1 th phenotype

MDD is a binary phenotype
35 control(s)
57 case(s)

Processing the covariate file: /fenix/users/laros/ALF/Genetics/data/ALF_gene.PCs
==============================

Error: All samples removed due to missingness in covariate file!

choishingwan commented 1 year ago

What's the header of your pc file?

LarsOstman commented 1 year ago

Hi, Thank you for getting back to me!

The headers (and format) are as follows:

FID IID PC1 PC2 PC3 PC4 PC5 PC6
F1 F1 -0.0488942 -0.00648387 0.0119713 0.0394345 -0.0165522 0.0617235
F2 F2 -0.0499371 0.0127898 0.0426918 0.0412524 -0.0538963 0.0342523
F4 F4 0.0154813 0.0156588 0.0044783 -0.00596863 -0.023635 0.00985086
F5 F5 -0.0147007 0.00670695 0.0355421 0.00302993 -0.0671668 -0.00930397
F6 F6 -0.0259049 -0.0069673 -0.0347271 -0.0398622 0.015978 0.0781486
F8 F8 -0.0345881 0.0205085 -0.0136661 0.0191272 -0.0209368 0.0631035
F9 F9 -0.0259158 0.0119127 0.0224861 0.0451637 -0.0516346 0.0112552

The columns are tab-delimited in the file, but I've tried space-delimited as well and get the same error message.

Thanks again, Lars

LarsOstman commented 1 year ago

I thought I'd add that it is just the .eigenvec output file from the PC analysis, which I haven't changed in any way.

Lars

choishingwan commented 1 year ago

You used --ignore-fid, but your covariate file still has the FID column. With --ignore-fid, PRSice takes the first column as the sample ID by default (here, FID). In addition, because you did not specify which covariates to use, PRSice will use all non-ID fields, which in this case includes the IID column. An easy fix is to add: --cov-col @PC[1-6]

Sam
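
The diagnosis above can be illustrated with a small sketch. It assumes a tab-delimited covariate file with a header row, treats column 1 as the sample ID (as described for --ignore-fid), and lists every remaining column that contains a value that is not a plain number. The function name `non_numeric_covariates` and the temporary sample file are hypothetical, and the idea that PRSice counts such values as missing is an inference from the error message, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: show which columns a "first column is the sample ID" reader
# would pick up as covariates even though they are not numeric.
set -euo pipefail

# Hypothetical helper: print the header name of every column (from column 2 on)
# that holds at least one value that is not a plain decimal number.
non_numeric_covariates() {
    awk -F'\t' '
        NR == 1 { ncol = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
        {
            for (i = 2; i <= NF; i++)
                if ($i !~ /^-?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/) bad[i] = 1
        }
        END { for (i = 2; i <= ncol; i++) if (bad[i]) print name[i] }
    ' "$1"
}

# Two-row stand-in for ALF_gene.PCs, using values quoted in this thread.
cov_file="$(mktemp)"
printf 'FID\tIID\tPC1\tPC2\n' > "$cov_file"
printf 'F1\tF1\t-0.0488942\t-0.00648387\n' >> "$cov_file"
printf 'F2\tF2\t-0.0499371\t0.0127898\n' >> "$cov_file"

echo "Non-numeric covariate columns: $(non_numeric_covariates "$cov_file")"
rm -f "$cov_file"
```

For the sample rows this reports the IID column, which is the field that ends up among the covariates when no --cov-col is given. When adding `--cov-col @PC[1-6]` to the original command, quoting the argument (`'@PC[1-6]'`) is a cautious way to keep the shell from treating the brackets as a glob pattern.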

LarsOstman commented 1 year ago

Thank you so much, and I apologize for taking up your time with something that had such a simple answer. I'll fix it straight away. And just to check that I understand: would another solution be to remove the FID column from the covariate file, since that would make IID the first column, and thus the default ID column?

Thank you once again!

Lars

choishingwan commented 1 year ago

Yes
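
A minimal sketch of that alternative, assuming the covariate file is tab-delimited with FID as its first column. The function name `drop_first_column` and the file names are hypothetical, and the sample row is taken from the file shown earlier in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: write a copy of a tab-delimited covariate file without its
# first (FID) column, so that IID becomes the first column.
set -euo pipefail

# Hypothetical helper: cut splits on tabs by default; -f2- keeps column 2 onward.
drop_first_column() {
    cut -f2- "$1"
}

workdir="$(mktemp -d)"
printf 'FID\tIID\tPC1\tPC2\n' > "$workdir/with_fid.PCs"
printf 'F1\tF1\t-0.0488942\t-0.00648387\n' >> "$workdir/with_fid.PCs"

drop_first_column "$workdir/with_fid.PCs" > "$workdir/iid_only.PCs"
head -n 1 "$workdir/iid_only.PCs"   # header now starts with IID
rm -rf "$workdir"
```

Two caveats: a space-delimited file would need `cut -d' ' -f2-` instead, and without --cov-col every remaining column after IID is used as a covariate, so a file holding more PCs than intended should be trimmed or paired with --cov-col.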

On Fri, Aug 18, 2023, 10:06 AM LarsOstman @.***> wrote:

Thank you so much, and I apologize for taking your time with such a simple answer. I'll fix it straight away. And just to see if I understand, would another solution be to remove the FID-column from the covariates-file? Since they would make IID the first column, and thus the default one?

Thank you once again!

Lars

Den 18 aug. 2023 15:46 skrev Shing Wan Choi @.***>:

You used ignore fid, and you have the fid column in your covariate file. In addition, as you did not specify the covariates, PRSice will use all non-ID fields, in this case the IID (default is the first column is id). Easy fix will be --cov-col @PC[1-6]

Sam

On Fri, Aug 18, 2023, 9:32 AM LarsOstman @.***> wrote:

Thought I'd add that it is just the .eigenvec output-file from the PC-analysis, which I haven't done any changes to.

Lars

On 18 Aug 2023 at 14:04, Shing Wan Choi wrote:

What's the header of your pc file?

