PGScatalog / pgsc_calc

The Polygenic Score Catalog Calculator is a nextflow pipeline for polygenic score calculation
https://pgsc-calc.readthedocs.io/en/latest/
Apache License 2.0

The variants are undermatched. #276

Closed Linlin1213a closed 1 month ago

Linlin1213a commented 3 months ago

I used the following command to calculate a PGS for my UK Biobank data using the PGS003765 scoring file. The run completed successfully, but only 12.9% of the variants matched. Is this normal? From what I have read, a low match rate can be caused by the wrong genome build or by a lack of imputation. On the first point, I checked the official UK Biobank website and confirmed that the data uses genome build GRCh37. On the second, I am not sure whether my UK Biobank data has been imputed. Could you help identify the cause of the low variant match rate? Thank you!

(base) ubuntu@VM-16-6-ubuntu:~$ nextflow run pgscatalog/pgsc_calc \
    -profile singularity \
    --input samplesheet.csv --target_build GRCh37 \
    --pgs_id PGS003765 \
    --run_ancestry /home/ubuntu/pgsc_HGDP+1kGP_v1.tar.zst
N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/pgscatalog/pgsc_calc` [cheeky_bell] DSL2 - revision: 8bdf287d55 [main]
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`

------------------------------------------------------
  pgscatalog/pgsc_calc v2.0.0-alpha.5-g8bdf287
------------------------------------------------------
Core Nextflow options
  revision          : main
  runName           : cheeky_bell
  containerEngine   : singularity
  launchDir         : /home/ubuntu
  workDir           : /home/ubuntu/work
  projectDir        : /home/ubuntu/.nextflow/assets/pgscatalog/pgsc_calc
  userName          : ubuntu
  profile           : singularity
  configFiles       : 

Input/output options
  input             : samplesheet.csv
  pgs_id            : PGS003765
  outdir            : results

Reference options
  run_ancestry      : /home/ubuntu/pgsc_HGDP+1kGP_v1.tar.zst
  ref_samplesheet   : /home/ubuntu/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/reference.csv
  ld_grch37         : /home/ubuntu/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/high-LD-regions-hg19-GRCh37.txt
  ld_grch38         : /home/ubuntu/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/high-LD-regions-hg38-GRCh38.txt
  ancestry_checksums: /home/ubuntu/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/checksums.txt

Compatibility options
  target_build      : GRCh37

Matching options
  min_overlap       : 0

executor >  local (58)
[9c/70786c] process > PGSCATALOG_PGSCCALC:PGSCCALC:DOWNLOAD_SCOREFILES ([pgs_id:PGS003765, pgp_id:, trait_efo:])              [100%] 1 of 1 ✔
[d9/08eea4] process > PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1)                                         [100%] 1 of 1 ✔
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELBIM                                          -
[skipped  ] process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELPVAR (ukb chromosome 9)                      [100%] 24 of 24, stored: 24 ✔
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF                                                 -
[skipped  ] process > PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:EXTRACT_DATABASE (1)                                      [100%] 1 of 1, stored: 1 ✔
[skipped  ] process > PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:INTERSECT_VARIANTS (ukb chromosome 1)                     [100%] 24 of 24, stored: 24 ✔
[skipped  ] process > PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:FILTER_VARIANTS (ukb GRCh37)                              [100%] 1 of 1, stored: 1 ✔
[skipped  ] process > PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:PLINK2_MAKEBED_REF (reference)                            [100%] 1 of 1, stored: 1 ✔
[skipped  ] process > PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:INTERSECT_THINNED (ukb)                                   [100%] 1 of 1, stored: 1 ✔
[skipped  ] process > PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:RELABEL_IDS (ukb null pvar)                               [100%] 1 of 1, stored: 1 ✔
[skipped  ] process > PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:PLINK2_MAKEBED_TARGET (ukb)                               [100%] 1 of 1, stored: 1 ✔
[skipped  ] process > PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:PLINK2_ORIENT (ukb)                                       [100%] 1 of 1, stored: 1 ✔
[skipped  ] process > PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:FRAPOSA_PCA (reference)                                   [100%] 1 of 1, stored: 1 ✔
[5b/4a8e3a] process > PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:FRAPOSA_PROJECT (ukb)                                     [100%] 10 of 10, stored: 6 ✔
[6f/a71228] process > PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_VARIANTS (ukb chromosome 8)                                    [100%] 24 of 24 ✔
[5d/219be1] process > PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_COMBINE (ukb)                                                  [100%] 1 of 1 ✔
[a3/15c1b7] process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:RELABEL_SCOREFILES (ukb additive scorefile)                    [100%] 1 of 1 ✔
[skipped  ] process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:RELABEL_AFREQ (ukb null afreq)                                 [100%] 1 of 1, stored: 1 ✔
[47/e17321] process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:PLINK2_SCORE (reference chromosome ALL effect type additive 0) [100%] 22 of 22 ✔
[4c/74ab17] process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:SCORE_AGGREGATE (ukb)                                          [100%] 1 of 1 ✔
[11/afd94a] process > PGSCATALOG_PGSCCALC:PGSCCALC:REPORT:ANCESTRY_ANALYSIS (1)                                               [100%] 1 of 1 ✔
[d8/1a6e55] process > PGSCATALOG_PGSCCALC:PGSCCALC:REPORT:SCORE_REPORT (ukb)                                                  [100%] 1 of 1 ✔
[a0/421899] process > PGSCATALOG_PGSCCALC:PGSCCALC:DUMPSOFTWAREVERSIONS (1)                                                   [100%] 1 of 1 ✔
-[pgscatalog/pgsc_calc] Pipeline completed successfully-
Completed at: 13-Apr-2024 23:49:20
Duration    : 1h 16m 4s
CPU hours   : 5.0
Succeeded   : 58
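
For context, here is a minimal sketch of how a match rate relates to a minimum-overlap threshold such as the `min_overlap : 0` shown in the log. It is not the pipeline's own implementation. The variant counts are hypothetical, chosen only to reproduce the 12.9% figure, and the 0.75 threshold is an assumed stricter setting to check against the pgsc_calc documentation.

```python
def match_rate(n_matched: int, n_scorefile: int) -> float:
    """Fraction of scoring-file variants that were found in the target data."""
    if n_scorefile <= 0:
        raise ValueError("the scoring file must contain at least one variant")
    return n_matched / n_scorefile


def passes_min_overlap(rate: float, min_overlap: float) -> bool:
    """True when the match rate meets or exceeds the required overlap."""
    return rate >= min_overlap


# Hypothetical counts that give the 12.9% match rate reported in this issue.
rate = match_rate(n_matched=129, n_scorefile=1000)

# With a threshold of 0 every run passes, however few variants match.
lenient = passes_min_overlap(rate, min_overlap=0.0)

# Assumed stricter threshold: a 12.9% match rate would not pass.
strict = passes_min_overlap(rate, min_overlap=0.75)

print(f"match rate: {rate:.1%}, passes at 0: {lenient}, passes at 0.75: {strict}")
```

Under these assumptions, a run can complete successfully with a threshold of 0 even when most scoring-file variants are missing from the target data, so a successful run does not by itself mean the score is based on most of its variants.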

smlmbrt commented 3 months ago

@Linlin1213a how many variants are in your dataset? If there are more than 10,000,000, it is likely imputed. See the UKB docs: https://biobank.ndph.ox.ac.uk/ukb/label.cgi?id=100319.
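
One way to answer the question above is to count the variant records in the per-chromosome variant files. The sketch below assumes uncompressed PLINK-style text files (`.pvar` or `.bim`) with one variant per line, where header lines start with `#`. The directory name in the usage comment is hypothetical, and compressed files (for example `.pvar.zst`) would need to be decompressed first.

```python
IMPUTED_HINT = 10_000_000  # rule-of-thumb threshold quoted in the reply above


def count_variants(path):
    """Count variant records in one uncompressed .pvar or .bim text file.

    Blank lines and header lines starting with '#' are skipped.
    """
    n = 0
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                n += 1
    return n


def total_variants(paths):
    """Sum variant counts across several files (e.g. one per chromosome)."""
    return sum(count_variants(p) for p in paths)


def likely_imputed(n_variants, threshold=IMPUTED_HINT):
    """Apply the rule of thumb: more than `threshold` variants suggests imputed data."""
    return n_variants > threshold


# Example usage (hypothetical directory layout):
#   from pathlib import Path
#   files = sorted(Path("genotypes").glob("*.pvar"))
#   n = total_variants(files)
#   print(n, "variants; likely imputed:", likely_imputed(n))
```

A count well below the threshold would suggest directly genotyped array data, which could explain a low match rate against a scoring file built from imputed variants.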