PGScatalog / pgsc_calc

The Polygenic Score Catalog Calculator is a nextflow pipeline for polygenic score calculation
https://pgsc-calc.readthedocs.io/en/latest/
Apache License 2.0

Some PGS Catalog scores not working with pgsc_calc #370

Open Sabramow opened 1 week ago

Sabramow commented 1 week ago

Description of the bug

I've run into issues calculating some scores from the PGS Catalog when I specify their IDs with the --pgs_id parameter. Specifically:

1) PGS004255, PGS004256, PGS004258, PGS004259, PGS004260-64, PGS004272, PGS004273, PGS004280, PGS004299, PGS004301, PGS004304, PGS00428 (all from the same publication) cause an error. They are all formatted with dosage weights (see the 'Relevant files' section for an example of the formatting).

2) Specifying PGS002759 ran without complication but output the score for PGS000767 (also for depression) rather than PGS002759.

Command used and terminal output

Workflow execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1)'

Caused by:
  Process `PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1)` terminated with an error exit status (1)

Command executed:

  pgscatalog-combine -s PGS004174_hmPOS_GRCh38.txt.gz PGS004175_hmPOS_GRCh38.txt.gz PGS004176_hmPOS_GRCh38.txt.gz PGS004177_hmPOS_GRCh38.txt.gz PGS004178_hmPOS_GRCh38.txt.gz PGS004179_hmPOS_GRCh38.txt.gz PGS004180_hmPOS_GRCh38.txt.gz PGS004181_hmPOS_GRCh38.txt.gz PGS004182_hmPOS_GRCh38.txt.gz PGS004183_hmPOS_GRCh38.txt.gz PGS004184_hmPOS_GRCh38.txt.gz PGS004185_hmPOS_GRCh38.txt.gz PGS004186_hmPOS_GRCh38.txt.gz PGS004187_hmPOS_GRCh38.txt.gz PGS004188_hmPOS_GRCh38.txt.gz PGS004189_hmPOS_GRCh38.txt.gz PGS004190_hmPOS_GRCh38.txt.gz PGS004191_hmPOS_GRCh38.txt.gz PGS004192_hmPOS_GRCh38.txt.gz PGS004193_hmPOS_GRCh38.txt.gz PGS004194_hmPOS_GRCh38.txt.gz PGS004195_hmPOS_GRCh38.txt.gz PGS004196_hmPOS_GRCh38.txt.gz PGS004197_hmPOS_GRCh38.txt.gz PGS004198_hmPOS_GRCh38.txt.gz PGS004199_hmPOS_GRCh38.txt.gz PGS004200_hmPOS_GRCh38.txt.gz PGS004201_hmPOS_GRCh38.txt.gz PGS004202_hmPOS_GRCh38.txt.gz PGS004203_hmPOS_GRCh38.txt.gz PGS004204_hmPOS_GRCh38.txt.gz PGS004205_hmPOS_GRCh38.txt.gz PGS004206_hmPOS_GRCh38.txt.gz PGS004207_hmPOS_GRCh38.txt.gz PGS004208_hmPOS_GRCh38.txt.gz PGS004209_hmPOS_GRCh38.txt.gz PGS004210_hmPOS_GRCh38.txt.gz PGS004211_hmPOS_GRCh38.txt.gz PGS004212_hmPOS_GRCh38.txt.gz PGS004213_hmPOS_GRCh38.txt.gz PGS004214_hmPOS_GRCh38.txt.gz PGS004215_hmPOS_GRCh38.txt.gz PGS004216_hmPOS_GRCh38.txt.gz PGS004217_hmPOS_GRCh38.txt.gz PGS004218_hmPOS_GRCh38.txt.gz PGS004219_hmPOS_GRCh38.txt.gz PGS004220_hmPOS_GRCh38.txt.gz PGS004221_hmPOS_GRCh38.txt.gz PGS004222_hmPOS_GRCh38.txt.gz PGS004223_hmPOS_GRCh38.txt.gz PGS004224_hmPOS_GRCh38.txt.gz PGS004225_hmPOS_GRCh38.txt.gz PGS004226_hmPOS_GRCh38.txt.gz PGS004227_hmPOS_GRCh38.txt.gz PGS004228_hmPOS_GRCh38.txt.gz PGS004229_hmPOS_GRCh38.txt.gz PGS004230_hmPOS_GRCh38.txt.gz PGS004231_hmPOS_GRCh38.txt.gz PGS004232_hmPOS_GRCh38.txt.gz PGS004233_hmPOS_GRCh38.txt.gz PGS004234_hmPOS_GRCh38.txt.gz PGS004235_hmPOS_GRCh38.txt.gz PGS004236_hmPOS_GRCh38.txt.gz PGS004237_hmPOS_GRCh38.txt.gz PGS004238_hmPOS_GRCh38.txt.gz 
PGS004239_hmPOS_GRCh38.txt.gz PGS004240_hmPOS_GRCh38.txt.gz PGS004241_hmPOS_GRCh38.txt.gz PGS004242_hmPOS_GRCh38.txt.gz PGS004243_hmPOS_GRCh38.txt.gz PGS004244_hmPOS_GRCh38.txt.gz PGS004245_hmPOS_GRCh38.txt.gz PGS004246_hmPOS_GRCh38.txt.gz PGS004247_hmPOS_GRCh38.txt.gz PGS004248_hmPOS_GRCh38.txt.gz PGS004249_hmPOS_GRCh38.txt.gz PGS004250_hmPOS_GRCh38.txt.gz PGS004251_hmPOS_GRCh38.txt.gz PGS004252_hmPOS_GRCh38.txt.gz PGS004253_hmPOS_GRCh38.txt.gz PGS004254_hmPOS_GRCh38.txt.gz PGS004256_hmPOS_GRCh38.txt.gz PGS004257_hmPOS_GRCh38.txt.gz PGS004258_hmPOS_GRCh38.txt.gz PGS004259_hmPOS_GRCh38.txt.gz PGS004260_hmPOS_GRCh38.txt.gz PGS004261_hmPOS_GRCh38.txt.gz PGS004262_hmPOS_GRCh38.txt.gz PGS004263_hmPOS_GRCh38.txt.gz PGS004264_hmPOS_GRCh38.txt.gz PGS004265_hmPOS_GRCh38.txt.gz PGS004266_hmPOS_GRCh38.txt.gz PGS004267_hmPOS_GRCh38.txt.gz PGS004268_hmPOS_GRCh38.txt.gz PGS004269_hmPOS_GRCh38.txt.gz PGS004270_hmPOS_GRCh38.txt.gz PGS004271_hmPOS_GRCh38.txt.gz PGS004272_hmPOS_GRCh38.txt.gz PGS004273_hmPOS_GRCh38.txt.gz             -t GRCh38             -o scorefiles.txt.gz             -l log_scorefiles.json             -v             -v

  cat <<-END_VERSIONS > versions.yml
  COMBINE_SCOREFILES:
      pgscatalog.core: $(echo $(python -c 'import pgscatalog.core; print(pgscatalog.core.__version__)'))
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  pgscatalog.core.cli.combine_cli: 2024-08-26 16:12:07 INFO     Processing PGS004245
  pgscatalog.core.cli.combine_cli: 2024-08-26 16:12:07 INFO     Processing PGS004246
  pgscatalog.core.cli.combine_cli: 2024-08-26 16:12:07 INFO     Processing PGS004247

   75%|███████▍  | 74/99 [01:43<00:31,  1.24s/it]pgscatalog.core.cli.combine_cli: 2024-08-26 16:12:07 INFO     Processing PGS004248
  pgscatalog.core.cli.combine_cli: 2024-08-26 16:12:07 INFO     Processing PGS004249
  pgscatalog.core.cli.combine_cli: 2024-08-26 16:12:07 INFO     Processing PGS004250
  pgscatalog.core.cli.combine_cli: 2024-08-26 16:12:07 INFO     Processing PGS004251
  pgscatalog.core.cli.combine_cli: 2024-08-26 16:12:07 INFO     Processing PGS004252
  pgscatalog.core.lib._normalise: 2024-08-26 16:12:07 WARNING  Multiple other_alleles detected in 43 variants
  pgscatalog.core.lib._normalise: 2024-08-26 16:12:07 WARNING  Other allele for these variants is set to missing

   80%|███████▉  | 79/99 [01:43<00:14,  1.37it/s]pgscatalog.core.cli.combine_cli: 2024-08-26 16:12:07 INFO     Processing PGS004253
  pgscatalog.core.cli.combine_cli: 2024-08-26 16:12:19 INFO     Processing PGS004254

   80%|███████▉  | 79/99 [02:00<00:14,  1.37it/s]
   82%|████████▏ | 81/99 [02:09<00:56,  3.12s/it]pgscatalog.core.cli.combine_cli: 2024-08-26 16:12:33 INFO     Processing PGS004256

   82%|████████▏ | 81/99 [02:09<00:28,  1.60s/it]
  Traceback (most recent call last):
    File "/app/pgscatalog.utils/.venv/bin/pgscatalog-combine", line 8, in <module>
      sys.exit(run())
               ^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/core/cli/combine_cli.py", line 65, in run
      normalised_score = list(
                         ^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/core/lib/scorefiles.py", line 485, in normalise
      yield from normalise(
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/core/lib/_normalise.py", line 71, in check_duplicates
      for variant in variants:
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/core/lib/_normalise.py", line 302, in detect_complex
      for variant in variants:
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/core/lib/_normalise.py", line 283, in check_effect_allele
      for variant in variants:
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/core/lib/_normalise.py", line 159, in assign_other_allele
      for variant in variants:
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/core/lib/_normalise.py", line 138, in check_effect_weight
      for variant in variants:
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/core/lib/_normalise.py", line 191, in assign_effect_type
      for variant in variants:
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/core/lib/_normalise.py", line 253, in check_bad_variant
      for variant in variants:
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/core/lib/_normalise.py", line 221, in remap_harmonised
      for variant in variants:
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/core/lib/scorefiles.py", line 326, in _generate_variants
      yield from read_rows_lazy(
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/core/lib/_read.py", line 35, in read_rows_lazy
      yield ScoreVariant(**variant, **{"accession": name, "row_nr": row_nr})
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  TypeError: ScoreVariant.__init__() missing 1 required keyword-only argument: 'effect_weight'

Relevant files

PGS CATALOG SCORING FILE - see https://www.pgscatalog.org/downloads/#dl_ftp_scoring for additional information

format_version=2.0

POLYGENIC SCORE (PGS) INFORMATION

pgs_id=PGS004280

pgs_name=GenoBoost_all-cause_dementia_0

trait_reported=All-cause dementia

trait_mapped=dementia

trait_efo=MONDO_0001627

genome_build=hg19

variants_number=30

weight_type=beta

SOURCE INFORMATION

pgp_id=PGP000546

citation=Ohta R et al. Nat Commun (2024). doi:10.1038/s41467-024-48654-x

chr_name  chr_position  effect_allele  other_allele  dosage_0_weight  dosage_1_weight        dosage_2_weight
19        45395619      G              A             -0.1239091       0.2275461              0.68995
19        45396219      T              C             -0.1833033       0.1472852              0.4591863
19        45403412      T              C             0.0663185        -0.0230981             -0.1023525
19        45389596      A              G             0.0206892        -0.3438102             -1.0314368
19        45414451      T              C             0.0626888        -0.0365833             -0.1220287
7         1569418       C              T             0.0527261        0.0033867999999999997  -0.07103519999999999
7         100013457     T              C             0.0262445        0.0068097999999999995  -0.0515715
9         6144065       A              G             -0.0226533       0.033497               0.0791566
9         10430602      C              T             -0.0162635       0.027885399999999998   0.1389225
16        12666279      G              A             -0.0205917       0.0189861              0.1378777
8         140267889     G              A             0.0391212        -0.022668800000000003  -0.026091
5         156686040     C              T             -0.0164153       0.0582531              0.017068200000000002

System information

nextflow version 23.10.0
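
The TypeError in the traceback above is consistent with the example scoring file: its column header has dosage_0_weight, dosage_1_weight and dosage_2_weight but no effect_weight column. As a rough pre-flight check, the sketch below reads the column header of a scoring file and reports which weight columns it carries. It is only an illustration and does not use the pgscatalog.core API. The helper names column_names and describe_weights are hypothetical, and it assumes that metadata lines start with "#" and that columns are whitespace-separated.

```python
def column_names(lines):
    """Return the column names from the first non-comment, non-blank line.

    Assumption: metadata lines start with '#' and column names contain no spaces.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        return stripped.split()
    return []


def describe_weights(columns):
    """Summarise which weight columns a scoring file header provides."""
    dosage = [c for c in columns if c.startswith("dosage_") and c.endswith("_weight")]
    return {"has_effect_weight": "effect_weight" in columns, "dosage_columns": dosage}


# Illustrative input modelled on the PGS004280 excerpt in this report.
example = [
    "#pgs_id=PGS004280\n",
    "chr_name\tchr_position\teffect_allele\tother_allele\t"
    "dosage_0_weight\tdosage_1_weight\tdosage_2_weight\n",
    "19\t45395619\tG\tA\t-0.1239091\t0.2275461\t0.68995\n",
]

report = describe_weights(column_names(example))
print(report)
```

A file for which has_effect_weight is False and dosage_columns is non-empty would be one to leave out of --pgs_id until dosage-specific weights are supported.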

smlmbrt commented 1 week ago

Hi @Sabramow,

We are aware of the problems in point 1 (redundant with #314 and https://github.com/PGScatalog/pygscatalog/issues/44) and we may close the issue because of this.

Could you elaborate on part 2? What exact command did you use?

S

Sabramow commented 5 days ago

Thanks for addressing point 1!

For point 2: I was able to replicate this issue, as the attached screenshot of the score report shows. Although I specified PGS002759 in the command, the scoring file that was pulled appears to be PGS000767.

[Screenshot 2024-09-10 at 11 29 23 AM]

smlmbrt commented 5 days ago

Hi @Sabramow, I can also replicate the problem. It's not a problem with the calculator, though (something must have gone wrong when we uploaded those specific harmonised files). We'll leave this open while we sort it out.

smlmbrt commented 5 days ago

@Sabramow, we've replaced that file on the FTP. Future runs of pgsc_calc for that score should use the correct score (provided you delete the work directory to make sure a cached copy isn't used). Thanks again for reporting the issue!
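
For anyone who wants to confirm that a downloaded or cached scoring file really is the score that was requested, a check along these lines may help. It is a minimal sketch, not part of pgsc_calc: the helper names header_pgs_id and matches_requested are hypothetical, and it assumes the file header contains a metadata line of the form "#pgs_id=..." ahead of the variant rows.

```python
import re


def header_pgs_id(lines):
    """Return the pgs_id declared in the header, or None if it is not found.

    Assumption: metadata lines start with '#'; scanning stops at the first
    non-blank line that does not, since that is where the variant table begins.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if not stripped.startswith("#"):
            break
        match = re.match(r"#+\s*pgs_id\s*=\s*(\S+)", stripped)
        if match:
            return match.group(1)
    return None


def matches_requested(lines, requested_id):
    """True when the header's pgs_id equals the ID that was asked for."""
    return header_pgs_id(lines) == requested_id


# Illustrative header modelled on the mismatch described in this thread.
mismatched = [
    "###PGS CATALOG SCORING FILE\n",
    "#pgs_id=PGS000767\n",
    "chr_name\tchr_position\teffect_allele\teffect_weight\n",
]

print(header_pgs_id(mismatched))
print(matches_requested(mismatched, "PGS002759"))
```

To run it against a real file, the lines could come from gzip.open(path, "rt") for a .txt.gz scoring file.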