Error - PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:MATCH_COMBINE

gmmhe commented 1 year ago

Hi,

I'm trying to calculate my first polygenic risk scores with the PGS Catalog calculator (pgsc_calc), but I ran into an error, which I've copied below. I think the problem involves the preparation of my input genomes. I used plink2 v2.00a3.7 64-bit and set up the chromosomes using your example code from this documentation: https://pgsc-calc.readthedocs.io/en/dev/how-to/prepare.html

./plink2 --vcf chr21.merged.clean.noMono.vcf.gz \
    --allow-extra-chr \
    --chr 1-22, X, Y, XY \
    --make-pgen --out chr21_axy

I then run pgsc_calc with this command:

./nextflow run pgscatalog/pgsc_calc \
    -profile docker \
    --input samplesheet3.csv \
    --target_build GRCh38 \
    --pgs_id PGS000027

It seems that the problem is with the --chrom parameter, but I was following the documented steps. I have not used all the chromosomes yet: I tried first with 1 chromosome and later with 3, but I don't think this is the problem. So I cannot see where the issue is. Here is the error:

executor > local (7)
[70/1bcb7b] process > PGSCATALOG_PGSCALC:PGSCALC:DOWNLOAD_SCOREFILES ([pgs_id:PGS000027, pgp_id:, trait_efo:]) [100%] 1 of 1 ✔
[8a/80ab3a] process > PGSCATALOG_PGSCALC:PGSCALC:INPUT_CHECK:SAMPLESHEET_JSON (samplesheet3.csv) [100%] 1 of 1 ✔
[7f/1649ca] process > PGSCATALOG_PGSCALC:PGSCALC:INPUT_CHECK:COMBINE_SCOREFILES (1) [100%] 1 of 1 ✔
[- ] process > PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:PLINK2_RELABELBIM -
[skipped ] process > PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:PLINK2_RELABELPVAR (cineca chromosome 21) [100%] 3 of 3, stored: 3 ✔
[- ] process > PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:PLINK2_VCF -
[84/d0940d] process > PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:MATCH_VARIANTS (cineca chromosome 1) [100%] 3 of 3 ✔
[7d/c7d7fe] process > PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:MATCH_COMBINE (cineca) [100%] 1 of 1, failed: 1 ✘
[- ] process > PGSCATALOG_PGSCALC:PGSCALC:APPLY_SCORE:PLINK2_SCORE -
[- ] process > PGSCATALOG_PGSCALC:PGSCALC:APPLY_SCORE:SCORE_AGGREGATE -
[- ] process > PGSCATALOG_PGSCALC:PGSCALC:APPLY_SCORE:SCORE_REPORT -
[- ] process > PGSCATALOG_PGSCALC:PGSCALC:DUMPSOFTWAREVERSIONS -
Execution cancelled -- Finishing pending tasks before exit
Error executing process > 'PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:MATCH_COMBINE (cineca)'

Caused by: Process PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:MATCH_COMBINE (cineca) terminated with an error exit status (1)

Command executed:

export POLARS_MAX_THREADS=2

combine_matches --dataset cineca --scorefile scorefiles.txt.gz --matches *.ipc.zst -n 2 --min_overlap 0.75 --outdir $PWD --split -v

cat <<-END_VERSIONS > versions.yml
MATCH_COMBINE:
    pgscatalog_utils: $(echo $(python -c 'import pgscatalog_utils; print(pgscatalog_utils.__version__)'))
END_VERSIONS

Command exit status: 1

Command output: (empty)

Command error:
root: 2023-02-16 15:00:22 DEBUG Verbose logging enabled
pgscatalog_utils.config: 2023-02-16 15:00:22 DEBUG Using 2 threads to read CSVs
pgscatalog_utils.config: 2023-02-16 15:00:22 DEBUG polars threadpool size: 2
pgscatalog_utils.match.read: 2023-02-16 15:00:22 DEBUG Reading scorefile
pgscatalog_utils.match.read: 2023-02-16 15:00:24 DEBUG --chrom parameter not set, using all variants in scoring file
pgscatalog_utils.match.preprocess: 2023-02-16 15:00:24 DEBUG Complementing column effect_allele
pgscatalog_utils.match.preprocess: 2023-02-16 15:00:24 DEBUG Complementing column other_allele
pgscatalog_utils.match.combine_matches: 2023-02-16 15:00:24 DEBUG Reading matches
pgscatalog_utils.match.combine_matches: 2023-02-16 15:00:24 DEBUG Labelling match candidates
pgscatalog_utils.match.label: 2023-02-16 15:00:24 DEBUG Labelling best match type (refalt > altref > ...)
pgscatalog_utils.match.label: 2023-02-16 15:00:24 DEBUG Labelling duplicated best match: keeping first instance as best_match = True
pgscatalog_utils.match.label: 2023-02-16 15:00:24 DEBUG Labelling multiple scoring file lines (accession/row_nr) that best_match to the same variant
pgscatalog_utils.match.label: 2023-02-16 15:00:24 DEBUG Labelling all duplicates with exclude flag
pgscatalog_utils.match.label: 2023-02-16 15:00:24 DEBUG Labelling ambiguous variants
pgscatalog_utils.match.preprocess: 2023-02-16 15:00:24 DEBUG Complementing column REF
pgscatalog_utils.match.label: 2023-02-16 15:00:24 DEBUG Labelling ambiguous variants with exclude flag
pgscatalog_utils.match.label: 2023-02-16 15:00:24 DEBUG Labelling multiallelic matches with exclude flag
pgscatalog_utils.match.label: 2023-02-16 15:00:24 DEBUG Not excluding flipped matches
pgscatalog_utils.match.filter: 2023-02-16 15:00:25 DEBUG Filtering to best_match variants (with exclude flag = False)
pgscatalog_utils.match.filter: 2023-02-16 15:00:25 DEBUG Calculating overlap between target genome and scoring file
pgscatalog_utils.match.filter: 2023-02-16 15:00:28 ERROR Score PGS000027_hmPOS_GRCh38 fails minimum matching threshold (10.24% variants match)
pgscatalog_utils.match.match_variants: 2023-02-16 15:00:28 CRITICAL Error: no target variants match any variants in scoring files
Traceback (most recent call last):
  File "/venv/bin/combine_matches", line 8, in <module>
    sys.exit(combine_matches())
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/match/combine_matches.py", line 36, in combine_matches
    log_and_write(matches=matches, scorefile=scorefile, dataset=dataset, args=args)
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/match/match_variants.py", line 90, in log_and_write
    raise Exception("No valid matches found")
Exception: No valid matches found

Work dir: /Users/**/Documents/*****/work/7d/c7d7fe7745242f17be911429d993ab

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

ERROR: No scores calculated!

What can I do?

Thanks in advance!

nebfield commented 1 year ago

Hello,

The PGS scoring file for PGS000027 contains about 2.1 million variants across the entire genome. To calculate polygenic scores accurately, as described by the polygenic score authors, it's important that we only calculate scores using a similar number of variants.

By default, scores are not calculated unless at least 75% of the variants in the scoring file are present in the input target genomes. This threshold can be adjusted with --min_overlap, but lowering it is normally a bad idea.
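As a rough illustration of that threshold, here is a minimal Python sketch. It is not the pgscatalog_utils implementation: the function name `passes_min_overlap` is hypothetical, and the variant counts are only approximations derived from the figures in this thread (about 2.1 million variants, 10.24% matched).

```python
def passes_min_overlap(n_matched: int, n_scorefile: int, min_overlap: float = 0.75) -> bool:
    """Return True when the matched fraction of scoring-file variants reaches the threshold.

    Hypothetical helper for illustration; the real check lives inside pgscatalog_utils.
    """
    if n_scorefile <= 0:
        raise ValueError("scoring file has no variants")
    return n_matched / n_scorefile >= min_overlap


# Illustrative numbers only: roughly 2.1 million variants in the scoring file,
# of which 10.24% matched (the rate reported in the error log).
n_scorefile = 2_100_000
n_matched = round(n_scorefile * 0.1024)

print(passes_min_overlap(n_matched, n_scorefile))  # False: 10.24% is below 75%
```

With only a few chromosomes in the target data, the matched fraction cannot get near 75% for a genome-wide score, which is why the run stops at MATCH_COMBINE.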

There are a few technical reasons why a scoring file might match badly, like:

you set the wrong genome build
you forgot to impute your genomes

But a 10% match rate on 1 chromosome is quite good! I think if you try rerunning the workflow using all of your chromosomes the error should hopefully fix itself 😁 It's important to set up the split chromosomes in a single samplesheet (one row per chromosome).
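A samplesheet with one row per chromosome could be generated with a short Python sketch like the one below. The column names (`sampleset`, `path_prefix`, `chrom`, `format`) are an assumption and have changed between pgsc_calc releases, so check them against the samplesheet documentation for your version. The `chr{chrom}_axy` prefix pattern simply follows the `--out chr21_axy` naming used earlier in this thread, and `samplesheet_rows` is a hypothetical helper.

```python
import csv
import sys

# Assumed column layout; verify against the samplesheet documentation
# for the pgsc_calc version in use.
COLUMNS = ["sampleset", "path_prefix", "chrom", "format"]


def samplesheet_rows(sampleset, chromosomes, prefix_template="chr{chrom}_axy", file_format="pfile"):
    """Build one samplesheet row per chromosome (hypothetical helper)."""
    return [
        {
            "sampleset": sampleset,
            "path_prefix": prefix_template.format(chrom=chrom),
            "chrom": str(chrom),
            "format": file_format,
        }
        for chrom in chromosomes
    ]


# Autosomes 1-22 for the "cineca" sampleset named in the log.
rows = samplesheet_rows("cineca", range(1, 23))

# Write the CSV to standard output; redirect it to a file to use as --input.
writer = csv.DictWriter(sys.stdout, fieldnames=COLUMNS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
```

Every row shares the same sampleset name, so the per-chromosome matches are combined into one dataset before the overlap is calculated.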

Cheers, Ben

gmmhe commented 1 year ago

Thank you so much Ben, when I used all the chromosomes it worked!

PGScatalog / pgsc_calc

Error - PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:MATCH_COMBINE #86