PGScatalog / pygscatalog

Python applications and libraries for working with PGS data and the PGS Catalog
https://pygscatalog.readthedocs.io/en/latest/
Apache License 2.0

Exception: Bad effect weights #44

Open TravisMizeIGH opened 1 month ago

TravisMizeIGH commented 1 month ago

Description of the bug

Hello,

It appears that some of the newer PGS have incorrect headers, which causes a traceback (see below). I believe this is because some scores have "dosage_0_weight", "dosage_1_weight", and "dosage_2_weight" as column headers instead of "effect_weight". This occurs in the following PGS:

PGS004255_hmPOS_GRCh38.txt
PGS004256_hmPOS_GRCh38.txt
PGS004258_hmPOS_GRCh38.txt
PGS004259_hmPOS_GRCh38.txt
PGS004260_hmPOS_GRCh38.txt
PGS004261_hmPOS_GRCh38.txt
PGS004262_hmPOS_GRCh38.txt
PGS004263_hmPOS_GRCh38.txt
PGS004264_hmPOS_GRCh38.txt
PGS004272_hmPOS_GRCh38.txt
PGS004273_hmPOS_GRCh38.txt
PGS004280_hmPOS_GRCh38.txt
PGS004299_hmPOS_GRCh38.txt
PGS004301_hmPOS_GRCh38.txt
PGS004304_hmPOS_GRCh38.txt
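To check which files are affected before running the pipeline, the weight columns can be classified from each file's header row. The following is a minimal sketch, not part of pgscatalog_utils: the function names `read_column_names` and `classify_weight_columns` are hypothetical, the example rows are made up for illustration, and it assumes the scoring file is tab-delimited with leading comment lines that start with `#`.

```python
import re
from typing import Iterable

# Matches the dosage-specific weight columns named in this report,
# e.g. dosage_0_weight, dosage_1_weight, dosage_2_weight.
DOSAGE_PATTERN = re.compile(r"^dosage_\d+_weight$")


def read_column_names(lines: Iterable[str]) -> list[str]:
    """Return the column names from the first non-comment, non-blank line.

    Assumption: comment lines start with '#' and columns are tab-delimited.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        return line.rstrip("\r\n").split("\t")
    return []


def classify_weight_columns(columns: list[str]) -> str:
    """Classify a header as 'effect_weight', 'dosage_weights', or 'missing'."""
    if "effect_weight" in columns:
        return "effect_weight"
    if any(DOSAGE_PATTERN.match(name) for name in columns):
        return "dosage_weights"
    return "missing"


# Illustrative (made-up) file content, not copied from a real scoring file.
example_lines = [
    "#pgs_id=EXAMPLE\n",
    "rsID\teffect_allele\tdosage_0_weight\tdosage_1_weight\tdosage_2_weight\n",
    "rs0000001\tA\t0\t0.1\t0.2\n",
]
example_kind = classify_weight_columns(read_column_names(example_lines))
```

For a real file, the lines could come from `open(path, encoding="utf-8")`, or from `gzip.open(path, "rt")` if the file is compressed.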

Command error:
pgscatalog_utils.scorefile.qc: 2024-08-07 16:06:15 DEBUG Other allele column detected, including other_allele in variant identifier
pgscatalog_utils.scorefile.qc: 2024-08-07 16:06:15 DEBUG Only single other alleles detected.
pgscatalog_utils.scorefile.effect_weight: 2024-08-07 16:06:15 DEBUG Single effect weight column detected
pgscatalog_utils.scorefile.effect_weight: 2024-08-07 16:06:15 DEBUG Skipping melt
pgscatalog_utils.scorefile.effect_type: 2024-08-07 16:06:15 DEBUG No effect types set, using default (additive)
pgscatalog_utils.scorefile.write: 2024-08-07 16:06:16 DEBUG Output file exists: setting write mode to append
pgscatalog_utils.scorefile.write: 2024-08-07 16:06:16 DEBUG Writing out gzip-compressed combined scorefile
pgscatalog_utils.scorefile.read: 2024-08-07 16:06:23 DEBUG Reading scorefile PGS004227_hmPOS_GRCh38.txt
pgscatalog_utils.scorefile.harmonised: 2024-08-07 16:06:23 DEBUG Harmonised columns detected and used
pgscatalog_utils.scorefile.harmonised: 2024-08-07 16:06:23 DEBUG other_allele column contains information, dropping hm_inferOtherAllele
pgscatalog_utils.scorefile.qc: 2024-08-07 16:06:23 DEBUG Quality control: checking for bad variants
pgscatalog_utils.scorefile.qc: 2024-08-07 16:06:23 DEBUG Other allele column detected, including other_allele in variant identifier
pgscatalog_utils.scorefile.qc: 2024-08-07 16:06:23 DEBUG Only single other alleles detected.
pgscatalog_utils.scorefile.effect_weight: 2024-08-07 16:06:23 DEBUG Single effect weight column detected
pgscatalog_utils.scorefile.effect_weight: 2024-08-07 16:06:23 DEBUG Skipping melt
pgscatalog_utils.scorefile.effect_type: 2024-08-07 16:06:23 DEBUG No effect types set, using default (additive)
pgscatalog_utils.scorefile.write: 2024-08-07 16:06:23 DEBUG Output file exists: setting write mode to append
pgscatalog_utils.scorefile.write: 2024-08-07 16:06:23 DEBUG Writing out gzip-compressed combined scorefile
pgscatalog_utils.scorefile.read: 2024-08-07 16:06:23 DEBUG Reading scorefile PGS004202_hmPOS_GRCh38.txt
pgscatalog_utils.scorefile.harmonised: 2024-08-07 16:06:23 DEBUG Harmonised columns detected and used
pgscatalog_utils.scorefile.harmonised: 2024-08-07 16:06:23 DEBUG other_allele column contains information, dropping hm_inferOtherAllele
pgscatalog_utils.scorefile.qc: 2024-08-07 16:06:23 DEBUG Quality control: checking for bad variants
pgscatalog_utils.scorefile.qc: 2024-08-07 16:06:23 DEBUG Other allele column detected, including other_allele in variant identifier
pgscatalog_utils.scorefile.qc: 2024-08-07 16:06:23 DEBUG Only single other alleles detected.
pgscatalog_utils.scorefile.effect_weight: 2024-08-07 16:06:23 DEBUG Single effect weight column detected
pgscatalog_utils.scorefile.effect_weight: 2024-08-07 16:06:23 DEBUG Skipping melt
pgscatalog_utils.scorefile.effect_type: 2024-08-07 16:06:23 DEBUG No effect types set, using default (additive)
pgscatalog_utils.scorefile.write: 2024-08-07 16:06:23 DEBUG Output file exists: setting write mode to append
pgscatalog_utils.scorefile.write: 2024-08-07 16:06:23 DEBUG Writing out gzip-compressed combined scorefile
pgscatalog_utils.scorefile.read: 2024-08-07 16:06:23 DEBUG Reading scorefile PGS004256_hmPOS_GRCh38.txt
pgscatalog_utils.scorefile.harmonised: 2024-08-07 16:06:24 DEBUG Harmonised columns detected and used
pgscatalog_utils.scorefile.harmonised: 2024-08-07 16:06:24 DEBUG other_allele column contains information, dropping hm_inferOtherAllele
pgscatalog_utils.scorefile.qc: 2024-08-07 16:06:24 DEBUG Quality control: checking for bad variants
pgscatalog_utils.scorefile.qc: 2024-08-07 16:06:24 DEBUG Other allele column detected, including other_allele in variant identifier
pgscatalog_utils.scorefile.qc: 2024-08-07 16:06:24 DEBUG Only single other alleles detected.
pgscatalog_utils.scorefile.effect_weight: 2024-08-07 16:06:24 ERROR ERROR: Missing valid effect weight columns

Traceback (most recent call last):
  File "/venv/bin/combine_scorefiles", line 8, in <module>
    sys.exit(combine_scorefiles())
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/combine_scorefiles.py", line 84, in combine_scorefiles
    .pipe(melt_effect_weights)
  File "/venv/lib/python3.10/site-packages/pandas/core/generic.py", line 5839, in pipe
    return com.pipe(self, func, *args, **kwargs)
  File "/venv/lib/python3.10/site-packages/pandas/core/common.py", line 513, in pipe
    return func(obj, *args, **kwargs)
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/effect_weight.py", line 11, in melt_effect_weights
    elongate = _detect_multiple_weight_columns(df)
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/scorefile/effect_weight.py", line 43, in _detect_multiple_weight_columns
    raise Exception("Bad effect weights")
Exception: Bad effect weights

Command used and terminal output

No response

Relevant files

No response

System information

No response

smlmbrt commented 1 month ago

Thanks for the bug report. This shouldn't cause the pipeline to break, but it should warn users. I'm going to transfer this issue to pygscatalog (the utils that are breaking), as it's also somewhat redundant with PGScatalog/pgsc_calc#314.
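The warn-instead-of-break behaviour described here could look roughly like the sketch below. This is only an illustration under stated assumptions, not the pygscatalog implementation: `split_supported` and the logger name are hypothetical, and the input is assumed to be a mapping from file name to the column names already read from that file's header.

```python
import logging

# Hypothetical logger name, used for illustration only.
logger = logging.getLogger("scorefile_precheck")


def split_supported(headers: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Split scoring files into supported and skipped, warning about the latter.

    A file counts as supported here only if it has an 'effect_weight' column;
    anything else is skipped with a warning instead of raising an exception.
    """
    supported: list[str] = []
    skipped: list[str] = []
    for name, columns in headers.items():
        if "effect_weight" in columns:
            supported.append(name)
        else:
            logger.warning("Skipping %s: no effect_weight column found", name)
            skipped.append(name)
    return supported, skipped


# Illustrative headers; the column lists are made up, not read from real files.
example_headers = {
    "score_a.txt": ["rsID", "effect_allele", "effect_weight"],
    "score_b.txt": ["rsID", "effect_allele", "dosage_0_weight", "dosage_1_weight", "dosage_2_weight"],
}
kept, dropped = split_supported(example_headers)
```

Returning the skipped names as well as logging them lets the caller report every unsupported file at the end of a run, rather than stopping at the first one.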

TravisMizeIGH commented 3 weeks ago

PGS004239_hmPOS_GRCh38.txt is also in an incorrect format or is missing information, which causes pgscatalog to error out.