PGScatalog / pgscatalog_utils

(superseded by pygscatalog) Utilities for working with PGS Catalog API and scoring files
Apache License 2.0

combine_matches fails if no `matched` matches are found #52

Closed openpaul closed 8 months ago

openpaul commented 1 year ago

While learning how to use https://github.com/PGScatalog/pgsc_calc/ I came across the error:

AssertionError: Duplicate IDs in final matches

Thrown by https://github.com/PGScatalog/pgscatalog_utils/blob/main/pgscatalog_utils/match/combine_matches.py#L52

In my case I debugged the input file and found that

max_occurrence == [None]

Clearly the issue is not duplicated IDs but rather no IDs at all. I added a small debug statement and got:

set(matches.collect().get_column("match_status").to_list())

# {'excluded', 'not_best'}

I am not sure what check to add or how such a situation should be handled, but currently pgsc_calc just crashes quite harshly.
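One possible shape for such a check is sketched below. It is a minimal illustration in plain Python, not the polars code used in pgscatalog_utils. The function name `check_final_matches` and the list-of-dicts input are hypothetical stand-ins for the collected matches table. The idea is to test for "nothing matched" before testing for duplicates, so each situation gets its own error message.

```python
from collections import Counter


def check_final_matches(rows):
    """Validate match rows, separating 'nothing matched' from 'duplicate IDs'.

    `rows` is a hypothetical list of dicts with 'ID' and 'match_status' keys,
    standing in for the collected matches table.
    """
    matched_ids = [row["ID"] for row in rows if row["match_status"] == "matched"]

    # Case 1: no row has status 'matched'. An aggregate (such as a maximum)
    # over this empty selection has no value, which is one way a None can
    # end up in a later comparison.
    if not matched_ids:
        statuses = sorted({row["match_status"] for row in rows})
        raise ValueError(
            f"No variants with match_status 'matched' (statuses seen: {statuses})"
        )

    # Case 2: at least one ID occurs more than once among the matched rows.
    duplicates = sorted(id_ for id_, n in Counter(matched_ids).items() if n > 1)
    if duplicates:
        raise ValueError(f"Duplicate IDs in final matches: {duplicates}")

    return len(matched_ids)


# Example input mirroring the statuses seen in the debug output above
# (the variant IDs are made up for illustration).
example_rows = [
    {"ID": "1:100:A:G", "match_status": "excluded"},
    {"ID": "1:200:C:T", "match_status": "not_best"},
]

try:
    check_final_matches(example_rows)
except ValueError as err:
    print(err)
```

With the example rows, the error reports that nothing matched and lists the statuses that were seen, rather than claiming duplicate IDs.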

This is related to https://github.com/PGScatalog/pgsc_calc/issues/72 and https://github.com/PGScatalog/pgscatalog_utils/issues/36.

I hope this helps make the pipeline more robust.

smlmbrt commented 1 year ago

Thanks @openpaul, making this check (and all the errors/assertions) more informative & robust is very much on our radar! Quick question: were you running the matching software with a single PGS consisting of a single variant?

openpaul commented 1 year ago

Cheers, it's already an amazing resource you are building here. I am just learning how to work with SNP data, so it might have been my input data.

I was not using it with a single PGS with a single variant, though. I can't find the logs for it anymore, but I will update you with the command I ran if I see the bug again.

I think it was triggered because my VCF file was too short, with too few variants, but I am not sure.

nebfield commented 8 months ago

https://github.com/PGScatalog/pgscatalog_utils/releases/tag/v0.4.3