PGScatalog / pgscatalog_utils

(superseded by pygscatalog) Utilities for working with PGS Catalog API and scoring files
Apache License 2.0

Check valid matches in a consistent way #71

Closed: nebfield closed 2 months ago

nebfield commented 8 months ago

Sometimes no variants in any scoring file match the target variants. The two places that check for this handle it differently.

In match_variants:

if filter_summary.filter(pl.col("score_pass") == True).collect().is_empty():
    # this can happen when args.min_overlap = 0
    logger.critical("Error: no target variants match any variants in scoring files")
    raise Exception("No valid matches found")

In match_combine:

match n := max_occurrence[0]:
    case None:
        logger.critical("No variant matches found")
        logger.critical(
            "Did you set the correct genome build? Did you impute your genomes?")
        raise ValueError
    case _ if n > 1:
        logger.critical("Duplicate IDs in final matches")
        logger.critical(
            "Please double check your genomes for duplicates and try again")
        raise ValueError
    case _:
        logger.info("Scoring files are valid (no duplicate variants found)")

We should use the structural pattern matching approach consistently because it's more robust. That said, we're going to refactor matching into a library.

nebfield commented 2 months ago

Deprecated by https://github.com/PGScatalog/pygscatalog, which is more consistent about most things 😄