gks-anvil / vrs_anvil_toolkit

Extract clinical variant interpretations from VCF using GA4GH VRS IDs
MIT License

Validate the VRS IDs of VRS objects in tests #11

Closed quinnwai closed 8 months ago

quinnwai commented 8 months ago

We currently validate only the Allele object and not its VRS ID. Let's validate the ID as well, so that we are alerted to any changes in ID generation on the VRS side.

Relevant file: tests/unit/test_my_annotator.py

bwalsh commented 8 months ago

See the 'id' field removal in the test above

quinnwai commented 8 months ago

Will be addressed as part of #50 when reworking the pytests.

quinnwai commented 8 months ago

We will need to validate the IDs as a safeguard. We are currently tied to a given version of metakb/vrs-python, so the IDs should not change as a result of upstream updates, but some code changes might not reflect that. We therefore want to be warned if IDs change, since ID changes invalidate our cache and hence some of our metakb models.

quinnwai commented 8 months ago

This is what testing without validating IDs would look like:

def test_results(my_translator):
    """Ensure we can get the same results from gnomad as vrs-python, ie
    that the id and digest are computed recursively for the Allele object"""
    tlr = my_translator
    assert tlr is not None
    tlr.normalize = False

    identifiers = ["id", "digest"]

    inputs_dict = {
        "snv": (snv_inputs, snv_output),
        "deletion": (deletion_inputs, gnomad_deletion_output),
        "insertion": (insertion_inputs, gnomad_insertion_output),
        "duplication": (duplication_inputs, duplication_output),
    }

    # "inputs" rather than "input" to avoid shadowing the Python builtin
    for variant_type, (inputs, output) in inputs_dict.items():
        gnomad_expr = inputs["gnomad"]
        # allele object validation
        allele_dict = tlr.translate_from(fmt="gnomad", var=gnomad_expr).model_dump(
            exclude_none=True
        )

        # TODO: make decision on this
        for key in identifiers:
            assert (
                key in allele_dict
            ), f"{key} not found for {variant_type} expression {gnomad_expr}"
            del allele_dict[key]

        # location object validation
        assert "location" in allele_dict, "nested location dict not found"
        for key in identifiers:
            assert (
                key in allele_dict["location"]
            ), f"{key} not found for {variant_type} expression {gnomad_expr}"
            del allele_dict["location"][key]

        assert output == allele_dict, (
            f"{variant_type} does not match for {gnomad_expr}: "
            + f"\nexpected: {output}\n!==\nactual: {allele_dict}"
        )

quinnwai commented 8 months ago

Updated to test with validation on #50
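
For readers following the thread, here is a minimal sketch of what checking the identifiers could look like, as opposed to deleting them before comparison. It is not the change made in #50. The helper name `find_identifier_mismatches` and the placeholder ID strings are hypothetical, and the dicts stand in for the output of `model_dump(exclude_none=True)` so that the sketch runs without vrs-python installed.

```python
from copy import deepcopy

IDENTIFIERS = ("id", "digest")


def find_identifier_mismatches(actual, expected, identifiers=IDENTIFIERS):
    """Return messages for identifiers that are missing or differ.

    Hypothetical helper: checks the top-level Allele dict and its
    nested "location" dict, mirroring the two loops in the test above.
    """
    problems = []
    pairs = (
        ("allele", actual, expected),
        ("location", actual.get("location", {}), expected.get("location", {})),
    )
    for path, act, exp in pairs:
        for key in identifiers:
            if key not in act:
                problems.append(f"{path}.{key} missing from actual")
            elif act[key] != exp.get(key):
                problems.append(
                    f"{path}.{key}: expected {exp.get(key)!r}, got {act[key]!r}"
                )
    return problems


# Placeholder values for illustration only, NOT real VRS identifiers.
expected_allele = {
    "id": "ga4gh:VA.PLACEHOLDER",
    "digest": "PLACEHOLDER_VA",
    "type": "Allele",
    "location": {
        "id": "ga4gh:SL.PLACEHOLDER",
        "digest": "PLACEHOLDER_SL",
        "type": "SequenceLocation",
    },
}

# Identical IDs: nothing to report.
actual_allele = deepcopy(expected_allele)
assert find_identifier_mismatches(actual_allele, expected_allele) == []

# Simulate an upstream change in ID generation for the nested location.
changed = deepcopy(expected_allele)
changed["location"]["id"] = "ga4gh:SL.CHANGED"
print(find_identifier_mismatches(changed, expected_allele))
```

In a real test the expected dicts would keep their pinned `id` and `digest` values, so a plain `assert output == allele_dict` would also catch ID drift. A helper like this only makes the failure message point at the identifier that changed.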