gks-anvil / vrs_anvil_toolkit

Extract clinical variant interpretations from VCF using GA4GH VRS IDs
MIT License

bug: tests/data/test_vcf_input.vcf expected errors? #16

Closed bwalsh closed 7 months ago

bwalsh commented 8 months ago

As vrs-anvil users, we realize no VCF is perfect. However, as testers we should know for certain whether the errors below are the expected ones.

TODO: confirm whether these are expected errors.

[
('chr19-54220999-A-A', ValidationError('Expected reference sequence A on GRCh38:chr19 at positions (54220998, 54220999) but found T')), 
('chr19-54220999-A-T', ValidationError('Expected reference sequence A on GRCh38:chr19 at positions (54220998, 54220999) but found T')), 
('chr19-54221654-A-A', ValidationError('Expected reference sequence A on GRCh38:chr19 at positions (54221653, 54221654) but found T')), 
('chr19-54221654-A-T', ValidationError('Expected reference sequence A on GRCh38:chr19 at positions (54221653, 54221654) but found T')), 
('chr19-54221654-A-P', ValueError('Unable to parse data as gnomad variation'))
]

Source: tests/data/test_vcf_input.vcf, test: tests/unit/test_vcf_annotation.py::test_small_vcf_annotation
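One way to make "expected" concrete is to bucket the reported errors by cause before deciding which ones a test should assert on. The following is a minimal sketch, not part of the toolkit: `classify`, the bucket names, and the two regular expressions are hypothetical, and the exceptions from the list above are represented by their message strings only.

```python
import re
from collections import Counter

# (gnomAD-style ID, exception message) pairs copied from the list above.
MISMATCH = "Expected reference sequence A on GRCh38:chr19 at positions ({}, {}) but found T"
errors = [
    ("chr19-54220999-A-A", MISMATCH.format(54220998, 54220999)),
    ("chr19-54220999-A-T", MISMATCH.format(54220998, 54220999)),
    ("chr19-54221654-A-A", MISMATCH.format(54221653, 54221654)),
    ("chr19-54221654-A-T", MISMATCH.format(54221653, 54221654)),
    ("chr19-54221654-A-P", "Unable to parse data as gnomad variation"),
]

# Hypothetical patterns: one for the message text, one for plain nucleotide alleles.
REF_MISMATCH = re.compile(r"Expected reference sequence \S+ on \S+ at positions .* but found \S+")
VALID_ALLELE = re.compile(r"^[ACGTN]+$")


def classify(variant_id: str, message: str) -> str:
    """Assign one error to a coarse bucket (bucket names are illustrative)."""
    _chrom, _pos, ref, alt = variant_id.rsplit("-", 3)
    if not VALID_ALLELE.match(ref) or not VALID_ALLELE.match(alt):
        return "malformed_allele"  # e.g. ALT "P" is not a nucleotide string
    if REF_MISMATCH.search(message):
        return "ref_mismatch"      # VCF REF disagrees with the reference genome
    return "other"


summary = Counter(classify(vid, msg) for vid, msg in errors)
print(summary)
```

With the five entries above, the four "Expected reference sequence" errors fall into `ref_mismatch` and the `A-P` entry into `malformed_allele`. Whether each bucket counts as "expected" for the test fixture is still the open question for this issue.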

bwalsh commented 8 months ago

In addition: on the large file (~330K lines) we had about 3000 "cannot parse" errors for INS and DEL alts.
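To see how many such records a VCF contains before annotation, the ALT column can be tallied by kind. This is a rough sketch under one loud assumption: that the INS and DEL alts mentioned above are symbolic alleles written as `<INS>` and `<DEL>`. The helper names `alt_kind` and `tally_alts` and the sample records are hypothetical, not taken from the large file.

```python
from collections import Counter


def alt_kind(alt: str) -> str:
    """Roughly categorize a single VCF ALT value."""
    if alt.startswith("<") and alt.endswith(">"):
        return "symbolic"    # assumed form of the INS/DEL alts, e.g. <INS>, <DEL>
    if alt and set(alt.upper()) <= set("ACGTN"):
        return "sequence"    # literal nucleotide string
    return "other"           # missing (.), spanning (*), breakends, typos


def tally_alts(vcf_lines):
    """Count ALT kinds across the data lines of a VCF, skipping headers."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        for alt in fields[4].split(","):  # ALT is the fifth column
            counts[alt_kind(alt)] += 1
    return counts


# Made-up records for illustration only.
sample = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr19\t54220999\t.\tA\tT\t.\t.\t.",
    "chr19\t54221654\t.\tA\t<INS>\t.\t.\t.",
    "chr19\t54221700\t.\tA\t<DEL>,G\t.\t.\t.",
]
print(tally_alts(sample))
```

If the `symbolic` count on the large file is close to the ~3000 parse errors, that would suggest the errors come from the input's allele representation and not from individually broken records.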

bwalsh commented 8 months ago

Are these valid, to-be-expected (i.e. "normal") errors?

bwalsh commented 8 months ago

Ask our group (K, BK, J, N) to see what insights we can gather.