ga4gh / vrs

Extensible specification for representing and uniquely identifying biological sequence variation
https://vrs.ga4gh.org
Apache License 2.0

Validation errors with ga4gh.vrs==2.0.0a2 #458

Closed: bwalsh closed this issue 8 months ago

bwalsh commented 8 months ago
Running the VCF annotator fails with a ValidationError on many records:

python3 -m ga4gh.vrs.extras.vcf_annotation --vcf_in REDACTED.vcf --vcf_out output.vcf.gz --vrs_pickle_out vrs_objects.pkl --seqrepo_root_dir ~/seqrepo/2021-01-29

ValidationError when translating 1-565508-G-G from gnomad: Expected reference sequence G on GRCh38:1 at positions (565507, 565508) but found N
VRS error on 1-565508
Traceback (most recent call last):
  File "/home/jupyter/.local/lib/python3.10/site-packages/ga4gh/vrs/extras/vcf_annotation.py", line 230, in annotate
    vrs_field_data = self._get_vrs_data(
  File "/home/jupyter/.local/lib/python3.10/site-packages/ga4gh/vrs/extras/vcf_annotation.py", line 361, in _get_vrs_data
    self._get_vrs_object(
  File "/home/jupyter/.local/lib/python3.10/site-packages/ga4gh/vrs/extras/vcf_annotation.py", line 283, in _get_vrs_object
    vrs_obj = self.tlr._from_gnomad(vcf_coords, assembly_name=assembly)
  File "/home/jupyter/.local/lib/python3.10/site-packages/ga4gh/vrs/extras/translator.py", line 344, in _from_gnomad
    raise ValidationError(err_msg)
ga4gh.vrs.extras.translator.ValidationError: Expected reference sequence G on GRCh38:1 at positions (565507, 565508) but found N
Expected reference sequence T on GRCh38:1 at positions (567091, 567092) but found N

... (a long log of similar errors follows) ...

bwalsh commented 8 months ago

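Solved by adding --assembly GRCh37 to the command above. The VCF appears to use GRCh37 coordinates, while the errors show the annotator checking each REF allele against GRCh38, where those positions contain N.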
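
For anyone hitting the same wall of "Expected reference sequence ... but found N" errors, here is a minimal sketch of how one might check which assembly a VCF's REF alleles agree with before annotating. It is not part of ga4gh.vrs: the `fetch` callables, `ref_match_rate`, `guess_assembly`, and the toy sequences are all hypothetical stand-ins for a real reference lookup (for example one backed by SeqRepo). The coordinate conversion mirrors the log above, where VCF position 565508 is reported as interbase positions (565507, 565508).

```python
from typing import Callable, Dict, Iterable, Tuple

# (chrom, 1-based position, REF allele), as in the first columns of a VCF record
Record = Tuple[str, int, str]
# Hypothetical lookup: (chrom, start, end) in 0-based interbase coordinates -> sequence
Fetch = Callable[[str, int, int], str]


def ref_match_rate(records: Iterable[Record], fetch: Fetch) -> float:
    """Return the fraction of records whose REF allele equals the reference sequence."""
    total = 0
    matches = 0
    for chrom, pos, ref in records:
        start = pos - 1            # VCF positions are 1-based; interbase start is pos - 1
        end = start + len(ref)     # interbase end is exclusive
        total += 1
        if fetch(chrom, start, end).upper() == ref.upper():
            matches += 1
    return matches / total if total else 0.0


def guess_assembly(records: Iterable[Record], fetchers: Dict[str, Fetch]) -> str:
    """Pick the assembly name whose reference agrees with the most REF alleles."""
    records = list(records)
    rates = {name: ref_match_rate(records, fetch) for name, fetch in fetchers.items()}
    return max(rates, key=rates.get)


def make_toy_fetch(sequences: Dict[str, str]) -> Fetch:
    """Build a fetch function over in-memory toy sequences (illustration only)."""
    def fetch(chrom: str, start: int, end: int) -> str:
        return sequences[chrom][start:end]
    return fetch


# Toy data: the "GRCh38" sequence has N where the "GRCh37" one has real bases,
# imitating the mismatch reported in this issue.
toy_fetchers = {
    "GRCh37": make_toy_fetch({"1": "ACGTG"}),
    "GRCh38": make_toy_fetch({"1": "ACNNN"}),
}
toy_records = [("1", 3, "G"), ("1", 4, "T"), ("1", 5, "G")]

best = guess_assembly(toy_records, toy_fetchers)
```

With real data, a match rate near zero for one assembly and near one for the other is a strong hint that the --assembly value should be changed.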