SACGF / variantgrid


Optimise HGVS in Annotation VCF upload processing #1128

Open davmlaw opened 3 months ago

davmlaw commented 3 months ago

Found while investigating SACGF/variantgrid_com#82

The SV pipeline generates its own HGVS, as VEP doesn't support symbolic variants.

However, it seems to run really slowly. It errors with:

HGVSConverterType.CLINGEN_ALLELE_REGISTRY: ClinGeneAllele API Error: VariationTooLong (Variation given on the input is too long. There is 10000 bp limit for variation's length.) for input 'NC_000017.11:g.43078305_43084385dup'

I thought the limit was 10k, and 43084385 - 43078305 = 6080, so that should be OK.

But even if you just enter it in the search you get VariationTooLong

davmlaw commented 3 months ago

Paper says

The maximal nucleotide (transcript or genomic) allele size is 10,000 bp

43084385 - 43078305 = 6080
43083385 - 43078305 = 5080 - fails
43083300 - 43078305 = 4995 - this works
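The arithmetic above can be reproduced with a short sketch. The `OBSERVATIONS` list and the `span` helper are illustrative names, not VariantGrid code. The outcomes are the ones reported in this thread. HGVS ranges are inclusive, so the number of duplicated bases is `end - start + 1`; the sketch prints both figures, and either way the cut-off sits near 5,000 rather than 10,000.

```python
# Coordinates tried in this thread, with the outcome ClinGen returned.
START = 43078305
OBSERVATIONS = [
    (43084385, "VariationTooLong"),
    (43083385, "VariationTooLong"),
    (43083300, "accepted"),
]


def span(start: int, end: int, inclusive: bool = False) -> int:
    """Return end - start, or the inclusive base count when requested."""
    return end - start + (1 if inclusive else 0)


for end, outcome in OBSERVATIONS:
    print(
        f"g.{START}_{end}dup: "
        f"end - start = {span(START, end)}, "
        f"inclusive length = {span(START, end, inclusive=True)}, "
        f"{outcome}"
    )
```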

davmlaw commented 3 months ago

I think this is a bug with the ClinGen Allele Registry. I have raised an issue with them and will use a workaround until they fix it.

davmlaw commented 3 months ago

The end result is that we fail early for ClinGen requests for dups over 5k, saving an API call. This is not really user testable.
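A minimal sketch of what such a fail-early check could look like, assuming the HGVS is a simple genomic `g.start_enddup` string. The names `CLINGEN_MAX_DUP_LENGTH`, `ClinGenDupTooLong`, `dup_length` and `check_before_clingen_request` are hypothetical and are not taken from the VariantGrid code base. The 5000 threshold is the empirical value from this thread, not a documented ClinGen limit.

```python
import re
from typing import Optional

# Assumption: empirical cut-off from this thread (4995 worked, 5080 failed).
CLINGEN_MAX_DUP_LENGTH = 5000

# Matches only simple genomic range dups, e.g. 'NC_000017.11:g.100_200dup'.
_G_DUP_PATTERN = re.compile(r"^(?P<contig>[^:]+):g\.(?P<start>\d+)_(?P<end>\d+)dup$")


class ClinGenDupTooLong(ValueError):
    """Raised locally instead of sending a request ClinGen would reject."""


def dup_length(g_hgvs: str) -> Optional[int]:
    """Return end - start for a g. range dup, or None for anything else.

    Uses the same arithmetic as the thread (not the inclusive end - start + 1).
    """
    match = _G_DUP_PATTERN.match(g_hgvs)
    if match is None:
        return None
    return int(match.group("end")) - int(match.group("start"))


def check_before_clingen_request(g_hgvs: str) -> None:
    """Raise ClinGenDupTooLong for long dups so no API call is made."""
    length = dup_length(g_hgvs)
    if length is not None and length > CLINGEN_MAX_DUP_LENGTH:
        raise ClinGenDupTooLong(
            f"{g_hgvs}: dup length {length} exceeds {CLINGEN_MAX_DUP_LENGTH}, "
            "skipping ClinGen Allele Registry request"
        )
```

A caller would run `check_before_clingen_request(hgvs_string)` just before the ClinGen request and treat `ClinGenDupTooLong` like any other ClinGen failure for that variant, so the remaining variants carry on.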

davmlaw commented 2 months ago

Not really user testable. SV pipelines have continued to run and make fewer API calls.

EmmaTudini commented 2 months ago

Re-opening for Shariant testing.

Increases stability of liftover by not trying to contact ClinGen for dups of 5000+ base pairs. These will fail ClinGen liftover, but will no longer block other variants in the queue.