jstone-uw closed 1 month ago
I've confirmed that the current validation code rejects HGVS strings containing *.
In the current database, there are 48,722 variants in 6 published score sets, plus 731 variants in 4 unpublished score sets, whose hgvs_pro string contains an asterisk.
select * from variants v, scoresets ss
where
(v.hgvs_pro like '%*]%' or v.hgvs_pro like '%*;%' or v.hgvs_pro like '%*' or v.hgvs_pro like '%.*%')
and v.scoreset_id=ss.id
and ss.published_date is not null;
There is also one variant (urn:mavedb:00000062-a-1#107) that uses the asterisk in a different way: p.Asn234Thrfs*5. This looks invalid to me, and maybe it's a typo.
select * from variants v, scoresets ss
where
v.hgvs_pro like '%*%'
and not (v.hgvs_pro like '%*]%' or v.hgvs_pro like '%*;%' or v.hgvs_pro like '%*' or v.hgvs_pro like '%.*%')
and v.scoreset_id=ss.id
and ss.published_date is not null;
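For checking individual strings outside the database, the two queries above can be mirrored in a few lines of Python. This is only a sketch: the function names and category labels are made up for illustration, and the substring tests are a direct translation of the four LIKE patterns, not of the actual validator.

```python
def matches_stop_shorthand(hgvs_pro: str) -> bool:
    """Mirror the four LIKE patterns used in the queries above."""
    return (
        "*]" in hgvs_pro            # like '%*]%'
        or "*;" in hgvs_pro         # like '%*;%'
        or hgvs_pro.endswith("*")   # like '%*'
        or ".*" in hgvs_pro         # like '%.*%'
    )


def classify(hgvs_pro: str) -> str:
    """Hypothetical helper: sort a string into one of three buckets."""
    if "*" not in hgvs_pro:
        return "no_asterisk"
    if matches_stop_shorthand(hgvs_pro):
        return "stop_shorthand"     # would be picked up by the first query
    return "other_asterisk"         # would be picked up by the second query
```

With this, p.Asn234Thrfs*5 falls into the "other_asterisk" bucket, which matches what the second query reports.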
I haven't spotted variants using single-character amino acid codes, but a full re-validation of existing variant strings might be worthwhile.
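As a rough first pass for that re-validation, a regular expression can flag strings that look like they use single-letter amino acid codes. This is a heuristic sketch, not the MaveHGVS grammar: the function name is hypothetical, and it only looks for an uppercase letter sitting directly next to a position number without the rest of a three-letter code.

```python
import re

# An uppercase letter directly before a position number and not preceded
# by another letter (the "G" in p.G12V), or an uppercase letter directly
# after a position number and not followed by a lowercase letter (the "V").
SINGLE_LETTER_RE = re.compile(
    r"(?<![A-Za-z])[A-Z](?=\d)|(?<=\d)[A-Z](?![a-z])"
)


def looks_single_letter(hgvs_pro: str) -> bool:
    """Hypothetical check: True if the string appears to use one-letter codes."""
    return SINGLE_LETTER_RE.search(hgvs_pro) is not None
```

Anything this flags would still need to go through the real validator; the pattern is only meant to narrow down which rows to look at first.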
The rest of the score sets correctly use Ter in hgvs_pro strings. Valid asterisks are present in hgvs_nt and hgvs_splice strings.
I propose we correct this manually by running
update variants v
set hgvs_pro=replace(hgvs_pro, '*', 'Ter')
where
v.hgvs_pro like '%*]%' or v.hgvs_pro like '%*;%' or v.hgvs_pro like '%*' or v.hgvs_pro like '%.*%';
This has been run on the staging server and affected 49,453 rows as expected.
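To sanity-check what the update does to individual strings, the same statement can be exercised against a throwaway in-memory table. The sketch below uses Python's built-in sqlite3 module purely for convenience; that is an assumption for illustration and not the engine the real database runs on. The table has only the one column needed, the table alias is dropped, and the sample strings are invented.

```python
import sqlite3

# Same SET and WHERE logic as the proposed update, without the table alias.
UPDATE_SQL = """
update variants
set hgvs_pro = replace(hgvs_pro, '*', 'Ter')
where
  hgvs_pro like '%*]%' or hgvs_pro like '%*;%'
  or hgvs_pro like '%*' or hgvs_pro like '%.*%'
"""


def dry_run(samples):
    """Run the update on sample strings; return (rows changed, final values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table variants (id integer primary key, hgvs_pro text)"
    )
    conn.executemany(
        "insert into variants (hgvs_pro) values (?)",
        [(s,) for s in samples],
    )
    changed = conn.execute(UPDATE_SQL).rowcount
    rows = [
        r[0] for r in conn.execute("select hgvs_pro from variants order by id")
    ]
    conn.close()
    return changed, rows


changed, rows = dry_run(
    ["p.Gly12*", "p.[Gly12*;Ala13Val]", "p.Gly12Ter", "p.Asn234Thrfs*5"]
)
```

Here `changed` is 2 and p.Asn234Thrfs*5 is left alone. One thing to keep in mind is that replace() rewrites every asterisk in a row that matches, so a multi-variant string containing both a stop shorthand and some other asterisk would have both replaced.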
We can then edit the odd variant (typo?) urn:mavedb:00000062-a-1#107 manually after determining what it should be.
Ter as a representation of premature stop codons but not *. The MaveHGVS standard does not support the short notation.
* in substitutions with Ter.
Ter. It's fine to continue supporting * in the visualization as well.