SACGF / variantgrid

VariantGrid public repo

Variant Tag importer #316

Closed davmlaw closed 3 years ago

davmlaw commented 3 years ago

Do as part of http://github.com/SACGF/variantgrid_sapath/issues/81

davmlaw commented 3 years ago

To deal with tags that may not match an analysis, I had to allow tags that are not linked to any analysis.

I added the ability to add/remove tags from the variant details page. Outside of an analysis, each user can only add a given tag once (1 per tag/user).

davmlaw commented 3 years ago

Exported tags from VG2 May 27 at 17:55 - last tag was done at 2021-05-27 17:40

Used a script to convert names and keep only the tags after 2021-03-06, which left 5,777 tags.

Imported into VG3Upgrade - http://frgeneseq02.imvs.sa.gov.au:92/upload/view_upload_pipeline/4934

Note: Keeping tags in sync will be an ongoing process - handled in SA Path #81

Please raise any feature requests in #374 - Variant Tag imports v2

Testing:

Something to think about: you can export tags from one system and import them into another. Should we do this for our various servers? Exports from the older systems need to be run through a conversion program first. If a tag's user doesn't match a user on the target system, the tag is assigned to the person who uploaded the file.
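The user fallback can be sketched as below. This is a minimal illustration, not VariantGrid's importer code: the function name `resolve_tag_owner`, its parameters, and the usernames are all hypothetical.

```python
def resolve_tag_owner(exported_username, known_usernames, uploader):
    """Return the user an imported tag should be assigned to.

    Hypothetical helper: keep the exported username if it exists on the
    target system, otherwise fall back to whoever uploaded the file.
    """
    if exported_username in known_usernames:
        return exported_username
    return uploader


# Made-up usernames, for illustration only
known = {"alice", "bob"}
print(resolve_tag_owner("alice", known, uploader="bob"))  # alice
print(resolve_tag_owner("carol", known, uploader="bob"))  # bob
```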

sksmi commented 3 years ago

ISSUE

Tested:
PASS: 1 tag/user
PASS: user can only delete own tags
ISSUE1: VG3.0 import of 4 tags created in VG2.0 27/5-3/6. 1 tag wasn't imported, but it was created at 4:30pm 3/6, so I'm assuming this was after the cut-off? Tags for 2 other variants weren't imported. 1 variant (CA367310697) isn't present in VG3.0.

Would vote against moving tags between systems unless explicitly requested. Two reasons:


Tag Import testing details:

PASS (TBC): CA148939 8:100454804 A>G (GRCh37)
June 3, 2021, 4:30 p.m. SingleVariantARGene - missing tag in VG3 (expected?)
May 3, 2021, 2:01 p.m. SingleVariantARGene - present
April 12, 2021, 2:41 p.m. SingleVariantARGene - present
April 8, 2020, 9:50 p.m. PopFreqTooCommon - present

FAIL: CA367310697 7:39991112 C>T (GRCh37) - variant didn't exist in VG3.0
June 2, 2021, 12:02 p.m. Review - missing variant
June 1, 2021, 4:59 p.m. Inherited - missing variant

FAIL: CA224977 3:87302861 G>C (GRCh37)
June 1, 2021, 11:54 a.m. Pathogenic - missing

FAIL: CA7379109 14:105411302 G>C (GRCh37)
May 31, 2021, 12:09 p.m. LittleGeneInfo - missing

sksmi commented 3 years ago

@davmlaw any thoughts on a quick way to convert these guys (ClinVar download of Invitae LPP classifications) to something that would fit into the tag importer? I can spend a bunch more time working on the conversion, but am concerned about introducing bad coords into the db. I just want to add them with a tag "InvitaeLPP" to use as our gold classification set.

https://app.zenhub.com/files/299486514/1ebef114-4506-4556-92d6-3af109214caa/download

davmlaw commented 3 years ago

For the dates I was using ISO 8601 ie YYYY-MM-DD

So "2021-03-06 and 2021-05-27" is 6 Mar - 27 May, whereas you seem to have tested "27/5-3/6" (27 May - 3 Jun).
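To make the mismatch concrete, here is a small sketch that checks the tag timestamps from the testing details in this thread against the import window. It assumes both ends of the window are inclusive calendar days; how the conversion script actually treats the boundaries isn't stated here. `in_import_window` is a hypothetical helper.

```python
from datetime import date, datetime

# Import window quoted above, as ISO 8601 dates (YYYY-MM-DD).
# Assumption: both ends are treated as inclusive calendar days.
WINDOW_START = date.fromisoformat("2021-03-06")
WINDOW_END = date.fromisoformat("2021-05-27")


def in_import_window(created: datetime) -> bool:
    """True if the tag's creation date falls inside the window."""
    return WINDOW_START <= created.date() <= WINDOW_END


# Timestamps copied from the tag import testing details in this thread
tested_tags = {
    "CA148939 SingleVariantARGene (3 Jun)": datetime(2021, 6, 3, 16, 30),
    "CA148939 SingleVariantARGene (3 May)": datetime(2021, 5, 3, 14, 1),
    "CA148939 SingleVariantARGene (12 Apr)": datetime(2021, 4, 12, 14, 41),
    "CA367310697 Review": datetime(2021, 6, 2, 12, 2),
    "CA367310697 Inherited": datetime(2021, 6, 1, 16, 59),
    "CA224977 Pathogenic": datetime(2021, 6, 1, 11, 54),
    "CA7379109 LittleGeneInfo": datetime(2021, 5, 31, 12, 9),
}

for label, created in tested_tags.items():
    status = "inside window" if in_import_window(created) else "outside window"
    print(f"{label}: {status}")
```

Under that assumption, only the 12 April and 3 May tags fall inside the window, which lines up with the tags reported as present; everything dated 31 May or later falls outside it.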

davmlaw commented 3 years ago

For the ClinVar stuff, could you use:

All variants -> Built in filter (ClinVar) -> FilterNode with clinvar_clinical_sources contains "Invitae" AND min_pathogenicity greater than 4?

Or, you could take the ClinVar VCF and grep for "(^#|Invitae)" (to keep headers and lines matching Invitae) then upload that as a VCF
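A Python sketch of the same filter, in case grep isn't handy. It assumes the records of interest contain the literal text "Invitae" somewhere on the line; `filter_vcf_lines`, the sample lines, and the `SUBMITTER` INFO key are made up for illustration and are not real ClinVar content.

```python
import re

# Same pattern as the grep suggestion: header lines (starting with '#')
# or any line containing the literal text "Invitae".
KEEP = re.compile(r"^#|Invitae")


def filter_vcf_lines(lines):
    """Yield only header lines and lines that mention Invitae."""
    for line in lines:
        if KEEP.search(line):
            yield line


# Made-up lines for illustration; not real ClinVar records.
sample = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t.\t.\tSUBMITTER=Invitae\n",
    "1\t200\t.\tC\tT\t.\t.\tSUBMITTER=OtherLab\n",
]
kept = list(filter_vcf_lines(sample))
print(len(kept))  # 3
```

For a real file, pass an open text handle instead of the list (for example `gzip.open(path, "rt")` for a `.vcf.gz`) and write the yielded lines to a new file before uploading.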

What I should really do is maybe make the classifications node also work on ClinVar, and then allow you to specify labs.

sksmi commented 3 years ago

Re. ClinVar, thx - I should've thought of that a while back!

Re. tag import - to be fair I'm in lockdown. Dates are completely irrelevant. And in that case testing passes.

sksmi commented 3 years ago

Just tried the all variants - clinvar filter option -- remembered that I did do this before & we don't have enough variants - only gives me 218 which is too low for testing.

sksmi commented 3 years ago

Closing. Import managed elsewhere: SAP#99