broadinstitute / seqr

web-based analysis tool for rare disease genomics
GNU Affero General Public License v3.0

Variant curation using ClinGen VCI #3182

Closed lynnpais closed 1 year ago

lynnpais commented 1 year ago

Is your feature request related to a problem? Please describe. As the analysts are ramping up curation of variants to submit to ClinVar, we would like to simplify the process of pulling up the variant in the ClinGen VCI - https://curation.clinicalgenome.org/select-variant. The VCI does require a user to sign in.

Describe the solution you'd like To begin variant curation, the VCI needs an ID to pull up the relevant variant and its metadata. The VCI recognizes several IDs, but the safest route would be to search the ClinGen Allele Registry with a valid HGVS term for that variant here - https://reg.genome.network/redmine/projects/registry/genboree_registry/landing

Open to ideas for how to do this. One option is to display a link to the ClinGen Allele Registry with the relevant HGVS details prefilled for a variant. The analyst can copy that ID and paste it into the VCI. Example: GRCh38 chr2:50718639G>T

Input page:

Screen Shot 2023-01-24 at 12 48 27 PM

Output page:

Screen Shot 2023-01-24 at 12 48 17 PM

I don’t know if there’s a way for you to reduce it from a two step (pull ID and paste in VCI) to a one step process. The ClinGen Allele Registry does not require sign in, but the VCI does.

Here’s an example of what it could look like using the existing Classify feature, which could show the options on hover.

Screen Shot 2023-01-24 at 12 38 06 PM

Describe alternatives you've considered Open to other ideas.

Additional context Also, it would be nice, but not required if after the variant was curated in the VCI, the status (path/ benign) would show for the variant in seqr. However, due to ClinGen user authentication requirements, I don’t think this can be done.

View of ClinGen VCI [https://curation.clinicalgenome.org/select-variant] where ID is entered:

Screen Shot 2023-01-24 at 12 44 17 PM

hanars commented 1 year ago

One option is to display a link to the ClinGen Allele Registry with the relevant HGVS details prefilled for a variant. The analyst can copy that ID and paste it into the VCI ... I don’t know if there’s a way for you to reduce it from a two step (pull ID and paste in VCI) to a one step process. The ClinGen Allele Registry does not require sign in, but the VCI does.

I think this should be doable. Let's say the plan is that when the user selects the "classify in VCI" link, behind the scenes seqr automatically does the allele registry step and then, ideally takes the user directly to the curation page for that exact variant. We can spend some time investigating whether that's possible and will update here as we figure out what our options are.

Also, it would be nice, but not required if after the variant was curated in the VCI, the status (path/ benign) would show for the variant in seqr

Agree that this would be nice, but it probably needs its own ticket so we can better investigate our options, and I don't want to block this ticket on figuring that out.

lynnpais commented 1 year ago

The plan sounds good!

I'll create a separate ticket for the other request.

ShifaSZ commented 1 year ago

I don’t know if there’s a way for you to reduce it from a two step (pull ID and paste in VCI) to a one step process.

We can go to the result page without opening the search page in the ClinGen Allele Registry by encoding the ID into the URL. Would going directly to the result page meet the requirement?

lynnpais commented 1 year ago

Yes, that should be okay if it works

larrybabb commented 1 year ago

I have sent a request to the VCI team to see if they can provide any info on how we can do this the most efficiently and with the least amount of technical debt or design fragility.

ShifaSZ commented 1 year ago

Let's say the plan is that when the user selects the "classify in VCI" link, behind the scenes seqr automatically does the allele registry step and then, ideally takes the user directly to the curation page for that exact variant.

We could do it like this: when seqr opens the Classify popup, it calls the Allele Registry API at "http://reg.genome.network/allele?hgvs={hgvs_expression}" (the format of the hgvs_expression is described in this link) and gets a JSON object. We can retrieve the Canonical Allele Identifier from the @id field in the object and display the identifier in the popup to allow the user to copy it.
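
A minimal sketch of that lookup, for illustration only. It assumes the @id field holds a URL whose last path segment is the identifier; the function names and the placeholder identifier in the usage note are hypothetical and not taken from seqr or the registry documentation.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as quoted in the comment above; everything else is illustrative.
ALLELE_REGISTRY_URL = "http://reg.genome.network/allele"


def build_allele_query_url(hgvs_expression: str) -> str:
    """Return the Allele Registry lookup URL for one HGVS expression."""
    # Percent-encode the expression so characters such as ':' and '>' are URL-safe.
    return f"{ALLELE_REGISTRY_URL}?hgvs={quote(hgvs_expression, safe='')}"


def extract_allele_id(allele_record: dict) -> str:
    """Pull the identifier out of the record's @id field.

    Assumption: @id is a URL and the identifier is its last path segment.
    """
    return allele_record["@id"].rstrip("/").rsplit("/", 1)[-1]


def fetch_allele_id(hgvs_expression: str, timeout: float = 10.0) -> str:
    """Query the registry and return the identifier (performs a network call)."""
    with urlopen(build_allele_query_url(hgvs_expression), timeout=timeout) as response:
        return extract_allele_id(json.load(response))
```

Keeping the URL building and the JSON parsing separate from the network call means both can be checked without contacting the registry, for example by passing a hand-written record such as {"@id": "http://reg.genome.network/allele/CA0000000"} (a placeholder, not a real allele) to extract_allele_id.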

When the user clicks on the "classify in VCI" link after copying the identifier, seqr will open the variant selection page of the VCI at https://curation.clinicalgenome.org/select-variant/. It does not seem possible to add the allele identifier to the URL path or parameters. After the user pastes the identifier, clicks the Retrieve button, and then clicks View Evidence to start curation (see the variant selection figure below), the VCI opens a variant central page at a URL like https://curation.clinicalgenome.org/variant-central/72db4e89-fc79-4d37-b791-308a0c8f8560 (see the variant central figure below). The 72db4e89-fc79-4d37-b791-308a0c8f8560 in the URL looks like an internal ID for the allele. If we can have the API to get this internal ID, we can bring the user to the curation page directly without opening the variant selection page.

The variant selection page: image

The variant central page: image

hanars commented 1 year ago

We can retrieve the Canonical Allele Identifier from the @id field in the object and display the identifier in the popup to allow the user to copy it.

If we have to take this approach, I would say we should do it in the background before we even show the link to the VCI. So we should briefly show a spinner, and then show a link to the VCI with the ClinGen Allele ID underneath it. So in Lynn's mockup, instead of showing "Get Allele Identifier" we would just show the identifier itself.

The 72db4e89-fc79-4d37-b791-308a0c8f8560 in the URL looks like an internal ID for the allele. If we can have the API to get this internal ID, we can bring the user to the curation page directly without opening the variant selection page.

This would obviously be the best approach. Can you look into the API and see if there is any chance that this is possible? If not, we can have @larrybabb bring it up in his meeting with the VCI team.

ShifaSZ commented 1 year ago

I have a question about the chromosome name or id for the ClinGen Allele Registry API. The API requires a RefSeq chromosome accession, for example, the RefSeq ID NC_000002.12 for GRCh38 chr2. I can find the mapping from the chromosomes to the RefSeq accessions on this page. Is there an existing mapping tool I should use, or should I use the information from the web page to create a mapping function?
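
If a hand-written mapping turned out to be necessary, a minimal sketch might look like the following. Only the chr2 accession is confirmed in this thread; the dictionary name and function are hypothetical, and the remaining chromosomes would need to be filled in from the accession page mentioned above. The sketch handles single-nucleotide substitutions only.

```python
# Only the chr2 entry is confirmed in this thread; the other chromosomes
# would be filled in from the RefSeq accession page mentioned above.
GRCH38_REFSEQ_ACCESSIONS = {
    "2": "NC_000002.12",
}


def to_genomic_hgvs(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Build a g. HGVS expression for a single-nucleotide substitution."""
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("this sketch only handles single-nucleotide substitutions")
    # Accept both "chr2" and "2" style chromosome names.
    chrom_key = chrom[3:] if chrom.lower().startswith("chr") else chrom
    try:
        accession = GRCH38_REFSEQ_ACCESSIONS[chrom_key]
    except KeyError:
        raise ValueError(f"no RefSeq accession configured for chromosome {chrom!r}") from None
    return f"{accession}:g.{pos}{ref}>{alt}"
```

For the example variant from the original request, to_genomic_hgvs("chr2", 50718639, "G", "T") yields NC_000002.12:g.50718639G>T.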

hanars commented 1 year ago

The API requires a RefSeq chromosome accession

From poking around, it seems the API supports lots of different ways of querying variants, including hgvsc, which is already available for the variant.

ShifaSZ commented 1 year ago

I tried using the hgvsc and found the results are the same as when using the RefSeq ID. Are the hgvsc fields only available for coding variants? I'm a little concerned because of the name.

hanars commented 1 year ago

hgvsc will be available on every SNP that has at least one gene associated with it. I don't think we need to worry about other cases. We can make the link conditional on the presence of hgvsc if we want.
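
A small sketch of making the link conditional, assuming the variant's transcript annotation is available as a dict with an "hgvsc" key. That shape and the function name are hypothetical and not taken from the seqr codebase.

```python
from typing import Optional


def hgvs_for_vci_link(main_transcript: Optional[dict]) -> Optional[str]:
    """Return the hgvsc string to query with, or None when the link should be hidden."""
    if not main_transcript:
        return None
    hgvsc = main_transcript.get("hgvsc")
    # Treat an empty string the same as a missing value.
    return hgvsc or None
```

The caller would render the VCI link only when this returns a value, so variants without a gene association simply show no link.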

hanars commented 1 year ago

The link is now live, but the ClinGen Allele ID fetching is broken: the request is being blocked. @ShifaSZ can you please look into this?

ShifaSZ commented 1 year ago

It is because the ClinGen Allele Registry API uses an http:// rather than an https:// URL. The production environment prohibits http:// requests. I'm checking if we can change it to https://.

ShifaSZ commented 1 year ago

The API also works with the https:// URL. So I will make a PR for the change.
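
For illustration, the scheme change can be expressed as a small helper. This is a sketch under the assumption that only the scheme differs between the two URLs; the function name is hypothetical, and the actual PR may simply have changed a URL constant.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url: str) -> str:
    """Return the same URL with an http:// scheme upgraded to https://."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```

URLs that already use https:// pass through unchanged, so the helper is safe to apply to every outgoing registry request.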

hanars commented 1 year ago

this is now fully live