clingen-data-model / clingen-hail-reports

Performs filtering against gnomAD and ClinVar datasets. Uses Hail to report records with a population FAF above certain thresholds by gene.
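As a rough illustration of the filtering described above, the sketch below groups variants by gene and keeps those whose FAF exceeds a gene-specific threshold. It is plain Python rather than Hail. The record fields (`gene`, `variant`, `faf`), the per-gene threshold mapping, and the function name are all hypothetical. This is not the repository's actual implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def report_by_gene(
    records: Iterable[dict],
    thresholds: Dict[str, float],
) -> Dict[str, List[str]]:
    """Group variant IDs by gene, keeping those whose FAF exceeds that gene's threshold.

    Hypothetical record shape: {"gene": str, "variant": str, "faf": Optional[float]}.
    Genes without a configured threshold are skipped (an assumption).
    """
    report: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        threshold = thresholds.get(rec["gene"])
        faf: Optional[float] = rec.get("faf")
        # Missing FAF (e.g. variant absent from gnomAD) or no threshold: nothing to report.
        if threshold is None or faf is None:
            continue
        if faf > threshold:
            report[rec["gene"]].append(rec["variant"])
    return dict(report)
```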

Add gnomad frequency & clinvar status to Invitae URM list #9

Closed: sharriso closed this issue 2 years ago

sharriso commented 2 years ago

Invitae provided a list of variants annotated with their internal population data. We would like this list to also be annotated with gnomAD allele frequency data and ClinVar status.

dazzariti commented 2 years ago

@larrybabb Just want to check on the expected timeline for this. I think the list is short enough that we could do it manually if needed.

dazzariti commented 2 years ago

Here is the initial request and spreadsheet:

Here is the variant list in both NC and NM expressions: https://docs.google.com/spreadsheets/d/1IigM8yza39AWlcPmBI-visLBqNkVLnVXMVwUwks1hDI/edit?usp=sharing Does that work for adding gnomAD and ClinVar annotations?

I'll forward the initial email chain with the request, which has additional information. Thanks!

larrybabb commented 2 years ago

@dazzariti @sharriso When you say ClinVar status, are you referring to the VCV top-level interpretation (aka aggregate clinical significance)? If not, please clarify which ClinVar value you'd like us to annotate the records with.

larrybabb commented 2 years ago

@dazzariti @theferrit32 and I are working on this and should be able to fulfill this by end of day.

larrybabb commented 2 years ago

From @theferrit32

Hi all, I have added a sheet to the input spreadsheet here: https://docs.google.com/spreadsheets/d/1IigM8yza39AWlcPmBI-visLBqNkVLnVXMVwUwks1hDI/edit#gid=350521121

Note one small issue: if the allele count for the filtered popmax in gnomAD is 0 (meaning it was 0 for all subpopulations), the popmax label shows up here as 'afr', when it should really be null, as shown in the gnomAD UI. There is a GitHub issue tracking that for this report: https://github.com/clingen-data-model/gnomad-frequency-report/issues/7
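A minimal plain-Python sketch of the intended behavior follows. It assumes per-subpopulation allele counts (AC) and allele numbers (AN) are available. A popmax label is reported only when at least one subpopulation actually observed the allele, so the all-zero case yields null instead of an arbitrary label such as 'afr'. The function name and input shape are hypothetical. For simplicity the sketch ranks populations by raw allele frequency. gnomAD's own popmax and FAF calculations are more involved; for example, they may exclude some populations.

```python
from typing import Dict, Optional, Tuple


def popmax(subpops: Dict[str, Tuple[int, int]]) -> Optional[Tuple[str, float]]:
    """Return (population label, allele frequency) for the highest-AF population.

    `subpops` maps a population label to (allele count, allele number); this
    shape is an assumption for illustration. Returns None when no population
    has a nonzero allele count, mirroring the null popmax in the gnomAD UI.
    """
    observed = {
        pop: ac / an
        for pop, (ac, an) in subpops.items()
        if ac > 0 and an > 0  # skip unobserved alleles and avoid division by zero
    }
    if not observed:
        # AC is 0 everywhere: no popmax, rather than defaulting to the first label.
        return None
    best = max(observed, key=observed.__getitem__)
    return best, observed[best]
```

Filtering out zero-count populations before taking the maximum is what keeps an arbitrary first key from leaking through when every frequency ties at 0.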

I've spot-checked a few different kinds of output records in this sheet, but let me know if you find any issues.

The data was annotated by first mapping each variant to ClinVar using its NC coordinates, then mapping to gnomAD using the ClinVar locus plus the ref/alt alleles.
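The two-step lookup described above can be sketched in plain Python. The sketch assumes the ClinVar and gnomAD data have already been loaded into dictionaries: one keyed by NC expression, the other keyed by a (chrom, pos, ref, alt) tuple. All names and the key shape are hypothetical. The actual annotation for this sheet may have been done differently; the repository itself uses Hail.

```python
from typing import Dict, List, Tuple

# Hypothetical key shape shared by the ClinVar result and the gnomAD lookup.
Locus = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def annotate(
    nc_variants: List[str],
    clinvar_by_nc: Dict[str, Tuple[str, Locus]],
    gnomad_af_by_locus: Dict[Locus, float],
) -> List[dict]:
    """Step 1: NC expression -> ClinVar record. Step 2: ClinVar locus + alleles -> gnomAD."""
    rows = []
    for nc in nc_variants:
        clinvar = clinvar_by_nc.get(nc)  # (aggregate significance, locus) or None
        if clinvar is None:
            # No ClinVar match means no locus to look up in gnomAD either.
            rows.append({"nc": nc, "clinvar": None, "gnomad_af": None})
            continue
        significance, locus = clinvar
        rows.append({
            "nc": nc,
            "clinvar": significance,
            # A variant absent from gnomAD keeps None rather than an AF of 0.
            "gnomad_af": gnomad_af_by_locus.get(locus),
        })
    return rows
```

One consequence of chaining the lookups this way is worth noting. In this sketch, a variant with no ClinVar match also gets no gnomAD annotation, even if gnomAD does contain it.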

Kyle