Open tskir opened 3 years ago
Notes from today's meeting:
First step is to investigate and list data sources, and discuss with OT before we start any implementation.
From reading the ClinVar publication, here are the resources that already feed into ClinVar:
Other sources of variants from the grant application:
HGMD is also a repository of variant-to-disease associations, but all of them are curated from papers. HGMD public has a license that does not permit for-profit use (is it compatible with the Open Targets collaboration with pharma?) and does not allow the data to be reproduced and shared (which might not work with the way Open Targets currently works). HGMD public has 210,341 mutations and the professional license (HGMD Professional 2021.2) has 323,661 mutations. It is not clear to me what the HGMD Professional 2021.2 license permits.
PanelApp: PanelApp contains lists of diseases and the associated genes that should be sequenced. In some cases the disease is directly associated with a variant (STR and CNV). The genes and variants present in PanelApp are unlikely to be novel (compared to ClinVar) since they are by nature well characterised. But their presence in the app could be used to annotate the confidence Open Targets can have in the gene/variant-to-disease association. PanelApp also provides an expert, traffic-light-based confidence rating for each association.
@tskir
Would you suggest that we look into HGMD, or will the license issue make it unusable?
For PanelApp, I think it is usable but will provide little new evidence.
The second part of this task was about looking into other sources that would provide informative annotation of the variants. There are two types of annotations:
Right now I'm working under the assumption that these would not provide new variant-disease associations but might provide additional context for existing ones.
Before we look into how this could be implemented, it would be useful to know whether these are valuable pieces of information to Open Targets.
It looks like PanelApp is already listed as a data source for the OT Platform. OT Genetics also might have some overlap regarding additional context sources, though I don't see anything about allele frequencies.
In the OT Platform we use PanelApp gene-disease associations but not the STRs (curated in house by Genomics England) or CNVs (sourced from ClinGen haploinsufficiency assessments). These could provide additional complex variation to EVA, but it is probably worth first comparing whether they are already covered by ClinVar.
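As a rough illustration of that comparison, here is a minimal sketch that splits PanelApp STR/CNV entries into those already present in ClinVar and those that are not. Everything in it is an assumption for illustration: the `ComplexVariant` record, the `split_by_coverage` helper, and the placeholder gene names are hypothetical, and matching on gene plus variant type is a deliberate simplification (a real comparison would likely need coordinates or repeat motifs).

```python
from typing import Iterable, List, NamedTuple, Tuple


class ComplexVariant(NamedTuple):
    """Hypothetical minimal record for an STR or CNV entry."""
    gene: str
    variant_type: str  # "STR" or "CNV"


def split_by_coverage(
    panelapp: Iterable[ComplexVariant],
    clinvar: Iterable[ComplexVariant],
) -> Tuple[List[ComplexVariant], List[ComplexVariant]]:
    """Return (already in ClinVar, not in ClinVar) for the PanelApp entries.

    Assumption: two entries match when gene and variant type are equal.
    """
    clinvar_keys = set(clinvar)
    unique_panelapp = set(panelapp)
    covered = sorted(v for v in unique_panelapp if v in clinvar_keys)
    novel = sorted(v for v in unique_panelapp if v not in clinvar_keys)
    return covered, novel


# Placeholder records, not real data.
panelapp_records = [
    ComplexVariant("GENE_A", "STR"),
    ComplexVariant("GENE_B", "CNV"),
    ComplexVariant("GENE_C", "STR"),
]
clinvar_records = [
    ComplexVariant("GENE_A", "STR"),
    ComplexVariant("GENE_D", "CNV"),
]

covered, novel = split_by_coverage(panelapp_records, clinvar_records)
print(f"already in ClinVar: {len(covered)}, not in ClinVar: {len(novel)}")
```

If the "not in ClinVar" list turns out to be small, that would support the earlier impression that PanelApp adds context rather than new variants.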
Other additional sources that may be worth investigating:
Proposal, Task 3