NCATS-Tangerine / ncats-ingest

Management of ingestion of sources for NCATS-translator

List of databases potentially relevant to the competency questions for fanconi #15

Open jmcmurry opened 7 years ago

jmcmurry commented 7 years ago

CQ matrix is here

List of databases is here; please prioritize and add.

dnahotline commented 7 years ago

@kshefchek I think you are assembling datasets (right?). Here is a link for some that might be relevant to cancers that develop after people are treated for an initial cancer. Some of the competency questions will need these types of datasets. https://dceg.cancer.gov/research/what-we-study/second-cancers#Treatment

dnahotline commented 7 years ago

@mellybelly check this out. I would like to contact her and see if we can explore collaboration. https://dceg.cancer.gov/about/staff-directory/biographies/K-N/morton-lindsay

dnahotline commented 7 years ago

@jmcmurry @mellybelly @pnrobinson @kshefchek The new gnomAD dataset was posted today. Peter and I will be working on a letter to the PI for collaboration. https://macarthurlab.org/2017/02/27/the-genome-aggregation-database-gnomad/

dnahotline commented 7 years ago

Datasets useful for basic science CQs? [@mbrush, here are some things I'm working on, FYI]

Reactome http://www.reactome.org
dbGaP (germline cancer) https://www.ncbi.nlm.nih.gov/gap

FA

Aldehyde

BMF Registries to check

General normal, better than normal, or unknown set

Cancer

dnahotline commented 7 years ago

Here I will be listing interesting datasets, and ideas for them, from the Wellcome Genomics of Rare Disease Conference. Datasets from certain countries may prioritize greater good over privacy. @jmcmurry @mellybelly @mbrush @kshefchek @pnrobinson

Datasets to Consider

Danish Newborn Screening Biobank, blood spots. Every Danish baby since 1982 (Benjamin Neale, Broad, contact) https://www.ncbi.nlm.nih.gov/pubmed/17632694
MIGEN
METSIM
FINRISK
T2D-GENES/GoT2D/SIGMA
IBD consortium (Sek K's dataset which I didn't catch)
GTEx, Genotype-Tissue Expression project (contact Beryl Cummings, Broad. Note that Cummings et al. has an isoform-level correction for polyA tail bias).
Transcriptome DDD dataset (developmental disorders), 8000 patients (contact Matthew Hurles, Wellcome Trust, Sanger).
ENIGMA consortium, 13,171 people (population and case-control neuroimaging genetics data). Also CHARGE consortium (12,000 people). Some GWAS? Can we use these, or do we need WGS or exome only?
Nijmegen (contact Hans Brunner).
These are already on our list, right?: Reactome, ClinVar, ClinGen, 1000 Genomes, dbSNP, HGMD Public, LOVD, UniProt, Database of Genomic Variants (DGV), DECIPHER, OMIM, EVS and ExAC.
PanelApp and the 100,000 Genomes Project https://panelapp.extge.co.uk
https://broadinstitute.org.cmap Paul's work on Grey Team. Cancer cells, but use them as little bags of cellular processes. Steps: create a synthetic path for FA. 1. Map the FA gene network. 2. See which genes in the network are upregulated by drugs. 3. See which are downregulated. 4. Ensure this works in a normal cell. Could lead to repurposing drugs.
Hi-C/5C
COSMIC
ESP
DIDA http://dida.ibsquare.be (poster 44)
Daniel Greene, MRC Cambridge. Dataset of 5815 pts, WGS, diverse rare diseases. Statistical tool not published yet.
DDD8K, 7833 trios in Deciphering Developmental Disorders, includes about 100 South Asians. Hilary Martin at Wellcome Trust Sanger is a contact. https://www.ddduk.org
https://www.humancellatlas.org This might have info on gene expression in cells of interest, in particular hematopoietic cells. Also collecting cells from pregnancy terminations to have an atlas during development. http://www.hdbr.org
New NIH clinical trial on the effects of alcohol. Contact them and ask about the data they plan to collect. Could this be useful for us, relevant to FA, alcohol use, and outcomes? https://www.nytimes.com/2017/07/03/well/eat/alcohol-national-institutes-of-health-clinical-trial.html?mcubz=0&_r=0
Shannon Mc said she had an HNSCC dataset. However, she cautioned: "if emphasis is LOH, I don't think our data would be helpful as that was not our emphasis". We need to discuss options, perhaps with @pnrobinson.
New release of UK Biobank http://www.ukbiobank.ac.uk/about-biobank-uk/
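
A rough sketch of steps 1 to 3 of the CMap idea above (map the FA gene network, then see which network genes a drug pushes up or down). Everything here is an assumption for illustration: the gene set is a small hand-picked subset of FA genes, the drug names and fold-change values are placeholders rather than real CMap data, and `split_by_direction`, `summarize`, and the threshold of 1.0 are hypothetical. Step 4 (checking that this holds in a normal cell) is not covered.

```python
# Hypothetical toy inputs. The gene symbols are real, but the drug names and
# log2 fold-change values are placeholders, NOT measurements from CMap.
FA_NETWORK = {"FANCA", "FANCC", "FANCD2", "BRCA2"}

# Each signature maps gene -> log2 fold change after treatment (made-up values).
SIGNATURES = {
    "drug_A": {"FANCA": 1.4, "FANCD2": 0.2, "TP53": -2.0},
    "drug_B": {"FANCC": -1.8, "BRCA2": -1.1, "MYC": 1.5},
}


def split_by_direction(signature, network, threshold=1.0):
    """Return (upregulated, downregulated) network genes for one drug signature.

    A gene counts as up/down only if it is in the network and its absolute
    log2 fold change reaches the (assumed) threshold.
    """
    up = sorted(g for g, lfc in signature.items()
                if g in network and lfc >= threshold)
    down = sorted(g for g, lfc in signature.items()
                  if g in network and lfc <= -threshold)
    return up, down


def summarize(signatures, network, threshold=1.0):
    """Apply split_by_direction to every drug signature."""
    return {drug: split_by_direction(sig, network, threshold)
            for drug, sig in signatures.items()}


if __name__ == "__main__":
    for drug, (up, down) in summarize(SIGNATURES, FA_NETWORK).items():
        print(drug, "up:", up, "down:", down)
```

With real data, the signatures would come from whatever CMap export we end up using, and the network from the FA gene list we agree on.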

Tools

Variant Validator https://variantvalidator.org/
MutationTaster
BeviMed: rare variant association inference, coding and non-coding loci, improvement vs existing tools (SKAT, ADA, CAST)
Open Targets: GSK collaboration with EBI (Andrew Nightingale. Has ingested quite a bit, then gave up because it was too complex).
neXtProt: human expression (does this one show RNA and protein expression by tissue?)