1a4: Develop competency questions

nicolevasilevsky commented 4 years ago

@mellybelly stated that these competency questions are adequate for our initial competency question development. Example queries are here: https://docs.google.com/spreadsheets/d/1jVai85S3CYsQXcOpxk1KlxcUNLJS0y1ag8Q5x_hK93Y/edit#gid=859956776

RQ1: A translational researcher wants to investigate the correlation between STAT5 protein level and JAK2 mutation status in AML patients, using data obtained at diagnosis and at remission, by identifying cases with the following characteristics:

Disease type is acute myeloid leukemia
Somatic JAK2 mutation (mutation type, frequency, FATHMM scores are returned from GDC)
Quantitative mass spec data on STAT5 protein (STAT5 peptide measurement data are returned from PDC)
Pathology annotation of bone marrow biopsies (% blast and % cellularity from IDC)
Data available from two timepoints (diagnosis and remission)
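As a rough illustration of the last criterion, the sketch below selects cases that have data from every node at both timepoints. It is only a sketch under stated assumptions: the record fields (`case_id`, `source`, `timepoint`) and the function name are hypothetical and do not reflect any actual GDC, PDC, or IDC API, and it assumes the disease-type and JAK2 filters have already been applied upstream. It also assumes that "two timepoints" means every source at both timepoints, which the question does not state explicitly.

```python
from collections import defaultdict

# Assumption: each record is a dict tagged with the node it came from
# and the timepoint it was collected at. These field names are hypothetical.
REQUIRED_SOURCES = {"GDC", "PDC", "IDC"}
REQUIRED_TIMEPOINTS = {"diagnosis", "remission"}


def cases_with_full_coverage(records):
    """Return sorted case IDs with data from every source at every timepoint."""
    coverage = defaultdict(set)
    for rec in records:
        coverage[rec["case_id"]].add((rec["source"], rec["timepoint"]))

    # Every (source, timepoint) pair must be present for a case to qualify.
    needed = {(s, t) for s in REQUIRED_SOURCES for t in REQUIRED_TIMEPOINTS}
    return sorted(case for case, seen in coverage.items() if needed <= seen)
```

A case missing, say, remission imaging from IDC would be excluded, which is the behavior RQ1 seems to ask for.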

RQ2: A translational researcher wants to identify key pathways, not associated with smoking, that may contribute to predisposition to lung cancer by querying for the following:

Disease type is lung cancer
Return cases that are male, under the age of 45 at diagnosis, and never smokers
List of somatic gene mutations from whole exome or genome sequencing (GDC)
Global quantitative mass spec data from tumor tissues (PDC)
Radiomics (CT, PET/CT images) data from tumor tissues (IDC)
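The cohort criteria for RQ2 can be written down as a simple filter. This is a minimal sketch, not a definitive model: the `Case` class, its field names, and the value encodings (`"lung cancer"`, `"male"`, `"never"`) are all assumptions made for illustration and are not taken from any existing CRDC data model.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical fields; a real harmonized model may name and encode these differently.
    case_id: str
    disease_type: str
    sex: str
    age_at_diagnosis: int
    smoking_status: str


def rq2_cohort(cases):
    """Return lung cancer cases that are male, under 45 at diagnosis, and never smokers."""
    return [
        c
        for c in cases
        if c.disease_type == "lung cancer"
        and c.sex == "male"
        and c.age_at_diagnosis < 45
        and c.smoking_status == "never"
    ]
```

The resulting case IDs would then be used to pull the mutation, proteomics, and imaging data listed above from GDC, PDC, and IDC respectively.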

RQ3: An investigator from the CPTAC program uses one of the Cloud Resources (CR) platforms to query across GDC and PDC and aggregate RNA sequencing and proteomics data on 200 pediatric brain tumor samples generated by the CPTAC program. They want to perform a transcriptome-proteome correlation analysis of the samples by sending the query result to a workspace on the CR and executing a proteogenomic analysis pipeline in the CR’s compute environment.

RQ4: A computational biologist would like to develop a model to predict drivers of breast cancer metastasis by aggregating data from a case within the HTAN DCC repository with RNA and imaging datasets from multiple time points (diagnosis, remission, and recurrence).

RQ5: An investigator wants to compare genomic variants from colon cancer cases in the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) stored in the CDS with cases from the TCGA colon cancer project stored in the GDC.

For the 4-month model, we will have a high-level model in mind.

nicolevasilevsky commented 4 years ago

Can this be closed?

nicolevasilevsky commented 4 years ago

The semantic workshop queries are published here: https://zenodo.org/record/3611647#.XnweeJNKiL8

cancerDHC / operations

1a4: Develop competency questions #13