Initial feedback #1

singha53-zz commented 7 years ago

Name	Department/Program	Expertise/Interests	GitHub ID
Annie Cavalla	Bioinformatics	Cancer genomics, single cell transcriptomics	@acavalla
Rawnak Hoque	Genome Science and Technology	Genome scale data analysis	@rawnakhoque
Somdeb Paul	Genome Science and Technology	Transcriptomics	@psomdeb25
Fangwu Wang	Medical Genetics	Stem cell biology, Epigenomics	@fangwuwang

Team name: Bloodies

Project summary: Our project asks how hematopoietic stem cells, a rare stem cell population able to regenerate all erythroid, myeloid and lymphoid lineages in humans, make cell-fate decisions during multi-stage differentiation. We will obtain RNA-seq, DNA methylome and ChIP-seq data from a European public resource (BLUEPRINT: http://dcc.blueprint-epigenome.eu/#/home). We will study the transcription factors (TFs) specifically expressed in one cell type (transcriptional signature) and the epigenomic features of different cells.

The preliminary plan includes:
1) Using RNA-seq data to find signature genes and generate a list of essential TFs for the development of each cell type;
2) Using methylome and ChIP-seq of histone marks to identify active/poised promoter and enhancer regions, and recognize the TF binding motif within these regions to infer the important transcriptional regulation during differentiation;
3) Correlating these promoter/enhancer regions from (2) with gene expression to verify the transcriptional regulation;
4) Constructing a TF network for lineage differentiation using these datasets and known TF interactions from literature.
There will be comparisons and statistical analyses involved in each step.

singha53-zz commented 7 years ago

Some comments:

1. The rationale is unclear to me: you state, "We will study the transcription factors (TFs) specifically expressed in one cell type (transcriptional signature) and the epigenomic features of different cells."

Do you want to associate TF gene expression in the parent hematopoietic stem cells with methylation of the differentiated cell types? Please clarify.
How will you study TF expression in one cell type? (By comparing its expression profile with samples from other cell types? Please clarify.)
Also, please state the number of samples available for each cell type and each omic data type. Keep in mind that, depending on the number and size of the RNA-seq FASTQ files, you may need access to a cluster to preprocess (QC, align, map, summarise to counts) the RNA-seq files.
Also state whether previous studies have already tried to address this question. What knowledge gap are you trying to address?

2. The preliminary plan includes:

Using RNA-seq data to find signature genes and generate a list of essential TFs for the development of each cell type;

What do you mean by "signature"? Differentially expressed genes? A biomarker panel? If you are only interested in TFs, are you only going to look at an a priori list of TFs? (Where will you obtain that list?)

Using methylome and ChIP-seq of histone marks to identify active/poised promoter and enhancer regions, and recognize the TF binding motif within these regions to infer the important transcriptional regulation during differentiation;

How will you identify "active/poised promoter and enhancer regions"? What does "active" mean (e.g. a % change in methylation)? What statistical methodology or comparison will you carry out?

Correlating these promoter/enhancer regions from (2) with gene expression to verify the transcriptional regulation;

State the statistical methodology (e.g. a linear model with a specific parameterization that answers your question of interest).
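A minimal sketch of such a linear model, assuming one promoter (or enhancer) methylation M-value and one log-expression value per sample, measured on the same samples: the ordinary least-squares slope summarizes the association, and a negative slope is consistent with methylation repressing the gene. The function name and toy numbers are hypothetical; a real analysis would fit every region/gene pair and test the slope, for example with limma or a regression package.

```python
def fit_expression_on_methylation(methylation, expression):
    """Ordinary least-squares fit of expression = intercept + slope * methylation.

    Both arguments are equal-length sequences with one value per sample.
    Returns (intercept, slope).
    """
    n = len(methylation)
    if n != len(expression) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(methylation) / n
    mean_y = sum(expression) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in methylation)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(methylation, expression))
    if sxx == 0:
        raise ValueError("methylation values are constant; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data: promoter M-values and log2 expression for 5 samples
m_values = [-2.0, -1.0, 0.0, 1.0, 2.0]
log_expr = [9.0, 8.0, 7.0, 6.0, 5.0]
intercept, slope = fit_expression_on_methylation(m_values, log_expr)
print(intercept, slope)  # 7.0 -1.0
```

Here expression drops by one log2 unit per unit increase in M-value, i.e. the negative association expected for a repressive mark.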

Constructing a TF network for lineage differentiation using these datasets and known TF interactions from literature.

Which method will you use? Which database will you retrieve these TF interactions from? What software and libraries will you use?

Remember to add a table stating the division of labour (e.g. which individuals are involved in the study design, obtaining data, preprocessing data, cleaning data, QC of data, exploratory data analysis, statistical analyses, writing, etc.).

Happy to discuss further at Wednesday's seminar. Cheers

@santina

singha53-zz commented 7 years ago

Hi @STAT540-UBC/team-bloodies @santina

I cannot find the issue where you submitted your proposal. Can you please point me to it, so that I can upload my comments to your proposal?

santina commented 7 years ago

I dug around and found it: https://github.com/STAT540-UBC/team_Bloodies/blob/master/Data/Proposal/Proposal.Rmd

singha53-zz commented 7 years ago

Okies. I'll add the feedback here: @STAT540-UBC/team-bloodies

Background/Motivation:

Your hypothesis may need revising: "TFs binding to the active epigenomic regions". You only have gene expression and methylation data, so I don't think you can make claims about TF binding (you might need ChIP-seq data for that). How about something along the lines of gene expression of TFs or methylation of TF genes?

Division of labour:

Looks good to me, although it would be nice to see some QC and exploratory data analysis: we covered these in the course, and they are a very important component because they affect your downstream analyses.

Datasets:

Please provide the number of samples you have for each cell type. All I can gather is that you have many more replicates (biological and technical) for DNA methylation than for RNA-seq, but I am unclear how many you have of each.

Aim and Methods:

Aim 1: I am still unclear how the active DNA methylation regions are identified. Do you compare the beta/M-values for a given cell type against all others using a statistical test? I am assuming "hypomethylated" is the same as "active DNA methylation region", since that gene is then able to be transcribed?
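One way to read the question, sketched under assumptions: convert beta-values to M-values with M = log2(beta / (1 - beta)), then compare a given cell type against all other samples. Only the mean difference is computed here; the clipping offset `eps`, the function names and the toy betas are hypothetical, and a real analysis would apply a statistical test (for example limma on the M-values) rather than a raw difference.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a beta-value (fraction methylated, 0..1) to an M-value.

    M = log2(beta / (1 - beta)). Betas are clipped to [eps, 1 - eps] so that
    fully (un)methylated sites do not give infinite M-values; the eps value
    is a hypothetical choice.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def one_vs_rest_difference(m_by_cell_type, target):
    """Mean M-value in `target` minus the mean over all samples of the other cell types.

    A negative value means the region is hypomethylated in `target`.
    """
    target_vals = m_by_cell_type[target]
    rest_vals = [m for cell, vals in m_by_cell_type.items() if cell != target for m in vals]
    return sum(target_vals) / len(target_vals) - sum(rest_vals) / len(rest_vals)


# Hypothetical beta-values for one region, two samples per cell type
betas = {"HSC": [0.1, 0.2], "monocyte": [0.8, 0.9], "T cell": [0.85, 0.75]}
m_values = {cell: [beta_to_m(b) for b in vals] for cell, vals in betas.items()}
diff = one_vs_rest_difference(m_values, "HSC")
print(diff < 0)  # True: the region is hypomethylated in HSC in this toy example
```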

Aim 2: Again, which statistical test will you use? limma? DESeq2? What FDR threshold will you use?
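Whichever test is chosen, the FDR threshold is usually applied to Benjamini-Hochberg adjusted p-values. A minimal pure-Python sketch of that adjustment (the same procedure as R's `p.adjust(method = "BH")`); the p-values and the 0.05 cutoff are hypothetical and only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Visit p-values from largest to smallest so a running minimum keeps
    # the adjusted values monotone in the original p-values
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from one test per gene
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
significant = [q <= 0.05 for q in qvals]
print(significant)  # [True, False, False, False]
```

Note that the gene with raw p = 0.04 is no longer significant after adjustment, which is why the threshold should be stated on the adjusted scale.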

Aim 3: Is this analysis based on methylation or RNA-seq data? This is confusing to me, probably because I am not familiar with exactly how it is performed. What statistical method will you use to determine significant enrichment?

Aim 4: Are the TFs you will test based on Aim 3, or do you have a list of TFs? Not sure how an ANOVA fits here. You can use an ANOVA to determine which TFs (based on methylation/gene expression) differ in at least one of the cell types, but how would you determine whether a TF with lower methylation has higher gene expression? Is there a specific program for identifying TF binding sites using the Ensembl Regulatory Build?
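To check whether lower methylation goes with higher expression, one option (an assumption, not something stated in the proposal) is a rank correlation between a TF gene's methylation and its expression across samples; a negative Spearman rho supports the "hypomethylated and expressed" pattern. A minimal sketch with hypothetical data; `scipy.stats.spearmanr` gives the same coefficient plus a p-value.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


# Hypothetical promoter beta-values and log2 expression of one TF, 5 samples
methylation = [0.9, 0.7, 0.4, 0.2, 0.1]
expression = [1.0, 2.5, 4.0, 6.0, 8.0]
rho = spearman_rho(methylation, expression)
print(round(rho, 6))  # -1.0
```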

Aim 5: There seem to be two aims here: one is a differential expression analysis between AML and CLL, and the other clusters these subjects based on the TFs you identified in your previous aims. You could even compare the subjects' cluster labels (from the cluster analysis) with their true AML/CLL labels or with other clinical variables for these subjects.
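For comparing the cluster labels with the AML/CLL labels, one common agreement measure (an assumption, not one named in the thread) is the adjusted Rand index: 1.0 means identical partitions up to relabelling, and values near 0 mean chance-level agreement. A minimal sketch; `sklearn.metrics.adjusted_rand_score` computes the same quantity. The subject labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same subjects."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two equal-length label lists with at least 2 subjects")
    # Contingency-table cell counts n_ij and the row/column marginals
    pair_counts = Counter(zip(labels_true, labels_pred))
    row_counts = Counter(labels_true)
    col_counts = Counter(labels_pred)
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in row_counts.values())
    sum_b = sum(comb(c, 2) for c in col_counts.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions put everyone in one cluster
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical diagnoses and cluster assignments for six subjects
diagnosis = ["AML", "AML", "AML", "CLL", "CLL", "CLL"]
clusters = [1, 1, 1, 2, 2, 2]
print(adjusted_rand_index(diagnosis, clusters))  # 1.0
```

Because the index only looks at which subjects are grouped together, the arbitrary numbering of clusters does not matter.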

Looks great otherwise. Well Done! Happy to discuss further :)

santina commented 7 years ago

I realized I never commented on this, but since I'm going to comment on the progress report, here are just a few things:

Therefore, we will also focus on enhancer regions ...

The genes associated with an enhancer are hard to identify, since an enhancer can be very far from the genes it regulates. Here are two resources you can look into:

Pairwise differential expression analysis to identify upregulated genes ....

Why pairwise? Remember that with a linear model you can do differential expression analysis (DEA) on all groups at once. Why only upregulated genes? Some genes could be downregulated during differentiation into some cell types.
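To illustrate testing all groups at once: the overall F-test of a linear model with a single cell-type factor is a one-way ANOVA, and it flags a gene whose mean changes in any direction in at least one group, so downregulated genes are caught as well as upregulated ones. A minimal sketch for a single gene with hypothetical log-expression values; in practice limma fits this model to all genes together, and `scipy.stats.f_oneway` returns the same F-statistic with a p-value.

```python
def one_way_anova_f(groups):
    """F-statistic testing whether the mean differs in at least one group.

    `groups` maps a group label (e.g. cell type) to a list of values
    (e.g. log-expression of one gene). Equivalent to the overall F-test of a
    linear model with one categorical factor.
    """
    values = [v for vals in groups.values() for v in vals]
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more samples than groups")
    grand_mean = sum(values) / n
    means = {g: sum(vals) / len(vals) for g, vals in groups.items()}
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(vals) * (means[g] - grand_mean) ** 2 for g, vals in groups.items())
    # Within-group sum of squares: spread of samples around their own group mean
    ss_within = sum((v - means[g]) ** 2 for g, vals in groups.items() for v in vals)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical log2 expression of one gene: lower in monocytes, higher in T cells
expr = {"HSC": [5.0, 6.0, 7.0], "monocyte": [1.0, 2.0, 3.0], "T cell": [9.0, 10.0, 11.0]}
print(one_way_anova_f(expr))  # 48.0
```

Both the drop in monocytes and the rise in T cells contribute to the between-group sum of squares, so a single test picks up changes in either direction.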
