fangwuwang / team_Bloodies


Initial feedback #1

singha53-zz opened this issue 7 years ago

singha53-zz commented 7 years ago
| Name | Department/Program | Expertise/Interests | GitHub ID |
|---|---|---|---|
| Annie Cavalla | Bioinformatics | Cancer genomics, single-cell transcriptomics | @acavalla |
| Rawnak Hoque | Genome Science and Technology | Genome-scale data analysis | @rawnakhoque |
| Somdeb Paul | Genome Science and Technology | Transcriptomics | @psomdeb25 |
| Fangwu Wang | Medical Genetics | Stem cell biology, epigenomics | @fangwuwang |

Team name: Bloodies

Project summary: Our project investigates how hematopoietic stem cells, a rare stem cell population able to regenerate all erythroid, myeloid and lymphoid lineages in humans, make cell fate decisions during multi-stage differentiation. We will obtain RNA-seq, DNA methylome and ChIP-seq data from a European public resource (BLUEPRINT: http://dcc.blueprint-epigenome.eu/#/home). We will study the transcription factors (TFs) specifically expressed in one cell type (the transcriptional signature) and the epigenomic features of the different cell types.

The preliminary plan includes: 1) Using RNA-seq data to find signature genes and generate a list of essential TFs for the development of each cell type; 2) Using methylome data and ChIP-seq of histone marks to identify active/poised promoter and enhancer regions, and finding TF binding motifs within these regions to infer important transcriptional regulation during differentiation; 3) Correlating these promoter/enhancer regions from (2) with gene expression to verify the transcriptional regulation; 4) Constructing a TF network for lineage differentiation using these datasets and known TF interactions from the literature. Each step will involve comparisons and statistical analyses.
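Step 2 relies on combining histone marks. As a rough illustration only, the sketch below labels a region from the set of marks called as present there, using widely used combinations: H3K4me3 with H3K27ac for active promoters, H3K4me3 with H3K27me3 for poised (bivalent) promoters, H3K4me1 with H3K27ac for active enhancers, and H3K4me1 with H3K27me3 for poised enhancers. It assumes peak calls for these four marks already exist per region; the function name, region names and labels are hypothetical, and real analyses usually rely on a chromatin-state segmentation tool rather than hard-coded rules.

```python
def classify_region(marks: set[str]) -> str:
    """Coarse regulatory label for one region, given the histone marks
    called as present there (e.g. {"H3K4me1", "H3K27ac"}).

    Simplified rules for illustration only; H3K27me3 is checked before
    H3K27ac, so a region carrying both is labelled poised.
    """
    if "H3K4me3" in marks:  # promoter-associated mark
        if "H3K27me3" in marks:
            return "poised promoter"
        if "H3K27ac" in marks:
            return "active promoter"
        return "promoter (H3K4me3 only)"
    if "H3K4me1" in marks:  # enhancer-associated mark
        if "H3K27me3" in marks:
            return "poised enhancer"
        if "H3K27ac" in marks:
            return "active enhancer"
        return "primed enhancer"
    return "unassigned"


# Hypothetical per-region peak calls
regions = {
    "region_1": {"H3K4me3", "H3K27ac"},
    "region_2": {"H3K4me1", "H3K27me3"},
    "region_3": {"H3K4me1"},
}
labels = {name: classify_region(marks) for name, marks in regions.items()}
```

The resulting labels could then be intersected with motif scans (step 2) and with expression of nearby genes (step 3).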

singha53-zz commented 7 years ago

Some comments:

1. The rationale is unclear to me: you state, "We will study the transcription factors (TFs) specifically expressed in one cell type (transcriptional signature) and the epigenomic features of different cells."

2. The preliminary plan includes:

(1) Using RNA-seq data to find signature genes and generate a list of essential TFs for the development of each cell type;

(2) Using methylome data and ChIP-seq of histone marks to identify active/poised promoter and enhancer regions, and finding TF binding motifs within these regions to infer important transcriptional regulation during differentiation;

(3) Correlating these promoter/enhancer regions from (2) with gene expression to verify the transcriptional regulation;

(4) Constructing a TF network for lineage differentiation using these datasets and known TF interactions from the literature.

Remember to add a table stating the division of labour (e.g., which individuals are involved in study design, obtaining data, preprocessing, cleaning, QC, exploratory data analysis, statistical analyses, writing, etc.).

Happy to discuss further at Wednesday's seminar. Cheers

@santina

singha53-zz commented 7 years ago

hi @STAT540-UBC/team-bloodies @santina

I cannot find the issue where you submitted your proposal. Can you please point me to it, so that I can upload my comments to your proposal?

santina commented 7 years ago

I dug around and found it: https://github.com/STAT540-UBC/team_Bloodies/blob/master/Data/Proposal/Proposal.Rmd

singha53-zz commented 7 years ago

Okies. I'll add the feedback here: @STAT540-UBC/team-bloodies

Background/Motivation:

Division of labour:

Datasets:

Aim and Methods:

Aim 1: I am still unclear on how the active DNA methylation regions are identified. Do you compare the beta/M-values for a given cell type against all others using a statistical test? I am assuming a hypomethylated region is the same as an active DNA methylation region, since the associated gene can be transcribed?
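A minimal sketch of the one-vs-rest comparison described above, assuming beta-values for a single region across samples: convert them to M-values, M = log2(beta / (1 - beta)), and compute Welch's t statistic for the target cell type against all others. Function names and values are hypothetical; in practice a package such as limma (moderated t-statistics on M-values) would typically be used, and the test repeated for every region with multiple-testing correction.

```python
import math
from statistics import mean, variance


def beta_to_m(beta: float, eps: float = 1e-6) -> float:
    """Convert a methylation beta-value (0..1) to an M-value,
    M = log2(beta / (1 - beta)). Beta is clipped to [eps, 1 - eps]
    so that fully (un)methylated values do not give infinities."""
    beta = min(max(beta, eps), 1 - eps)
    return math.log2(beta / (1 - beta))


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


# Hypothetical beta-values for one region: target cell type vs. all other cell types
target = [beta_to_m(b) for b in (0.10, 0.12, 0.08)]
others = [beta_to_m(b) for b in (0.70, 0.65, 0.80, 0.75)]
t, df = welch_t(target, others)
# A strongly negative t indicates lower methylation (hypomethylation) in the target
```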

Aim 2: Again, which statistical test will you use: limma, DESeq2? What FDR threshold will you use?
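On the FDR question: both limma and DESeq2 report Benjamini-Hochberg (BH) adjusted p-values by default, and the FDR threshold is then a cutoff applied to those values. A minimal dependency-free sketch of the BH adjustment, with hypothetical p-values and 0.05 used only as an example cutoff:

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone (and <= 1)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from a differential expression test
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
fdr = 0.05  # example threshold; the actual cutoff is the team's choice
significant = [q < fdr for q in qvals]
```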

Aim 3: Is this analysis based on methylation or RNA-seq data? This is confusing to me, probably because I am unaware of how exactly it is performed. What statistical method will you use to determine significant enrichment?
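One common way to test whether a TF motif is significantly enriched in a set of regions (e.g. cell-type-specific hypomethylated or enhancer regions) is a one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test, against a background set of regions. The sketch assumes motif occurrences have already been counted per region; the counts and the function name are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: background regions; K: background regions containing the motif;
    n: regions in the set of interest; k: of those, how many contain the motif.
    """
    upper = min(n, K)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic until this final division


# Hypothetical counts: 40 of 200 cell-type-specific regions carry a motif
# that occurs in 1,000 of 20,000 background regions (5% expected, 20% observed)
p = hypergeom_enrichment_p(k=40, n=200, K=1_000, N=20_000)
```

With many motifs tested, the resulting p-values would again need multiple-testing correction.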

Aim 4: Are the TFs you will test based on Aim 3, or do you have a list of TFs? Not sure how an ANOVA fits here. You can use an ANOVA to determine which TFs (based on methylation/gene expression) differ in at least one of the cell types, but how would you determine whether a TF with lower methylation has high gene expression? Is there a specific program used to identify TF binding sites with the Ensembl Regulatory Build?
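On checking whether a TF with lower methylation has higher expression: one option is a rank correlation between promoter methylation and expression of the same gene across samples, where a negative Spearman coefficient indicates the expected inverse relationship. A minimal sketch with hypothetical paired values; scipy.stats.spearmanr or R's cor(..., method = "spearman") compute the same statistic, and spearmanr also returns a p-value.

```python
import math
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical samples: promoter beta-values and expression (e.g. log2 TPM) of one TF
methylation = [0.85, 0.60, 0.40, 0.15, 0.10]
expression = [1.2, 2.5, 4.0, 6.1, 7.3]
rho = spearman(methylation, expression)
```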

Aim 5: There seem to be two aims here: one is a differential expression analysis between AML and CLL, and the other clusters these subjects based on the TFs you identified in your previous aims. You could even correlate the cluster labels you obtain from the cluster analysis with the true AML/CLL labels or with other clinical variables for these subjects.
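To compare cluster assignments with the AML/CLL labels, one standard measure is the adjusted Rand index (ARI): about 1 for identical partitions and about 0 for chance-level agreement, regardless of how the clusters are numbered. scikit-learn's adjusted_rand_score computes it; a dependency-free sketch with hypothetical labels:

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true: list, labels_pred: list) -> float:
    """Adjusted Rand index between two labelings of the same subjects."""
    n = len(labels_true)
    pairs_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = pairs_true * pairs_pred / comb(n, 2)
    max_index = (pairs_true + pairs_pred) / 2
    if max_index == expected:
        # Degenerate case (e.g. one cluster in both labelings): treat as perfect agreement
        return 1.0
    return (pairs_cells - expected) / (max_index - expected)


# Hypothetical diagnoses and cluster assignments from TF-based clustering
diagnosis = ["AML", "AML", "AML", "CLL", "CLL", "CLL"]
clusters = [2, 2, 2, 1, 1, 1]
ari = adjusted_rand_index(diagnosis, clusters)
```

The same comparison works for other categorical clinical variables in place of the diagnosis.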

Looks great otherwise. Well Done! Happy to discuss further :)

santina commented 7 years ago

I realized I never commented on this, but since I'm gonna comment on the progress report, here are just a few things:

Therefore, we will also focus on enhancer regions ...

Associated genes for enhancers are hard to identify, since an enhancer can be very far from the gene(s) it regulates. Here are two resources you can look into:
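Whatever resource is used, the naive baseline it improves on is assigning each enhancer to the genes whose TSS lies within a fixed window; this ignores long-range contacts and can easily pick the wrong gene. A sketch of that baseline only, where the 50 kb window, gene names and coordinates are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    tss: int  # transcription start site (bp)


def genes_near_enhancer(chrom: str, start: int, end: int,
                        genes: list[Gene], window: int = 50_000) -> list[str]:
    """Names of genes whose TSS lies within `window` bp of the enhancer
    midpoint on the same chromosome, closest first."""
    mid = (start + end) // 2
    hits = [(abs(g.tss - mid), g.name) for g in genes
            if g.chrom == chrom and abs(g.tss - mid) <= window]
    return [name for _, name in sorted(hits)]


# Hypothetical annotation and enhancer
genes = [
    Gene("GENE_A", "chrX", 100_000),
    Gene("GENE_B", "chr1", 110_000),
    Gene("GENE_C", "chrX", 130_000),
    Gene("GENE_D", "chrX", 300_000),
]
nearby = genes_near_enhancer("chrX", 110_000, 110_200, genes)
```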

Pairwise differential expression analysis to identify upregulated genes ....

Why pairwise? Remember that with a linear model you can do DEA on all groups at once. Why only upregulated genes? Some genes could be downregulated during cell differentiation.
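To illustrate testing all groups at once: a linear model with cell type as a single categorical factor gives an overall F-test of whether a gene's mean expression differs in at least one cell type, in either direction, and follow-up contrasts then show which groups differ and whether the gene goes up or down. limma does this with moderated statistics; the sketch below is only the plain one-way ANOVA F statistic for one gene, assuming log-scale expression, with hypothetical cell-type names and values.

```python
from statistics import mean


def one_way_f(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """One-way ANOVA F statistic for H0: all group means are equal.
    Returns (F, between-group df, within-group df)."""
    values = [x for xs in groups.values() for x in xs]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(xs) * (mean(xs) - grand) ** 2 for xs in groups.values())
    ss_within = sum((x - mean(xs)) ** 2 for xs in groups.values() for x in xs)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical log2 expression of one gene in three cell types
expr = {
    "HSC": [1.0, 2.0, 3.0],
    "progenitor": [4.0, 5.0, 6.0],
    "erythroblast": [7.0, 8.0, 9.0],
}
f_stat, df1, df2 = one_way_f(expr)
```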