cBioPortal / datahub

A centralized location for storing curated data from cBioPortal

methylation data questions #1210

Open tmazor opened 4 years ago

tmazor commented 4 years ago

It sounds like we are currently using raw beta values from the array probe that is most negatively correlated with expression.

For comparison, the mRNA expression data that we provide has gone through some normalization/other processing to get to the gene level expression data that we show.

Methylation array data should also be background corrected & normalized. I think this is especially important as we enable users to analyze this data in new ways, e.g. with group comparison.

I know we don't generally process data ourselves, but presumably the methylation array data was processed in some way as part of any study that includes it? E.g. PanCanAtlas must have processed the methylation data before using it. Can we get that processed data?

In addition, if I remember correctly, there are statistical implications to the fact that beta values are bounded between 0 & 1. When I performed differential methylation analysis, we first transformed the beta values to M values and then did the differential analysis with the M values. It's been a while since I did any of this kind of analysis and I'm definitely not current on best practices, but this may be something we need to consider for the group comparison feature.

jjgao commented 4 years ago

@cBioPortal/curation @yichaoS @schultzn @ao508 any comments?

jjgao commented 4 years ago

Just to add an example and see if we can find a better metric to address it.

In this query, BRCA1 hypermethylation is obviously enriched in the unaltered group (BRCA1 WT). If we had hypermethylation calls, we would easily capture that as a significant event, but with the current data it was not significant.

schultzn commented 4 years ago

I am not sure the methylation beta values have ever gone through additional normalization in any of the TCGA projects...

And I also noticed that BRCA1 was not significant, JJ. But anyone who visually inspected the genes would notice the outliers, so maybe that is what we need, an outlier analysis: test whether either of the two groups is enriched for outliers.
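
A minimal sketch of what such an outlier-enrichment test could look like. This is an illustration, not anything cBioPortal implements: the function names are hypothetical, and both the cutoff rule (upper quartile plus 1.5 times the IQR of the pooled values) and the choice of Fisher's exact test are assumptions.

```python
import math
import statistics


def upper_outlier_cutoff(values, k=1.5):
    """Tukey-style upper fence: Q3 + k * IQR (k = 1.5 is an assumed default)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + k * (q3 - q1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


def outlier_enrichment(group_a, group_b, k=1.5):
    """Count high outliers per group (pooled cutoff) and test for enrichment."""
    cutoff = upper_outlier_cutoff(list(group_a) + list(group_b), k)
    a_out = sum(v > cutoff for v in group_a)
    b_out = sum(v > cutoff for v in group_b)
    p = fisher_exact_two_sided(a_out, len(group_a) - a_out, b_out, len(group_b) - b_out)
    return cutoff, a_out, b_out, p
```

Calling `outlier_enrichment(altered_betas, unaltered_betas)` would return the cutoff, the outlier count in each group, and the p-value. A matching lower fence would be needed to catch hypomethylation outliers.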

jjgao commented 4 years ago

@schultzn that's a good observation. I guess an outlier analysis is almost like calling hyper/hypo-methylation? We'll need some statistics help to call outliers for highly skewed (non-normal) data. Do you know whether any progress has been made on calling hyper/hypo-methylation?

The problem with our current t-tests is that the distribution of beta values is not normal-like; M values seem a little better: https://pubmed.ncbi.nlm.nih.gov/21118553/#&gid=article-figures&pid=figure-2-uid-1. There is some more discussion of statistical methods for analyzing methylation data: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4497424/
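
To make the scale issue concrete, here is a rough sketch that computes a Welch t statistic for the same two groups, once on beta values and once on log2-odds (M) values. The function names, the sample values, and the clipping constant `eps` are all made up for illustration, and Welch's unequal-variance form is an assumption (the thread does not say which t-test variant is used).

```python
import math
import statistics


def welch_t(x, y):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


def to_m(beta, eps=1e-6):
    """log2(beta / (1 - beta)), clipped away from 0 and 1 (eps is an assumption)."""
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


# Made-up beta values: mostly unmethylated, with a few hypermethylated samples.
unaltered = [0.02, 0.03, 0.04, 0.03, 0.05, 0.85, 0.90, 0.80]
altered = [0.02, 0.03, 0.04, 0.03, 0.05, 0.04, 0.02, 0.03]

t_beta = welch_t(unaltered, altered)
t_m = welch_t([to_m(b) for b in unaltered], [to_m(b) for b in altered])
print(f"t on beta values: {t_beta:.3f}, t on M values: {t_m:.3f}")
```

The transform stretches values near 0 and 1, so the two statistics generally differ. Neither scale makes a handful of hypermethylated samples look normally distributed.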

yichaoS commented 4 years ago

@jjgao If we are considering statistical methods other than the current group comparison, should we still proceed with curating the beta values (this is in progress)? Or should we try to find other data sources for normalized methylation data? cc @tmazor

jjgao commented 4 years ago

I have been reading a bit more about beta values vs. M values: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-587. I think we should create another profile with M values and use it as the default for comparison analysis. We can still keep the beta values for users to choose.

The M values can be calculated from beta values: $M = \log_2\left(\frac{\beta}{1 - \beta}\right)$
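
As a sketch, the conversion could be done like this. The function names are hypothetical, and the clipping constant `eps` is an assumption, needed only because beta values of exactly 0 or 1 would otherwise give infinite M values.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a beta value in [0, 1] to an M value (log2 of the methylation odds)."""
    # Clip so that beta == 0 or beta == 1 does not produce -inf / +inf.
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transform: recover the beta value from an M value."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

A beta of 0.5 maps to an M value of 0, and the transform is symmetric around that point, so an M value profile could be derived from an existing beta profile without going back to the raw array data.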

@tmazor @schultzn should we move forward?

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tmazor commented 3 years ago

I think adding the M value as a second profile makes a lot of sense

ritikakundra commented 3 years ago

That sounds good, as @rmadupuri just curated a methylation profile with M value data, but we could not add it in.
