PavlidisLab / GemmaDE

Discover biological conditions associated with differential gene expression by compiling information from >10,000 published experiments
MIT License

Avoid grouping of experiments by EFO term "experimental process" #19

Open ppavlidis opened 2 years ago

ppavlidis commented 2 years ago

This isn't an issue in GemmaDE per se, but a less-than-ideal side effect of ontology inference in EFO.

A query I tried brought up this result:

[image: query result]

https://gemma.msl.ubc.ca/expressionExperiment/showExpressionExperiment.html?id=7993
https://gemma.msl.ubc.ca/expressionExperiment/showExpressionExperiment.html?id=15085

The two data sets have nothing to do with each other. It's just that the annotated terms "coronary artery bypass" and "shear stressing" are both children of "experimental process".
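To make the effect concrete, here is a minimal Python sketch of how ancestor inference ends up grouping the two data sets. This is not GemmaDE's actual code: the `PARENTS` map is a toy fragment containing only the two parent links mentioned above, and the experiment names and the helper functions `ancestors` and `group_by_inferred_term` are hypothetical.

```python
from collections import defaultdict

# Toy is-a edges (child -> parents). Only the two links described in this
# issue are included; this is NOT the real EFO graph.
PARENTS = {
    "coronary artery bypass": ["experimental process"],
    "shear stressing": ["experimental process"],
    "experimental process": [],
}

# Hypothetical experiment names standing in for the two data sets.
ANNOTATIONS = {
    "experiment A": ["coronary artery bypass"],
    "experiment B": ["shear stressing"],
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def group_by_inferred_term(annotations, parents):
    """Map each direct or inferred term to the experiments that carry it."""
    groups = defaultdict(set)
    for experiment, terms in annotations.items():
        for term in terms:
            for inferred in {term} | ancestors(term, parents):
                groups[inferred].add(experiment)
    return dict(groups)


groups = group_by_inferred_term(ANNOTATIONS, PARENTS)
# Terms shared by more than one experiment are the ones that cause grouping.
shared = {term: exps for term, exps in groups.items() if len(exps) > 1}
print(shared)
```

In this toy setup the only term the two experiments share is the inferred parent, which is exactly the grouping I'd like to avoid.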

This is a part of EFO that is not very well developed, IMO: it covers a limited and seemingly random set of "experimental processes" in a shallow DAG, so being a child of this term doesn't mean much. For example, why isn't "coronary artery bypass" a child of "therapeutic procedure"? In what sense is "coronary artery bypass" an "experimental process"?

There may be other examples like this, so we might need to create a list of ontology terms that are "too high level" to be useful for inference. There may be a way to infer such a list from low information content. However we identify them, the terms would go in a blacklist that GemmaDE wouldn't use when analyzing enrichment.
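One way the information-content idea could work, as a rough Python sketch: score each term by IC = -log2(fraction of experiments annotated with the term or any of its descendants), then blacklist everything under a cutoff. Everything here is an assumption for illustration: the toy DAG (including "toy root", "toy term C" and "toy term D"), the experiment names, the `IC_THRESHOLD` value, and the helper names. None of it is GemmaDE's actual implementation or the real EFO structure.

```python
import math

# Toy is-a edges (child -> parents); "toy ..." terms are placeholders.
PARENTS = {
    "coronary artery bypass": ["experimental process"],
    "shear stressing": ["experimental process"],
    "toy term C": ["experimental process"],
    "toy term D": ["toy root"],
    "experimental process": ["toy root"],
}

# Hypothetical corpus: one directly annotated term per experiment.
ANNOTATIONS = {
    "exp1": ["coronary artery bypass"],
    "exp2": ["shear stressing"],
    "exp3": ["toy term C"],
    "exp4": ["toy term D"],
}

IC_THRESHOLD = 1.0  # assumed cutoff in bits; would need tuning on real data


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_terms(terms, parents):
    """Direct annotations plus all of their ancestors."""
    result = set()
    for term in terms:
        result |= {term} | ancestors(term, parents)
    return result


def information_content(annotations, parents):
    """IC(term) = -log2(share of experiments carrying the term, directly or inferred)."""
    total = len(annotations)
    counts = {}
    for terms in annotations.values():
        for term in inferred_terms(terms, parents):
            counts[term] = counts.get(term, 0) + 1
    return {term: -math.log2(n / total) for term, n in counts.items()}


ic = information_content(ANNOTATIONS, PARENTS)
blacklist = {term for term, score in ic.items() if score < IC_THRESHOLD}

# Terms still usable for enrichment on exp1 once the blacklist is applied.
usable = inferred_terms(ANNOTATIONS["exp1"], PARENTS) - blacklist
print(sorted(blacklist), sorted(usable))
```

A frequency-based score like this only catches terms that are broad in our own corpus, so a manually curated list might still be needed for terms that are rarely used but just as uninformative.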

k8iechen commented 2 years ago