PavlidisLab / Gemma

Genomics data re-analysis
Apache License 2.0
Introduce QT field to indicate the expression level information is missing (across-genes) #636

Open ppavlidis opened 1 year ago

ppavlidis commented 1 year ago

This isn't super-important, but it always pains me when experiments we have curated muck up the works over what is mostly not a big problem.

Examples:

The data in GEO has undergone some kind of gene-wise normalization, so each gene has a similar mean. The difference in the mean of two genes is no longer a reliable indicator of relative expression levels.
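To make the symptom concrete, here is a minimal sketch of one way to spot it: compare the spread of per-gene means against the typical within-gene spread. This is only an illustration, not how Gemma's QT checker works; the function names, the toy matrix, and the `max_ratio` threshold are all assumptions made up for this example.

```python
from statistics import mean, pstdev


def gene_mean_spread_ratio(matrix):
    """Spread of per-gene means divided by the typical within-gene spread.

    `matrix` is a list of rows, one row per gene, one value per sample.
    """
    gene_means = [mean(row) for row in matrix]
    typical_within = mean(pstdev(row) for row in matrix)
    between = pstdev(gene_means)
    if typical_within == 0:
        return float("inf") if between > 0 else 0.0
    return between / typical_within


def looks_gene_centered(matrix, max_ratio=0.1):
    """Hypothetical heuristic: gene means barely differ relative to noise."""
    return gene_mean_spread_ratio(matrix) < max_ratio


# Toy log-scale data: three genes with clearly different expression levels.
raw = [
    [2.0, 2.5, 1.5, 2.0],
    [8.0, 8.5, 7.5, 8.0],
    [14.0, 14.5, 13.5, 14.0],
]

# Gene-wise centering removes the across-gene level information,
# but leaves the within-gene pattern across samples untouched.
centered = [[x - mean(row) for x in row] for row in raw]

print(looks_gene_centered(raw))       # False
print(looks_gene_centered(centered))  # True
```

Note that the within-gene differences between samples are identical in `raw` and `centered`, which is why per-gene comparisons such as differential expression are unaffected.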

This primarily affects a subset of GEO submissions: Illumina BeadArrays (for which we never have raw data) and various microarrays used in one-channel mode for which we lack or don't use raw data (if it is even present). In these cases there is no alternative unnormalized QT provided by the submitter within the SOFT file, and the submitter didn't really know what they were doing (e.g. used canned software like GeneSpring that had a habit of doing this).

It is not terribly common in Gemma, but we have blacklisted some studies for this reason in the past and have generally avoided them.

It shows up in a couple of ways (with some variation):

As far as I can tell the impact of this is actually fairly minor, because within a gene things are okay. The differential expression analysis is fine.

But it does have some impact:

To avoid having to blacklist these experiments while also limiting confusion, we could add a field to the QT indicating something like "gene-centered". This would alert curators that they shouldn't be bothered by the way the data looks, signal that the GEEQ score could take a hit (though the poor sample-sample correlations probably take care of that), and let the QT checker chill.

If "gene-centered" is too specific, we could instead consider a more generic way to flag that the QT is "non-standard" (by our standards).
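A rough sketch of what such a flag could look like, written in Python purely for illustration. Every name here (`QtCaveat`, `QuantitationType`, `should_warn_about_distribution`) is hypothetical and does not reflect Gemma's actual data model; the point is only that a single field can cover both the specific "gene-centered" case and the generic "non-standard" one, and that the checker can consult it before complaining.

```python
from dataclasses import dataclass
from enum import Enum


class QtCaveat(Enum):
    """Hypothetical values for the proposed QT field."""
    NONE = "none"
    GENE_CENTERED = "gene-centered"
    NON_STANDARD = "non-standard"


@dataclass
class QuantitationType:
    """Toy stand-in for a QT record, not Gemma's real class."""
    name: str
    caveat: QtCaveat = QtCaveat.NONE


def should_warn_about_distribution(qt, distribution_looks_odd):
    """Warn only when the data looks odd and nobody has flagged the QT yet."""
    return distribution_looks_odd and qt.caveat is QtCaveat.NONE


unflagged = QuantitationType("log2 intensity")
flagged = QuantitationType("log2 intensity", caveat=QtCaveat.GENE_CENTERED)

print(should_warn_about_distribution(unflagged, True))  # True
print(should_warn_about_distribution(flagged, True))    # False
```

An enum rather than a boolean leaves room for further caveats later without adding more fields.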

ppavlidis commented 1 year ago

https://gemma.msl.ubc.ca/expressionExperiment/showExpressionExperiment.html?shortName=GSE41496