PavlidisLab / Gemma

Genomics data re-analysis
Apache License 2.0

Support filtering datasets by both value and category URIs #750

Open: arteymix opened this issue 1 year ago

arteymix commented 1 year ago

We need to add support for specifying both a value and a category, because certain terms fall within multiple categories (see https://github.com/PavlidisLab/GemmaCuration/issues/355). GemBrow needs this to filter correctly.

I see two options at this time for implementing this:

arteymix commented 1 year ago

We're currently using a workaround in GemBrow: a dataset must have a factor with a given category and a factor with a given value.

This works as long as none of the factors attached to a given dataset have overlapping terms or categories. For example:

"disease" and "Alzheimer's"

Would incorrectly match a dataset with the following terms:

The workaround was proposed by @oganm.
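To make the false positive concrete, here is a minimal sketch contrasting the workaround with a per-factor match. The `Factor` class, both functions, and the example dataset are hypothetical illustrations, not Gemma or GemBrow code, and plain labels stand in for the category and value URIs. The workaround checks the category and the value independently, so they may be satisfied by two different factors; the correct check requires both on the same factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    # Hypothetical stand-in for a factor annotation; real filters use URIs.
    category: str
    value: str


def matches_workaround(factors, category, value):
    # Two independent conditions: some factor has the category, and some
    # (possibly different) factor has the value.
    return (any(f.category == category for f in factors)
            and any(f.value == value for f in factors))


def matches_same_factor(factors, category, value):
    # Both conditions must hold on the same factor.
    return any(f.category == category and f.value == value for f in factors)


# Hypothetical dataset: the category and the value sit on different factors.
dataset = [
    Factor(category="disease", value="Parkinson's"),
    Factor(category="phenotype", value="Alzheimer's"),
]

print(matches_workaround(dataset, "disease", "Alzheimer's"))   # True (false positive)
print(matches_same_factor(dataset, "disease", "Alzheimer's"))  # False
```

The two functions agree whenever each dataset has at most one factor per category and value, which is the assumption the workaround relies on.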

arteymix commented 1 year ago

A more general approach would be to support conjunctions within subclauses, which would fit well with subqueries. This would be a pretty significant change to the filtering logic, though.
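A rough sketch of what "conjunctions within subclauses" could mean for query generation: each subclause becomes one correlated `EXISTS` subquery, and all predicates inside a subclause are evaluated against the same factor row. The schema (`dataset`, `factor`, `category_uri`, `value_uri`), the `Predicate` class, and the helper functions are hypothetical and do not reflect Gemma's actual schema or filter API. An in-memory SQLite database is used only so the example runs.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    column: str  # hypothetical column name on the hypothetical "factor" table
    value: str


def exists_subquery(conjunction):
    # All predicates of one subclause are evaluated against the SAME factor row.
    conditions = " AND ".join(f"f.{p.column} = ?" for p in conjunction)
    return ("EXISTS (SELECT 1 FROM factor f "
            f"WHERE f.dataset_id = d.id AND {conditions})")


def build_query(subclauses):
    # Subclauses are ANDed at the top level; each becomes its own subquery.
    # Column names come from trusted code; values are bound as parameters.
    where = " AND ".join(exists_subquery(c) for c in subclauses)
    params = [p.value for c in subclauses for p in c]
    return f"SELECT d.id FROM dataset d WHERE {where} ORDER BY d.id", params


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dataset (id INTEGER PRIMARY KEY);
    CREATE TABLE factor (dataset_id INTEGER, category_uri TEXT, value_uri TEXT);
    INSERT INTO dataset VALUES (1), (2);
    -- dataset 1: the category and the value sit on different factors
    INSERT INTO factor VALUES (1, 'disease', 'parkinsons'), (1, 'phenotype', 'alzheimers');
    -- dataset 2: a single factor carries both
    INSERT INTO factor VALUES (2, 'disease', 'alzheimers');
""")

category = Predicate("category_uri", "disease")
value = Predicate("value_uri", "alzheimers")


def run(subclauses):
    sql, params = build_query(subclauses)
    return [row[0] for row in conn.execute(sql, params)]


workaround = run([[category], [value]])  # two independent subqueries
conjunction = run([[category, value]])   # one subquery with a conjunction
print(workaround)   # [1, 2]
print(conjunction)  # [2]
```

The only difference between the two calls is how predicates are grouped into subclauses, which is why this change would touch how subqueries are generated rather than how individual predicates are parsed.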

arteymix commented 11 months ago

This is essentially the same issue we face in #885. It could be quite an involved feature, because it will affect how our subqueries are generated.