Open ppavlidis opened 3 months ago
We need to handle negative subqueries in the backend for this kind of feature to work.
Right now, for a filter that involve a subquery like (i.e. characteristics.category = disease
), we do something like:
select * from datasets where id in (select id from datasets join datasets.characteristics c where c.category = 'disease')
Using a negative clause (i.e. characteristics.category != disease
) would result in:
select * from datasets where id in (select id from datasets join datasets.characteristics c where c.category <> 'disease')
But that's not what is being expected: it will return datasets that have at least one characteristic that does not have the URI in question. Instead, we should do something like:
select * from datasets where id not in (select id from datasets join datasets.characteristics c where c.category = 'disease')
For the syntax, I'm thinking of something like:
none(characteristics.category = disease) |
no characteristic match the predicate |
any(characteristics.category = disease) |
at least one characteristic match the predicate (current behavior) |
all(characteristics.category = disease) |
all characteristics match the predicate |
I would start only with any
and none
. The all
can be obtained by negating the filter with none
, but we do not support negating the in
operator. This will require https://github.com/PavlidisLab/Gemma/issues/1192.
The backend now supports quantifiers with the syntax above. It's now possible to implement the requested feature.
I'm not sure how we would present this to users, we've discussed handling negation better before, but it came up as a use case:
Find all experiments that don't involve a disease i.e. lack a disease annotation.
or
Find data sets that have control samples for a Disease factor
We usually hide annotations like "Reference role" from the browser and I think that is still a good call.
That kind of thing might make sense if you want "normal" samples / data sets, by some definition of "normal".