Closed reisingerf closed 8 months ago
@reisingerf This is IMO one of the areas where we need an addition to the current behaviour which not necessarily means that we have to change the protocol itself. I can come up easily with 4 scenarios:
For me no. 2. and 3. look more attractive than 1. In fact, we already follow 3. in bycon
/ Progenetix: Terms which are applied to the same parameter are automatically treated as OR, which allows
one of 3 diseases (which are not hierarchically related)
This could be documented as default if we could agree in the wider Beacon community and should be rather easy to support server side.
Option 3. (e.g. doing something like NCIT:C8383|NCIT:C9003
or with something like "$or": [...]
in POST) has potential but requires changes, as would do a more extensive version (1.). I'd personally need to get strong arguments since this might run into problems with adoption/implementation which are hard to foretell. Option 4. is probably easy (we actually offer this through bycon
/ Progenetix ..., too) but I'm myself not convinced of its need (and yes, one always can aways dream up a specific case...).
Summary: I would love to see a call for / discussion of these (and other) scenarios with a my personal support at the start going to 3.
@tb143 @redmitry @costero-e
Hi all,
Current specification e.g. individuals endpoint uses the notation filters=A&filters=B&filters=C
(spec)
"The default serialization method is style: form and explode: true"
The reference implementation supports the notation filters=A,B,C
not sure about the "exploded" one.
Note that notation filters=A,B,C
is out of scope of the HTTP parameters spec. Nevertheless, it is defined as the parameter pattern in the FRC6570 {?filters}
. Technically, the pattern '{?filters}{&filters*}' will result in the query
?filters=A,B,C&filters=D&filters=E
as [A or B or C] and D and E.
[edit]
To do not break current implementations, pipe for "OR" is a good solution IMO
Regarding the POST queries, we can extend FilteringTerm with "either": "boolean"
property, where "either": true
would mean comma
or 'OR'.
'filters: [{"id": "A"}, {"id": "B", "either": true}, {"id": "C", "either": true}, {"id": "D"}]'
This way we won't break too much the spec...
Best,
Dmitry
We have documented comma-concatenation for parameters as
Since the direct interpretation of list parameters in queries is not supported by
some server environments (e.g. PHP, GO…), list parameters such as `start` and `end`
should be provided as **comma-concatenated** strings when using them in GET requests.
There is a reason in that some implementations do not resolve multiple instances of the same parameter into a list but only keep one value; and that comma-concatenation (or ;
) is a common format for list parameters in GET. We actually allow both (i.e. first concatenate all list parameter instances and then split). I don't think there is any reference for using commas as expressing OR logic. You example above
?filters=A,B,C&filters=D&filters=E
would be interpreted by bycon
as
{$and: [ {par1: A}, {par2: B}, {par3: C}, {par4: D}, {par5: E} ] }
... which is IMO correct (though as mentioned above we would use OR internally for terms hitting the same par which is not yet per spec).
But I see something similar as an option to define OR using another character than ,
or ;
(e.g. |
=> %7C
):
?filters=A|B|C&filters=D&filters=E
or
?filters=A|B|C,D,E
... which would correspond to my number 2. above.
Still, before introducing new syntax I'd be very much in favour for solving the most common OR example by documenting the expected behaviour (i.e.
Filter values against the same parameter in the model are expected to be treated as OR
... since AND would lead to failing queries if values not in same expansion tree).
?filters=A,B,C&filters=D&filters=E
would be interpreted by bycon as '{$and: [ {par1: A}, {par2: B}, {par3: C}, {par4: D}, {par5: E} ] }'
The same for the current BSC implementation (just unroll A,B,C)
But I see something similar as an option to define OR using another character than , or ; (e.g. | => %7C):
?filters=A|B|C&filters=D&filters=E
or?filters=A|B|C,D,E
I am OK with this convention, unfortunately no way to express it in the OpenAPI ;:-(
In the OpenAPI it is either ?filters=A&filters=B
or ?fileters=A,B
or ?filters=A|B
So we probably should not define filters as an array (it would be one "filters" parameters with a query we parse).
@redmitry and @mbaudis I have been thinking about a solution with filter groups. For example
{
...
"query": {
...
"filters": [
// filter group 1 (treated as OR amongst them)
[
{
"id": "MONDO:0004126”
},
// OR
{
“id”: “MONDO:0004979”
}
],
// filter group 2 (treated as OR amongst them, AND between the groups)
// AND
[
{
"id": "NCIT:C20197"
}
]
],
...
}
}
This would be easier for parsing/validating. Filter groups are AND
ed and filters inside them are OR
ed.
This, unfortunately, is not spec compliant but can be supported alongside the specification. Detection of filter groups should not be difficult. This may be translated in GETs as follows too;
?filters=MONDO:0004126|MONDO:0004979,NCIT:C20197
This, unfortunately, is not spec compliant but can be supported alongside the specification.
My concern is how many compatible implementations will fail parsing this. I may be biased, but I would prefer just to add a logic property instead of grouping.
Compatibility: Yes, and therefore I go back to my suggestion to document a practical & logical use scenario as:
While the Beacon v2 framework defines the general chaining of multiple filter values with an AND logic, filters applied against the same parameter are expected to be treated as OR[^1]
This clarification then serves the discussed scenarios and could - by agreement - directly find its way into the documentation - anybody want to set up a poll? However, since we can always invent more complex scenarios I think a design for filter grouping would be an option for another level of discussion. There are other outstanding issues - e.g. a general excluded/negated flag so this is something for v2.n (but ASAP).
[^1]: AND chained filters against the same parameter will result in a null
or erroneous result:
null
resultnull
value if includeDescendantTerms: False
)Hi everybody, I hope I'm not too late to the conversation.
I think we can make a poll on this question as @mbaudis suggested, I will start it later today.
In my opinion, think I would implement the either
parameter for filters request (as @redmitry suggested) more than specifying that the first filters
parameter to refer to AND and the second filters
parameter to an OR. I think it would be a bit confusing to have several "filters" parameters with a specific meaning. It would make things more difficult to develop and also for users to understand its application (it's less intuitive than using an "either" parameter).
On the other hand, I think @anuradhawick proposal on the GET ?filters=MONDO:0004126|MONDO:0004979,NCIT:C20197
would be clear and aligned with this "either" parameter on the POST request, so I would go for this option, too.
I'm just not clear how the POST would look - like this?
"filters": [
{ "id": "NCIT:C20197"},
{ $or: [
{ "id": "MONDO:0004126"},
{ "id": "MONDO:0004979"}
]
]
... with the root boolean being the AND (which could be modified through a global directive - separate issue).
The MongoDB syntax here (used for modelling. YMMV) could also be rewritten (though slightly different meaning - i.e. you have one filter w/ an alternative id
parameter instead of alternative filters):
"filters": [
{ "id": "NCIT:C20197"},
{ "id":
{
$in: [
"MONDO:0004126",
"MONDO:0004979"
]
}
}
]
The approach with $in
seems to carry great potential as the spec could eventually $not-in
or even more complex queries ($all
, $starts-with
).
I wish we'll steer well away from the mongo syntax or flavour. Because beacon spec could allow custom operators to be used in this place which could cause confusion with mongo syntax.
Like in
it would be great to be able define operators inside the beacon. Users may be able to fetch them from an endpoint like operators
.
Answering the original question from @reisingerf: The spec only supports AND operators. If any implementation, reference one or another, does otherwise, it is just an extension ( that I would not recommend at this point of Beacon adoption, when strickly sticking to the spec would bring clarity to the community ) Having said so, during the v2 design phase we decided to leave the "OR" discussion for later (after v2.0, I mean), as we would like to gather feedback on what implementers prefer to do. I believe that "us" (in here) could not be representative of the "average implementer", but it is very good to trigger that conversation and include all the aspects: compatibility with standards (REST, HTTP...) and easy of implementation, i.e. how servers manage the suggested syntax for "OR" expressions.
My 2 cents
About which syntax to use, I would advocate to what HTTP spec says (if anything). Alternatively, something popular like OpenAPI could be the reference. Again, remember that we tried to avoid complex queries with groupings, etc. as we could end up asking the Beacon implementers to develop a query parser and a query engine.
as we could end up asking the Beacon implementers to develop a query parser and a query engine
But ... that is a given? The moment you use filters you need to be able to apply filters to different entry types which basically implies aggregation of queries / map-reduce ...
Sorry to chime in, I haven't been working on this so please take it as an outsider's view:
I'd personally stay away from complex query/search options. The beacon was initially and still is a look-up service, a quick way to find out if there's something potentially interesting out there. It's not too complex and does not require a huge and complex infrastructure.
As soon as you expand into a complex search you'd end up with tons off issues/questions:
I don't think you could address the needs of all clients anyway, you'd always play catch-up with new requirements/use cases. It's also the clients that knows best what they want.
To me it's more important that a client can get to the data consistently across many implementations/beacons, rather than running complex queries. If I wanted the complexity I'd probably get all the data and do it locally anyway.
Mind you, there may be a middle way to cater for most (common) use cases, but...
In the end, would more complex queries really help with beacon adoption?
Totally agree
After yesterday's Beacon meeting & general agreement about this I've updated the docs to emphasize the "currently always AND (but logically process OR for terms against same parameter, w/o special syntax)".
Updated documentation:
We can implement "plain" logic following the simple rule:
The java implementation now have this scheme implemented:
"filters": [
{
"id": "NCIT:C20197"
},
{
"id": "Weight",
"operator": ">",
"value": "80"
},
{
"id": "BMI",
"operator": ">",
"value": "25",
"logic": "or"
}
]
"NCIT:C20197" "Weight>100" |"BMI>30" => "NCIT:C20197" & ("Weight>100" | "BMI>30") The GET equivalent considers comma separated filters as "OR":
[https://] beacons.bsc.es/beacon/v2.0.0/individuals?filters=NCIT:C20197&filters=Weight>100,BMI>30
Best,
Dmitry
@redmitry The simplification offered does not solve the need of the logical condition grouping. Below are the examples of "simplified logic" applied (using your Python encoder): statement_1 = 'A & ( B | C ) | D' print(encode(statement_1)) => #output:A B |C |D statement_2 = 'A & ( B | C | D )' print(encode(statement_2)) => #output:A B |C |D flat_statement = 'A B |C |D' print(decode(flat_statement)) => #output:A & ( B | C | D ) Notice, that statements 1 and 2 parsed into the same expression (which is incorrect). And a flat statement gets an unwanted grouping.
If the flat form is critical for the logical expression, the Polish notation should be used to encode logic and decode it on the backend.
@zykonda
statement_1 = 'A & ( B | C ) | D'
equals '(A & ( B | C )) | D' and is unresolvable under the proposed 'flat' rules. I modified the python script to do not start automatically the group on the first param. Nevertheless, 'A & ( B | C ) | D' may be rewritten as 'D | A & ( B | C )' eq. D | (A & ( B | C )) '&D |A B |C' provides desired logic.
That's true, that proposed "flat" logic may not provide complex grouping.
Kind regards,
D.
@redmitry and @mbaudis we implemented something along the lines you are discussing here. This is implemented for searching variants and associated datasets. But I have a strong feeling the approach might contribute here as well. We created a mutation format that is as follows.
Example query: (!A23403G:(C150156G:T567765G):!C789T)|!C150156G
Essentially, !
translates to negation, :
to AND operator and |
to OR operator. Brackets perform grouping.
Our original implementation could handle single queries like A23403G
and whenever we have a complex query as above, we parse it (see the code below) and get an expression as follows (!0&(1&2)&!3)|!4
, each number corresponds to index of the variant query. We send this over to API as an extra attribute queryCombination
with the list of variant queries indicated by each index. This is our query parser from https://github.com/aehrc/PathsBeacon/blob/dev/modules/website/beaconApp/src/app/components/main/main.component.ts#L336
parseMutationExpression(expression: string): [string, any[]] {
const _expression = expression.replace(/\s/g, "");
const varRegex =
/^((?<refName>[0-9a-zA-Z]+)-)?(?<refBases>[a-z]+)(?<start>\d+)(?<altBases>[a-z]+)/i;
const logicRegex = /[!&:()|]/;
const queries: any[] = [];
let combination: string = "";
let index: number = 0;
for (let pos = 0; pos < _expression.length; pos++) {
const char = _expression[pos];
if (char.match(logicRegex)) {
// consume the groupings (parenthesis) and logic (!|)
combination += char == ":" ? "&" : char;
} else {
// consume the mutation
const remainder = _expression.slice(pos);
const matches: any = remainder.match(varRegex);
// must have a match starting here else skip
if (matches) {
queries.push(matches.groups);
combination += index.toString();
// jump over the match and increment index
pos += matches[0].length - 1;
index += 1;
}
}
}
return [combination, queries];
}
Our approach is backwards compatible and does not break the existing workflow. Only shift gears with the extra queryCombination
attribute when needed. This is actually done by our GUI hosted at https://github.com/aehrc/PathsBeacon.
For the metadata scenario, we can use a similar approach. We do not change the filters in GET
nor POST
requests. Requests would look like below
api/../..?filters=SNOMED:1234,HP:1234
We just have to add an extra query param as follows
api/../..?filters=SNOMED:1234,HP:1234&combination=1:2
Note we can URL encode to include these combination characters like &
or |
.
Similarly for POST
requests we can use a queryCombination
like field to specify how the filters must be combined to form the logic. This way we do not make any breaking changes to the schema of filters attribute; it remains a plain array easy to read and parse.
This is a slight change in protocol but should add immense flexibility for the community. My colleague and lead dev in this implementation @bhosking, can chime in for extended details if needed.
Very happy to hear your thoughts.
Cheers, Anuradha
Hi all, Proposed "flat" logic has its limitations as it supports only "and" groupings and relies on standard logic operator precedence ("and" then "or") for "ors" (e.g. "A & B | C & D" = "(A & B) | (C & D)" ). It has the limitation of not permitting complex grouping with nested brackets.
Essentially, ! translates to negation, : to AND operator and | to OR operator. Brackets perform grouping.
Introducing negation would allow standard logic transformation rules and removal of nested brackets in expense of less readable logic.
@anuradhawick
In my implementation, I actually transform the "flat" logic into the "standard" one before processing:
https://docs.google.com/presentation/d/12cKU3eDTWq1lkkmp101mGxMon1MdRhYxzybr2wsN7FE/edit#slide=id.g2b86397cbd2_3_105
So I get something like (!0&(1&2)&!3)|!4
with only difference that I use 'F' for all parameters (since we already have an ordered list of filters).
As I mentioned in the presentation:
Technically, both forms may be considered for as the extension for the Beacon 2.0 without breaking current spec.
My point is that the "flat" logic is an additional step that could be hard to use and we can always opt for the additional property like "filtering_query".
Cheers,
Dmitry
@anuradhawick , all (This is a bit OT for the specific filters topic but since variants were used in the example & the general principle applies): I just want to point out that for variant queries IMO we will move to "typed" queries (genomicSequenceQuery
, genomicRangeQuery
...) so each query will be evaluated of its correct composition separately & then one can chain the queries themselves.
@redmitry Per-filter negation is OMO a given & there is an issue somewhere (also aligning this w/ Phenopackets).
Also, since all variant query parameters can be provided as GET or POST, and not with any defined order, they should be evaluated separately for correct form & then checked against the (implicit or better future explicit) query type.
&variantQueryGroups=(genomicBracketQuery=(referenceName=refseq:NC_000009.12,start=21000001-21975098,end=21967753-23000000,variantType=EFO:0030067)&&(genomicSequenceQuery=(referenceName=refseq:NC_000009.12,start=21675032,alternateBases=A))
... without checking character compatibility :-) Easier to write in JSON, with a bit more complex example (here matching cases which have either a given protein sequence variant in CDKN2A OR a combination of a CNV DEL overlapping the gene together with a specific sequence alteration in the remaining Allele - I made this up, partially...):
"g_variants": {
"variantQueryGroups": {
"logic": "OR",
"groups": [
{
"logic": "AND",
"variantQueries": [
{
"genomicBracketQuery": {
"referenceName": "refseq:NC_000009.12",
"start": [21000001, 21975098],
"end": [21967753, 23000000],
"variantMinLength": 1000,
"variantType": "EFO:0030067"
}
},
{
"genomicSequenceQuery": {
"referenceName": "refseq:NC_000009.12",
"start": 21675032,
"alternateBases": "A"
}
}
]
},
{
"variantQueries": [
{
"aminoacidAlterationQuery": {
"geneId": "CDKN2A",
"aminoacidAlteration": "D153Y"
}
}
]
}
]
}
}
So, using this I just want to caution against very specific implementations ... The above example is also designed to allow the legacy parameters directly in the g_variants
object.
@redmitry I have some concerns about the following from your slides.
The implementation first converts the “plain” form into the standard logic template: “A B |C” => “A & (B | C)” Filters names are irrelevant as we have “filters” array. Next step is collapsing brackets into sequential form: “A & (B | C)” => “A & D” In this steps a list of matching queries is generated ($match A, $or(B,C))
The conversion - “A B |C” => “A & (B | C)”
seems a bit confusing. Why would it not be (A & B ) | C
? , The usual order is NOT
, AND
then finally OR
. Why not assume A & B
then confusion vanishes.
This could be complex if we had A B | C | D
would this be A & (B | C) | D
or A & (B | C | D)
?
May be what @mbaudis has proposed with query groups will solve this problem?
@redmitry I have some concerns about the following from your slides.
Current procedure for filters is "AND". The main idea was introducing implicit grouping:
"A B C D" => "(A) & (B) & (C) & (D)" => "A & B & C & D"
The definition "logic" in the filter just continues the group
The conversion -
“A B |C” => “A & (B | C)”
seems a bit confusing. Why would it not be(A & B ) | C
? ,
Because B
has no "logic" it 1. default "AND"; 2. starting a group (parenthesis).
C
has explicit logic so it doesn't create a group.
The usual order is
NOT
,AND
then finallyOR
. Why not assumeA & B
then confusion vanishes.
The problem is that I want to be able to provide groping like:
( A & B ) | ( C & D )
which is default logical behavior without brackets ( == A & B | C & D
).
( A | B ) & ( C | D )
needs grouping, so according proposed "rules" is A |B C |D
A
has no explicit logic so start grouping
& ( A
B
has explicit logic ("OR") so continue the group (if any).
& ( A | B
3, C
has no logic, so 1, default "AND"; 2 start a new group (and close previous one).
& ( A | B ) & ( C
& ( A | B ) & ( C | D
5, END we still have an open group - close it!
& ( A | B ) & ( C | D )
The first logic is just stripped and actually have no sense.This could be complex if we had
A B | C | D
would this beA & (B | C) | D
orA & (B | C | D)
?
Following the "rules" A & ( B | C | D )
- as for D
explicit logic is always continue the group.
As I already said this "rules" of grouping may not provide complex grouping,
A & (B | C) | D
may not be implemented "as is", but &D |A B |C
== D | A & ( B | C )
Since I got many questions about the proposed "logic" even from technical people, I am afraid it is hardly applicable for "normal" people... I am biasing towards adding a common "filters_logic" field like F1 & ( F2 | F3) | F4
.
Cheers,
D.
@redmitry
Since I got many questions about the proposed "logic" even from technical people, I am afraid it is hardly applicable for "normal" people...
I agree that the need to "translate" the logic statement in a specific form can potentially confuse API users and lead to unexpected results. Plus, I'm concerned about limited applicability.
@anuradhawick Your solution/proposal is more clear for users, though I would call the new parameter queryExpression (vs queryCombination). I have a question: how do you get around the use of non-standard URL encoding for GET operation?
The proposal from @mbaudis resembles DNF: (group of expressions joined by AND) OR (group of expressions joined by AND), from our proposal to use DNF or CNF.
@zykonda for GET requests, we must to use URL encoding (usually handled by web frameworks to ensure safety, etc). This should not be huge problem as POSTMAN, etc support this out of box. At the moment though, we use POST requests from our GUI. For a GET request, the query expression (A&B|C)&D
the URL would be.
{{API}}/?queryExpression=(A%26B|C)%26D&filters=A,B,C,D
&
is the challenge, which is encoded to %26
Hi all, a quick question about supported/expected filtering logic, especially for ontology terms:
Reading the documentation I understand that the default filtering logic for multiple filters is AND (and that makes total sense), but there are more complex use cases.
For example: I am interested in data from male individuals that have one of 3 diseases (which are not hierarchically related).
Currently, I'd query multiple times and do the intersection/combination of results on the client side to get to my final answer.
I was wondering if there is a defined/supported way to express a logic that combines AND with OR logic to perform more complex queries on the server side. Perhaps I just overlooked it in the docs or there's a semi-official solution?
My particular view point is with regards to beacon network and UI, with the following two opposing issues:
I have seen implementations using multiple ontology terms (space separated) per filter allowing an OR logic for terms within a filter and AND logic across filters. But I don't know if that is to be supported.
As I am fairly new to the beacon world I was hoping this has come up elsewhere and I could get some guidance/advice from more experienced people. Any and all feedback is much appreciated! Thanks!
FYI: @victorskl @anuradhawick