Open margaretha opened 8 years ago
That's an interesting topic! We should define behaviour for the special SpanQueryWrapper conditions null, empty, optional, and negative.
isNull: contains(<base/s=s>, [orth=der]{0})
I think that should translate to <base/s=s>, because every sentence should contain at least nothing.
isEmpty: contains(<base/s=s>, [])
That should translate to <base/s=s> as well, under the condition that the element is not allowed to be empty (in the sense of an empty XML element).
isEmpty with repetition: contains(<base/s=s>, []{2,6})
This is a length query, which we don't support at the moment but probably should. It would require that the returned sentences are at least 2 tokens long.
isOptional: contains(<base/s=s>, [orth=der]?)
I think this also translates to <base/s=s>, as it allows the same condition as isNull.
isNegative: contains(<base/s=s>, [orth!=der])
We may want to interpret it as a request for all sentences that don't contain the second operand. But that's only my interpretation ...
Switched operand order:
isNull: contains([orth=der]{0}, <base/s=s>)
I think this should be null as well, because "nothing" can't contain anything (or overlap, or startsWith, etc.).
isEmpty: contains([], <base/s=s>)
I think that should work whenever the second operand has a length of exactly one token.
isEmpty with repetition: contains([]{2,6}, <base/s=s>)
That should work as well, whenever the second operand has a length between 2 and 6 tokens.
isOptional: contains([orth=der]?, <base/s=s>)
I think the optionality here can be ignored, as this is either null, without results (see above), or a defined token that may work with a sentence of length 1.
isNegative: contains([orth!=der], <base/s=s>)
That's a tough one. Theoretically I can see what the user may expect from negativity of the first operand, but I don't see how we can resolve that. We should throw an error.
The problem may be more complicated when both operands have one of the above-mentioned conditions. The list also does not include the possibility of classed operands. For example, contains(<base/s=s>, {1: [orth=der]?}) can't be reduced to <base/s=s> ...
extended can have an impact here

Thanks for the detailed suggestions. What is the purpose of the condition names? They look like booleans.
I agree that
isNull: contains(<base/s=s>, [orth=der]{0})
isEmpty: contains(<base/s=s>, [])
isOptional: contains(<base/s=s>, [orth=der]?)
should be translated to <base/s=s>, but the naming is kind of misleading. It's not that the query is null or empty, but that it contains nothing or contains an empty token. "Empty token" is also ambiguous: the token does not have to be empty, it is rather arbitrary.
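As a rough illustration of the reductions discussed so far, here is a minimal Python sketch. It is not Krill code: the `Operand` class, its flags, and `reduce_contains` are hypothetical names made up for this example, and the queries are handled as plain strings. It only encodes the cases from this thread where the operand with the special condition is in the second position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    """Hypothetical stand-in for a wrapped operand (not Krill's SpanQueryWrapper)."""
    query: str
    is_null: bool = False      # e.g. [orth=der]{0}
    is_empty: bool = False     # e.g. [] (an arbitrary token)
    is_optional: bool = False  # e.g. [orth=der]?
    is_repeated: bool = False  # e.g. []{2,6}
    has_class: bool = False    # e.g. {1: [orth=der]?}


def reduce_contains(first: Operand, second: Operand) -> str:
    """Return a simplified query string for contains(first, second)."""
    unreduced = f"contains({first.query}, {second.query})"
    # A classed operand has to stay, otherwise the class would be lost.
    if second.has_class:
        return unreduced
    # []{2,6} is a length query and cannot simply be dropped.
    if second.is_empty and second.is_repeated:
        return unreduced
    # Null, empty, and optional second operands reduce to the first operand.
    if second.is_null or second.is_empty or second.is_optional:
        return first.query
    return unreduced


S = Operand("<base/s=s>")
print(reduce_contains(S, Operand("[orth=der]?", is_optional=True)))
print(reduce_contains(S, Operand("{1: [orth=der]?}", is_optional=True, has_class=True)))
```

The negative case and the switched operand order are left out on purpose, since the thread has not settled on their semantics.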
isEmpty with repetition: contains(<base/s=s>, []{2,6})
Why don't we support this? It should be plausible in Poliqarp. In the FCSQL grammar, []{2,6} within s is valid.
isNegative: contains(<base/s=s>, [orth!=der])
Hm, I think it's not necessarily all sentences that don't contain the second operand. I would suggest that this would be ! contains(<base/s=s>, [orth=der]), but this is not Poliqarp, is it? It is also not possible in FCSQL to query ! "der" within s.
It can also be interpreted this way: any sentence that does not consist only of "der" would be a match.
isNull: contains([orth=der]{0}, <base/s=s>)
Agree.
The switched-operand cases are actually pretty weird formulations, unless somebody accidentally mixed them up. Why would a user want an element within a token? Or can [] also be a span/element, so that it's not necessarily a koral:token?
isOptional: contains([orth=der]?, <base/s=s>)
This already works currently.
Hi Eliza,
yes, the conditions are boolean and are used for query analysis/optimization in SpanQueryWrapper. isOptional means the operand is either necessary or null; this is important in order to wrap either an extending SpanQuery or a sequence query, or both in an alternation. But I agree that "empty" is misleading and "arbitrary" would be a better name. Maybe we had the description "empty" from Poliqarp ...
I also agree that we should support length queries. They should also be trivial to implement.
> It can also be interpreted this way: any sentence that does not consist only of "der" would be a match.
That's how I would prefer it as well.
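To make the difference between the two readings of contains(<base/s=s>, [orth!=der]) concrete, here is a small Python sketch over toy data. The sentences and function names are invented for illustration and have nothing to do with Krill's implementation.

```python
from typing import List

# Toy corpus: each sentence is a list of token surface forms.
sentences: List[List[str]] = [
    ["der"],
    ["der", "Mann"],
    ["ein", "Mann"],
]


def lacks_token(sentence: List[str], word: str) -> bool:
    """Reading A: the sentence contains no token equal to `word`."""
    return all(token != word for token in sentence)


def has_other_token(sentence: List[str], word: str) -> bool:
    """Reading B: the sentence contains at least one token that is not `word`."""
    return any(token != word for token in sentence)


reading_a = [s for s in sentences if lacks_token(s, "der")]
reading_b = [s for s in sentences if has_other_token(s, "der")]

print(reading_a)  # only the sentence without "der"
print(reading_b)  # every sentence except the one consisting only of "der"
```

Reading B is the one preferred above: a sentence matches as soon as it does not consist only of "der".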
> Why would a user want an element within a token? Or can [] also be a span/element, so that it's not necessarily a koral:token?
No, [] is always a token. So the isNull or the isEmpty case is probably wrongly formulated. All other scenarios may make sense though.
P.S. Can you please fix the format of your post?
isOptional: contains([orth=der]?, <base/s=s>)
isNegative: contains([orth!=der], <base/s=s>)
Why do you think these are still possible?
> P.S. Can you please fix the format of your post?
Fixed!
isOptional may be a sequence as well, like contains([]*, <base/s=s>), and negativity may also be a sequence, like contains([orth!=der][orth!=Mann], <base/s=s>). I think even something like this is negative in the wrapping sense: contains([orth!=der][]+[orth!=Mann], <base/s=s>).
Thanks for fixing. I thought the elements were eaten by a sanitizer, but now I understand that you omitted them.
I see. That makes sense.
> Thanks for fixing. I thought the elements were eaten by a sanitizer, but now I understand that you omitted them.
No, I didn't omit them. I just forgot to escape the angle brackets and check the preview.
Then isEmpty with repetition: contains([]{2,6}, <base/s=s>) is also possible, isn't it?
Ah, okay. Yes - it is reasonable but - as I said in my first reply - would require a SpanQuery that checks for lengths.
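Just to illustrate what such a length check would have to do, here is a minimal Python sketch. It is only a sketch under assumptions: `Span` and `filter_by_length` are hypothetical names, spans are modelled as token offsets with an exclusive end, and nothing here reflects an existing Krill or Lucene class.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Span(NamedTuple):
    """Hypothetical span given as token offsets; `end` is exclusive."""
    start: int
    end: int


def filter_by_length(
    spans: Iterable[Span], min_len: int, max_len: Optional[int] = None
) -> Iterator[Span]:
    """Yield only the spans whose token length lies within the given bounds."""
    for span in spans:
        length = span.end - span.start
        if length >= min_len and (max_len is None or length <= max_len):
            yield span


# Three toy sentences with 1, 3, and 8 tokens.
sentence_spans = [Span(0, 1), Span(1, 4), Span(4, 12)]

# contains(<base/s=s>, []{2,6}): sentences that are at least 2 tokens long.
at_least_two = list(filter_by_length(sentence_spans, min_len=2))

# contains([]{2,6}, <base/s=s>): sentences that are between 2 and 6 tokens long.
two_to_six = list(filter_by_length(sentence_spans, min_len=2, max_len=6))
```

The two calls correspond to the two operand orders discussed in the first reply: a lower bound only when the sentence is the container, and both bounds when the sentence has to fit into the repeated empty token.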
I think the condition names are not quite clear and can be misleading.
For instance, isEmpty: what exactly is not empty? Usually it refers to the object itself, and I suppose these conditions are set in SpanQueryWrapper. However, isEmpty does not mean that the SpanQueryWrapper is empty. hasEmptyOperand would probably be more appropriate.
You complained about that in your first reply already. ;) Yes, the names of the conditions refer to the operands, first in the first, then in the second position of the contains(...). It does not mean that the contains itself has the condition.
Krill should handle an empty query as an operand of SpanWithinQuery and throw an appropriate error. Currently both throw an error: You can't queryize an empty query