KorAP / Krill

A Corpus Data Retrieval Index using Lucene for Look-Ups
BSD 2-Clause "Simplified" License

Empty token in SpanWithinQuery #20

Open margaretha opened 8 years ago

margaretha commented 8 years ago

Krill should handle an empty query as an operand of SpanWithinQuery and throw an appropriate error. Currently, both

throw the error: You can't queryize an empty query

Akron commented 8 years ago

That's an interesting topic! We should define the behaviour for the special SpanQueryWrapper conditions null, empty, optional, and negative.

Switched operand order:

The problem may be more complicated if both operands have one of the above-mentioned conditions. The list also does not cover classed operands. For example, contains(<base/s=s>, {1: [orth=der]?}) can't be reduced to <base/s=s> ...

margaretha commented 8 years ago

Thanks for the detailed suggestions. What is the purpose of the condition names? They look like booleans.

I agree with

isNull: contains(<base/s=s>, [orth=der]{0})
isEmpty: contains(<base/s=s>, [])
isOptional: contains(<base/s=s>, [orth=der]?)

being translated to <base/s=s>, but the naming is somewhat misleading. It's not that the query itself is null or empty; rather, it contains nothing or contains an empty token. "Empty token" is also ambiguous: the token does not have to be empty, it is rather arbitrary.
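A minimal Java sketch of the reduction agreed on above, assuming the conditions are plain booleans on the embedded (second) operand plus an extra flag for classed operands. The names Conditions, hasClass and reduceContains are hypothetical, not Krill's API. Here isEmpty stands only for a single arbitrary token [] (assuming every element spans at least one token); the repetition case []{2,6} is deliberately not covered.

```java
import java.util.Optional;

/** Hypothetical sketch, not Krill code: reducing contains(outer, inner). */
public class ContainsReduction {

    /**
     * Boolean conditions of the embedded (second) operand.
     * isEmpty means a single arbitrary token [], not a repetition like []{2,6}.
     */
    public record Conditions(boolean isNull, boolean isEmpty, boolean isOptional,
                             boolean isNegative, boolean hasClass) {}

    /**
     * Returns the outer operand if contains(outer, inner) can be reduced to it,
     * otherwise Optional.empty() to signal that a real containment query is needed.
     */
    public static Optional<String> reduceContains(String outer, Conditions inner) {
        // A classed operand like {1: [orth=der]?} is never dropped, presumably
        // because the class still has to be assigned to a concrete match.
        if (inner.hasClass()) {
            return Optional.empty();
        }
        // [orth=der]{0}, [] and [orth=der]? put no constraint on the outer span
        // (assuming every element spans at least one token).
        if (inner.isNull() || inner.isEmpty() || inner.isOptional()) {
            return Optional.of(outer);
        }
        // Negative and ordinary operands still need a containment check.
        return Optional.empty();
    }

    public static void main(String[] args) {
        Conditions optional = new Conditions(false, false, true, false, false);
        Conditions classedOptional = new Conditions(false, false, true, false, true);
        System.out.println(reduceContains("<base/s=s>", optional));        // Optional[<base/s=s>]
        System.out.println(reduceContains("<base/s=s>", classedOptional)); // Optional.empty
    }
}
```

The classed check comes first so that {1: [orth=der]?} is not reduced even though it is also optional.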

isEmpty with repetition: contains(<base/s=s>, []{2,6})

Why don't we support this? It should be plausible in Poliqarp. In the FCSQL grammar, []{2,6} within s is valid.

isNegative: contains(<base/s=s>, [orth!=der])

Hm, I don't think this necessarily means all sentences that don't contain the second operand. I would suggest that this would be

! contains(<base/s=s>, [orth=der])

but this is not Poliqarp, is it? It is also not possible in FCSQL to query

! "der" within s

It can also be interpreted this way: any sentence that not only consists of "der" would be a match.
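To make the two readings concrete, here is a toy Java sketch that models a sentence as a plain list of orth values. This is an assumption for illustration only; Krill evaluates such queries as Lucene span queries. containsNonMatching follows the token-level reading of contains(<base/s=s>, [orth!=der]), while containsNoMatching follows the proposed ! contains(<base/s=s>, [orth=der]).

```java
import java.util.List;

/** Toy model, not Krill code: a sentence is a list of orth values. */
public class NegativeOperand {

    /** contains(<s>, [orth!=der]): at least one token in the sentence is not "der". */
    public static boolean containsNonMatching(List<String> sentence, String orth) {
        return sentence.stream().anyMatch(token -> !token.equals(orth));
    }

    /** ! contains(<s>, [orth=der]): no token in the sentence is "der". */
    public static boolean containsNoMatching(List<String> sentence, String orth) {
        return sentence.stream().noneMatch(token -> token.equals(orth));
    }

    public static void main(String[] args) {
        List<List<String>> sentences = List.of(
                List.of("der", "Mann", "lacht"),
                List.of("der"),
                List.of("Sie", "lacht"));
        for (List<String> s : sentences) {
            System.out.println(s + " token-level=" + containsNonMatching(s, "der")
                    + " sentence-level=" + containsNoMatching(s, "der"));
        }
    }
}
```

The readings differ on "der Mann lacht": it matches the token-level reading (it is not made up only of "der") but not the sentence-level one (it does contain "der").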

isNull: contains([orth=der]{0}, <base/s=s>)

Agree.

The switched operand cases are rather odd formulations, unless somebody accidentally mixed up the operands. Why would a user want an element within a token? or does [] also allow / can be a span/element? So it's not necessarily a koral:token.

isOptional: contains([orth=der]?, <base/s=s>)

This already works currently.

Akron commented 8 years ago

Hi Eliza, yes, the conditions are boolean and are used for query analysis/optimization in SpanQueryWrapper. isOptional means the operand is either necessary or null; this is important for wrapping it either as an extending SpanQuery or as a sequence query, or both in an alternation. But I agree that "empty" is misleading and "arbitrary" would be a better name. Maybe we took the description "empty" from Poliqarp ...

I also agree that we should support length queries. They should also be trivial to implement.
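A sketch of why the length query can be cheap for contains(<base/s=s>, []{2,6}): a sentence contains some run of 2 to 6 arbitrary tokens exactly when it is at least 2 tokens long, so the query reduces to a length filter on the sentence spans. Spans are modelled here as a hypothetical (start, end) record with an exclusive end, as with Lucene span positions; an actual Krill implementation would be a SpanQuery wrapping the element query, which is not shown.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch, not Krill code: contains(<s>, []{min,max}) as a length filter. */
public class LengthFilter {

    /** A span over token positions; end is exclusive. */
    public record Span(int start, int end) {
        public int length() {
            return end - start;
        }
    }

    /**
     * A sentence contains some run of min..max arbitrary tokens iff it is at
     * least min tokens long; max only matters for validating the repetition.
     */
    public static List<Span> containsArbitraryRepetition(List<Span> sentences, int min, int max) {
        if (min < 0 || max < min) {
            throw new IllegalArgumentException("invalid repetition {" + min + "," + max + "}");
        }
        return sentences.stream()
                .filter(s -> s.length() >= min)
                .collect(Collectors.toList());
    }
}
```

For example, sentence spans (0,1), (1,4) and (4,6) filtered with {2,6} keep only (1,4) and (4,6).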

It can also be interpreted this way: any sentence that not only consists of "der" would be a match.

That's how I would prefer it as well.

Why would a user want an element within a token? or does [] also allow / can be a span/element? So it's not necessarily a koral:token.

No - [] is always a token. So the isNull or the isEmpty case is probably wrongly formulated. All other scenarios may make sense though.

P.S. Can you please fix the format of your post?

margaretha commented 8 years ago

isOptional: contains([orth=der]?, <base/s=s>)
isNegative: contains([orth!=der], <base/s=s>)

Why do you think these are still possible?

P.S. Can you please fix the format of your post?

Fixed!

Akron commented 8 years ago

isOptional may be a sequence as well, like contains([]*, <base/s=s>), and negativity may also be a sequence, like contains([orth!=der][orth!=Mann], <base/s=s>). I think even something like that is negative in the wrapping sense: contains([orth!=der][]+[orth!=Mann], <base/s=s>).
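Whether a sequence is optional can be derived from its parts: a concatenation can match zero tokens only if every part can, and a repetition X{min,max} can match zero tokens if min is 0 or X itself is optional, which is why []* counts as optional. A hypothetical Java sketch of that rule follows; the Element record is illustrative and not Krill's SpanQueryWrapper. Negativity of sequences, as in the [orth!=der][]+[orth!=Mann] example, is not modelled here.

```java
import java.util.List;

/** Hypothetical sketch of deriving isOptional for sequences; not Krill code. */
public class SequenceConditions {

    /** One element of a sequence, with its own optionality flag. */
    public record Element(String query, boolean isOptional) {}

    /** X{min,max} can match zero tokens if min is 0 or X itself can. */
    public static boolean repetitionIsOptional(boolean elementIsOptional, int min) {
        return min == 0 || elementIsOptional;
    }

    /** A sequence can match zero tokens only if every element can. */
    public static boolean sequenceIsOptional(List<Element> sequence) {
        return sequence.stream().allMatch(Element::isOptional);
    }

    public static void main(String[] args) {
        // []* is [] repeated with min 0, so it is optional
        Element anyStar = new Element("[]*", repetitionIsOptional(false, 0));
        Element der = new Element("[orth=der]?", true);
        Element mann = new Element("[orth=Mann]", false);
        System.out.println(sequenceIsOptional(List.of(anyStar)));   // true
        System.out.println(sequenceIsOptional(List.of(der, mann))); // false
    }
}
```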

Thanks for fixing. I thought the elements were eaten by a sanitizer, but now I understand that you omitted them.

margaretha commented 8 years ago

I see. That makes sense.

Thanks for fixing. I thought the elements were eaten by a sanitizer, but now I understand that you omitted them.

No, I didn't omit them. I just forgot to escape the angle brackets and check the preview.

Then isEmpty with repetition: contains([]{2,6}, <base/s=s>) is also possible, isn't it?

Akron commented 8 years ago

Ah, okay. Yes - it is reasonable but - as I said in my first reply - would require a SpanQuery that checks for lengths.

margaretha commented 5 years ago

I think the condition names are not quite clear and can be misleading.

For instance, isEmpty: what exactly is empty? Usually such a name refers to the object itself, and I suppose these conditions are set in SpanQueryWrapper. However, isEmpty does not mean that the SpanQueryWrapper is empty. hasEmptyOperand would probably be more appropriate.

Akron commented 5 years ago

You complained about that in your first reply already. ;) Yes, the names of the conditions refer to the operands, first in the first and then in the second position of contains(...). They do not mean that the contains query itself has the condition.