Closed margaretha closed 8 years ago
I disagree. The licenses are heterogeneous, so this restriction may not apply to the whole VC. Kustvakt may inspect the search results afterwards and alter the response accordingly.
Since Kustvakt knows which documents in the VC have such a license, can't it divide the VC into two parts (with and without the restriction) and run two Krill searches in parallel?
There may be multiple restrictions, meaning multiple queries would need to be run in parallel. But I don't see how this would benefit performance. Regarding your edit: yes, if the length-restricting SpanQuery is nested (i.e. not the root query), this may speed things up, but for license restrictions it would be at the root, right? Worse, a) the statistical data would exclude valid matches, and b) the restriction would mean that a SpanQuery needs to be run for all matches, while a filter applied afterwards only needs to run for the result set (the matches), and it's technically trivial.
I mean the license restriction on sentence length; otherwise it would depend on what is restricted, I guess. Well, the idea is to make it more efficient; otherwise we don't need it. For the statistics, maybe it makes sense to report the number of all matches even though some are not allowed to be shown.
I am also talking about the length restriction - but there may be multiple different length restrictions in a VC.
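A minimal sketch of the post-filtering idea discussed above: the search runs unrestricted, the total match count is kept for the statistics, and per-document length limits are applied only to the result set. The names `Match`, `post_filter`, and `max_len_by_doc` are hypothetical and not part of Kustvakt or Krill. Measuring the match span itself (rather than its sentence context) is also an assumption, since the thread leaves that open.

```python
from typing import Dict, List, NamedTuple, Tuple


class Match(NamedTuple):
    doc_id: str
    start: int  # first token position (inclusive)
    end: int    # position after the last token (exclusive)


def post_filter(
    matches: List[Match], max_len_by_doc: Dict[str, int]
) -> Tuple[int, List[Match]]:
    """Return (total, shown).

    total counts every match, so the statistics stay complete.
    shown keeps only matches within their document's length limit;
    documents without an entry in max_len_by_doc are unrestricted.
    """
    shown = []
    for m in matches:
        limit = max_len_by_doc.get(m.doc_id)  # None means no restriction
        if limit is None or (m.end - m.start) <= limit:
            shown.append(m)
    return len(matches), shown


# Hypothetical result set with two different length limits in one VC.
matches = [
    Match("a", 0, 3),
    Match("a", 0, 12),
    Match("b", 5, 30),
    Match("c", 2, 9),
]
limits = {"a": 10, "c": 5}
total, shown = post_filter(matches, limits)
```

Because the filter only touches the matches that were actually found, several different limits in one VC need a single dictionary lookup per match rather than one query per restriction.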
The query would be less efficient than before, so I am :-1: . As we need all matches for the statistics, limitations don't make much sense.
However, I am :+1: for having a length restricting SpanQuery. But not for this license restricting purpose.
So what would this length-restricting SpanQuery do?
In Poliqarp I would write it like this:
length(2,5: <base/s=s>)
It would search for all sentences and skip all spans that are shorter than 2 or longer than 5 tokens.
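The intended semantics of the proposed `length(2,5: ...)` operator can be sketched as follows. This is only an illustration, not Krill code: `Span` and `length_filter` are hypothetical names, both bounds are assumed to be inclusive, and span ends are assumed to be exclusive so that the length is `end - start`.

```python
from typing import Iterable, Iterator, NamedTuple


class Span(NamedTuple):
    doc_id: str
    start: int  # first token position (inclusive)
    end: int    # position after the last token (exclusive)


def length_filter(
    spans: Iterable[Span], min_len: int, max_len: int
) -> Iterator[Span]:
    """Yield only spans whose token length lies in [min_len, max_len]."""
    for span in spans:
        n_tokens = span.end - span.start
        if min_len <= n_tokens <= max_len:
            yield span


# Hypothetical sentence spans with lengths 1, 3, 6 and 5 tokens.
sentences = [
    Span("d1", 0, 1),
    Span("d1", 1, 4),
    Span("d1", 4, 10),
    Span("d2", 0, 5),
]
kept = list(length_filter(sentences, 2, 5))
```

Here the sentences of length 1 and 6 are skipped, and those of length 3 and 5 are kept.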
Okay, so such a query is possible in Poliqarp?
No, otherwise we would have implemented it. It may, however, be beneficial. But I guess we shouldn't think about it unless there is need for it, though it's good to have a proposal for such research questions.
Okay. I am going to close this issue since it seems there is no suspended status or something like that.
Due to the licenses of some resources, only a small amount of data/text can be shown as matches/query results. Some sentences may contain a large number of words, so a restriction on sentence length is needed. The restriction should be sent by Kustvakt and handled by Krill during search.
Edit: the restriction should only be sent for nested SpanQueries, since it won't reduce Krill's workload otherwise.