KorAP / Krill

🔍 A Corpus Data Retrieval Index using Lucene for Look-Ups
BSD 2-Clause "Simplified" License

Search query returning metadata only #58

Status: closed (margaretha closed this issue 5 years ago)

margaretha commented 5 years ago

Krill should support search queries that return only metadata (all metadata fields) without match snippets. This would allow searching across all data without license restrictions.

Metadata should be returned for every match, even when it is redundant (e.g. for several matches in the same document).

Akron commented 5 years ago

I guess this requires a change to Krill as well as a change to Kustvakt, right?

margaretha commented 5 years ago

Yes, I can adapt the code in Kustvakt when the function in Krill is ready. It can be a separate API in addition to the normal search.

Akron commented 5 years ago

In Krawfish, a snippet is a separate function that can "enrich" a match in the same way as, e.g., fields do. I think that's preferable. In the long run we may want to follow that approach, because it would allow us to have enrichment-specific parameters (like the context for snippets). I haven't thought about a REST API for this yet, but it may be beneficial. Because the response would be similar, I wouldn't introduce a new API. For the moment, a "no-snippet" parameter may suffice.
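A minimal sketch of the "snippet as enrichment" idea, to make the proposal concrete. All names here (`Match`, `Enrichment`, `FieldsEnrichment`, `SnippetEnrichment`, the `noSnippet` flag) are hypothetical and are not Krill's or Krawfish's actual API; the field values are stand-ins:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Krill; they only
// illustrate "snippet as an enrichment" plus a "no-snippet" switch.
class EnrichmentSketch {

    /** A bare match: a document ID plus the enrichments attached to it. */
    static final class Match {
        final String docId;
        final Map<String, Object> enrichments = new LinkedHashMap<>();
        Match(String docId) { this.docId = docId; }
    }

    /** An enrichment adds one kind of information to a match. */
    interface Enrichment {
        void enrich(Match match);
    }

    /** Attaches metadata fields (here: a single stand-in field). */
    static final class FieldsEnrichment implements Enrichment {
        public void enrich(Match match) {
            match.enrichments.put("fields", Map.of("textSigle", match.docId));
        }
    }

    /** Attaches a snippet; enrichment-specific parameters (context size) live here. */
    static final class SnippetEnrichment implements Enrichment {
        final int contextTokens;
        SnippetEnrichment(int contextTokens) { this.contextTokens = contextTokens; }
        public void enrich(Match match) {
            match.enrichments.put("snippet", "[" + contextTokens + " tokens of context]");
        }
    }

    /** Builds the enrichment pipeline; noSnippet simply drops the snippet step. */
    static List<Enrichment> pipeline(boolean noSnippet, int contextTokens) {
        List<Enrichment> steps = new ArrayList<>();
        steps.add(new FieldsEnrichment());
        if (!noSnippet) {
            steps.add(new SnippetEnrichment(contextTokens));
        }
        return steps;
    }

    /** Applies every enrichment step, in order, to a fresh match. */
    static Match enrich(String docId, List<Enrichment> steps) {
        Match m = new Match(docId);
        for (Enrichment e : steps) {
            e.enrich(m);
        }
        return m;
    }

    public static void main(String[] args) {
        Match full = enrich("GOE/AGA/00000", pipeline(false, 5));
        Match metaOnly = enrich("GOE/AGA/00000", pipeline(true, 5));
        System.out.println(full.enrichments.keySet());     // [fields, snippet]
        System.out.println(metaOnly.enrichments.keySet()); // [fields]
    }
}
```

The response shape stays the same with or without snippets, which is why a parameter on the existing API could be enough.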

margaretha commented 5 years ago

Actually, I meant a new API in Kustvakt because it doesn't involve user authentication, but I think it can be handled in the existing API too. I'll look into it when the function in Krill is ready.

Akron commented 5 years ago

I thought the rewrite detector might simply not take effect if no snippet was requested.

Akron commented 5 years ago

After a brief discussion with @kupietz regarding the required user authentication, we may need to introduce a query parameter like rewrite=false. With it, the query (or, more specifically, the VC; so we may want two parameters, corpus-rewrite and query-rewrite) will not be rewritten; instead, the query will fail if the user requests information that only an authorized user can retrieve. For example, if a user requests snippets or any other "protected" field, the query will fail. If the user requests only open metadata, the query will run and return all metadata, ignoring the fact that the user may not have access to the specific corpora.
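A minimal sketch of the decision logic described above, assuming that with corpus rewriting disabled any request for a protected field fails (because the VC is no longer restricted to licensed corpora). The `Outcome` type, the `decide` method and the set of protected field names are hypothetical placeholders, not Kustvakt's API:

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed "rewrite=false" behaviour.
// Field names and the Outcome type are illustrative, not Kustvakt's API.
class RewriteCheckSketch {

    // Assumption: which fields count as "protected" would be configurable;
    // "snippet" and "annotations" are placeholders.
    static final Set<String> PROTECTED = Set.of("snippet", "annotations");

    enum Outcome { REWRITE_AND_RUN, RUN_UNRESTRICTED, FAIL }

    static Outcome decide(boolean corpusRewrite, List<String> requestedFields) {
        if (corpusRewrite) {
            // Default behaviour: restrict the VC to corpora the user may access.
            return Outcome.REWRITE_AND_RUN;
        }
        // No rewrite: the VC stays unrestricted, so any protected field
        // could leak licensed content and the request is refused.
        for (String field : requestedFields) {
            if (PROTECTED.contains(field)) {
                return Outcome.FAIL;
            }
        }
        // Only open metadata was requested: run over all data.
        return Outcome.RUN_UNRESTRICTED;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, List.of("snippet")));          // REWRITE_AND_RUN
        System.out.println(decide(false, List.of("snippet", "title"))); // FAIL
        System.out.println(decide(false, List.of("title", "pubDate"))); // RUN_UNRESTRICTED
    }
}
```

Failing instead of silently rewriting makes the behaviour explicit to the client: an unrestricted metadata search either returns everything or returns an error, never a quietly reduced result.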

margaretha commented 5 years ago

We should also handle VC references. I would suggest that references to system VCs should be allowed, but references to VCs owned by users should not. With corpus-rewrite=false, however, VC reference rewriting would be disabled (https://github.com/KorAP/Kustvakt/issues/11).

Akron commented 5 years ago

That's a very good point. Otherwise, information about the content of a private VC could be leaked. Maybe the name access-rewrite would be better, referring to both corpus and query? Regarding private vs. public VCs, we could handle them as we handle public vs. private metadata or snippets: when something private is requested, the query fails.
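A minimal sketch of the VC reference policy discussed in the last two comments: system VCs may be referenced, while any user-owned VC makes the query fail instead of being rewritten. The registry, the VC names, `VcType` and `AccessDenied` are all made up for illustration and do not reflect Kustvakt's actual VC storage or types:

```java
import java.util.Map;

// Hypothetical sketch: how a VC reference might be checked when access
// rewriting is disabled. VC names and the registry are made up.
class VcReferenceSketch {

    enum VcType { SYSTEM, PRIVATE }

    // Stand-in for the VC store; a real service would look this up.
    static final Map<String, VcType> REGISTRY = Map.of(
            "system-vc", VcType.SYSTEM,
            "alice/my-vc", VcType.PRIVATE);

    static final class AccessDenied extends RuntimeException {
        AccessDenied(String message) { super(message); }
    }

    /** Allows system VCs; refuses any user-owned VC instead of rewriting. */
    static VcType checkReference(String vcName) {
        VcType type = REGISTRY.get(vcName);
        if (type == null) {
            throw new IllegalArgumentException("Unknown VC: " + vcName);
        }
        if (type != VcType.SYSTEM) {
            throw new AccessDenied("User-owned VC cannot be referenced without access rewrite: " + vcName);
        }
        return type;
    }

    public static void main(String[] args) {
        System.out.println(checkReference("system-vc")); // SYSTEM
        try {
            checkReference("alice/my-vc");
        } catch (AccessDenied e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

This mirrors the metadata rule: open resources pass through unchanged, and anything private causes a failure rather than a silent restriction.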

Akron commented 5 years ago

This issue should partially be moved to Kustvakt.