KorAP / Kustvakt

:speedboat: User and policy management component for KorAP, capable of rewriting queries for policy based document restrictions.
BSD 2-Clause "Simplified" License

Support a parameter to make access rewrites fatal #51

Open Akron opened 5 years ago

Akron commented 5 years ago

Currently when an access rewrite occurs and a client does not check for rewrites, the changes to the underlying corpus resource of a query are easily missed and not taken into account by the user. To make catching these mistakes easier, a parameter like access-rewrite-fatal may be introduced to all access rewriting endpoints that will return a failure code instead of results whenever an access rewrite occurred.

This feature was suggested by @kupietz
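As an illustration of the client-side check that is easy to forget, here is a minimal Python sketch. It assumes that rewrites are reported under a key named `rewrites` somewhere in the JSON response; the thread does not specify the response layout, so treat that key name and the helper names `find_rewrites` and `require_no_rewrites` as hypothetical. The sketch also flags every rewrite, not only access rewrites.

```python
from typing import Any, Iterator, Tuple


class AccessRewriteError(RuntimeError):
    """Raised when a response reports at least one rewrite."""


def find_rewrites(node: Any, path: str = "$") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every "rewrites" key in parsed JSON.

    Assumption: rewrites are reported under a key named "rewrites".
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "rewrites":
                yield child, value
            else:
                yield from find_rewrites(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_rewrites(item, f"{path}[{index}]")


def require_no_rewrites(response: dict) -> dict:
    """Return the response unchanged, or raise if any rewrite is reported."""
    found = [path for path, _ in find_rewrites(response)]
    if found:
        raise AccessRewriteError("query was rewritten at: " + ", ".join(found))
    return response
```

A script would wrap each parsed response in `require_no_rewrites(...)`, which is roughly what a server-side fatal parameter would make unnecessary.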

margaretha commented 4 years ago

Could you explain what an access rewrite here means?

Kustvakt performs access rewrites by adding a set of availability constraints to the corpus query, according to the access granted to the user. Typically these constraints are added to every search query, except when the query already includes a corpus query with a proper availability set. For instance, the following URL contains the required availability constraint, so no access rewrite is performed: https://korap.ids-mannheim.de/api/v1.0/search?q=Monnemer&ql=poliqarp&fields=textSigle,title,availability&cq=availability+%3D+%2FCC-BY.*%2F
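For script authors, a request like the one above can be assembled with Python's standard library, as in the sketch below. The endpoint and the parameters `q`, `ql`, `fields` and `cq` are taken from the URL in this comment; the helper name `build_search_url` is hypothetical. Passing `safe=",*"` only keeps the commas and the `*` unescaped so that the result matches the URL as written here.

```python
from typing import Optional
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://korap.ids-mannheim.de/api/v1.0/search"


def build_search_url(query: str, corpus_query: Optional[str] = None) -> str:
    """Build a search URL, optionally with an explicit corpus query (cq)."""
    params = {
        "q": query,
        "ql": "poliqarp",
        "fields": "textSigle,title,availability",
    }
    if corpus_query is not None:
        # An explicit availability constraint, e.g. "availability = /CC-BY.*/"
        params["cq"] = corpus_query
    return SEARCH_ENDPOINT + "?" + urlencode(params, safe=",*")


url = build_search_url("Monnemer", "availability = /CC-BY.*/")
```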

kupietz commented 4 years ago

The goal of this parameter would be to optionally make unnoticed access rewrites and corresponding result artifacts in API client script queries 100% impossible. It will typically be used together with access-rewrite-disabled=true, I guess.

Akron commented 4 years ago

But there can't be rewrites, when they are disabled ... :confused:

Akron commented 4 years ago

When access-rewrite-disabled=true is set, there shouldn't be a result for non-public fields, in case they are requested - or better: There shouldn't be a result at all when non-public fields are requested.

margaretha commented 4 years ago

For access-rewrite-disabled=true, there is actually an access rewrite to add availability fields to get all resources.

> There shouldn't be a result at all when non-public fields are requested.

I agree. Although this is not implemented because there are no non-public fields at the moment.

Akron commented 4 years ago

> For access-rewrite-disabled=true, there is actually an access rewrite to add availability fields to get all resources.

What is the reasoning behind that?

margaretha commented 4 years ago

For a simple query like: https://korap.ids-mannheim.de/api/v1.0/search?q=Monnemer&ql=poliqarp&fields=textSigle,title,availability&access-rewrite-disabled=true

Availability fields have to be set to get all allowed resources. I think there might be resources containing availability fields that may not be allowed.

When the URL contains a corpus query with inadequate availability fields, such an access rewrite is also required. For instance:

https://korap.ids-mannheim.de/api/v1.0/search?q=Monnemer&ql=poliqarp&fields=textSigle,title,availability&cq=availability+%3D+%2FCC-BY.*%2F&access-rewrite-disabled=true

Akron commented 4 years ago

> Availability fields have to be set to get all allowed resources. I think there might be resources containing availability fields that may not be allowed.

I would expect access-rewrite-disabled=true to mean exactly that: disable all access rewrites. The specifics of availability fields for DeReKo should be ignored. The policy, if I understood @kupietz correctly, is that all public fields are public. That would, of course, mean that there is no way to hide any corpora in KorAP. But that is how I understood it. If this is wrong, we should rename the parameter, I think.

margaretha commented 4 years ago

The parameter name may be misleading. It was implemented to enable public metadata requests.

Akron commented 4 years ago

But what is the reasoning for the rewrite?

margaretha commented 4 years ago

As I described above. But I think you are right: for the second case, there shouldn't be any rewrite. I am not sure about the first case. Are there no other licenses that are not covered by CC-BY.*, ACA.*, QAO.*? Otherwise, why did we decide to specify exactly these licenses for logged-in users? I suppose that was the reason.

Akron commented 4 years ago

These are DeReKo/i5-specific license fields. I guess we said we wanted them added for logged-in users so we can explicitly exclude private corpora.

margaretha commented 4 years ago

You mean private corpora other than DeReKo/i5? And this is allowed for public metadata requests?

Akron commented 4 years ago

I think so. Otherwise the parameter should definitely be renamed. But as this is a license question, we should ask @kupietz.