KorAP / Kalamar

:octopus: Mojolicious-based Frontend for KorAP
BSD 2-Clause "Simplified" License

type:text shouldn't show up in the KoralQuery serializer #196

Open margaretha opened 1 year ago

margaretha commented 1 year ago

While investigating https://github.com/KorAP/Krill/issues/86, I found that corpusTitle eq gingko is serialized as

{
    "@type": "koral:doc",
    "key": "corpusTitle",
    "match": "match:eq",
    "value": "gingko",
    "type": "type:text"
}

whereas type:text is not a supported type according to the KoralQuery documentation, and in practice Krill does not support it either.
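A quick way to spot such values in a serialized VC is to walk the KoralQuery tree and flag every koral:doc whose "type" is not one of the documented value types. This is a minimal sketch, assuming the documented types are type:string, type:regex and type:date; the function name and the SUPPORTED_TYPES set are hypothetical and not part of Kalamar or Krill.

```python
"""Flag koral:doc nodes whose "type" is not a documented KoralQuery value type.

Assumption: the documented value types for koral:doc are type:string,
type:regex and type:date; adjust SUPPORTED_TYPES if the spec says otherwise.
"""
import json
from typing import Any, Iterator

SUPPORTED_TYPES = {"type:string", "type:regex", "type:date"}


def unsupported_doc_types(node: Any, path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (path, type) for every koral:doc carrying an undocumented type."""
    if isinstance(node, dict):
        if node.get("@type") == "koral:doc" and "type" in node:
            if node["type"] not in SUPPORTED_TYPES:
                yield path, node["type"]
        # Recurse into nested structures such as koral:docGroup operands.
        for key, value in node.items():
            yield from unsupported_doc_types(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from unsupported_doc_types(item, f"{path}[{i}]")


# The serialization quoted in this issue.
serialized = json.loads("""
{
    "@type": "koral:doc",
    "key": "corpusTitle",
    "match": "match:eq",
    "value": "gingko",
    "type": "type:text"
}
""")

print(list(unsupported_doc_types(serialized)))  # → [('$', 'type:text')]
```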

The type is not added by any query rewrite, as it does not appear when sending a direct API request:

https://korap.ids-mannheim.de/instance/test/api/v1.0/search?q=ich&cq=availability+%3D+%2FCC-BY.*%2F+%26+docTitle+%3D+%22gingko%22&ql=poliqarp&cutoff=1&state=&pipe=

Could it be that Kalamar adds the type?

Akron commented 1 year ago

It's interesting that this shows up in the KQ-Viewer. The type:text is an index type and was introduced to help the VC Builder show the allowed operators. Regarding this issue: do you mean it shouldn't show up in the serialization, or is there a bigger issue?

margaretha commented 1 year ago

Yes, it shouldn't show up in the serialization and it shouldn't be used in general. There should be no problem with that in the backend since Kalamar only sends the corpus query, not KoralQuery.

margaretha commented 1 year ago

Could you please check what request Kalamar actually sends to Kustvakt? I don't get any results sending the example direct API request using OAuth2 token and VPN, while Kalamar shows some results as reported in https://github.com/KorAP/Krill/issues/86.

Akron commented 1 year ago

Well - it is used by the corpus builder and it is used for indexing - so what do you mean by "it shouldn't be used in general"? Yes it is not helpful in a corpus request, but that is not happening.

Akron commented 1 year ago

I am not sure which query you are referring to.

margaretha commented 1 year ago

> Well - it is used by the corpus builder and it is used for indexing - so what do you mean by "it shouldn't be used in general"? Yes it is not helpful in a corpus request, but that is not happening.

I suppose it shouldn't be used, since it is not part of the KoralQuery doc and is not supported in the backend. Why is it used by the corpus builder and for indexing?

> I am not sure which query you are referring to.

Sorry for not being clear. I mean the query in https://github.com/KorAP/Krill/issues/86, or the one I wrote above (https://korap.ids-mannheim.de/instance/test/api/v1.0/search?q=ich&cq=availability+%3D+%2FCC-BY.*%2F+%26+docTitle+%3D+%22gingko%22&ql=poliqarp&cutoff=1&state=&pipe=), but sent via Kalamar instead of as a direct API request.

Akron commented 1 year ago

The KoralQuery doc currently only covers the request and error reporting; it covers neither the indexing nor the response data format. Krill supports type:text for indexing (see index/FieldDocument) and for responses (see response/MetaFieldsObj). type:text means that the field is indexed as tokenized text, so single words within it can be searched (as in a title), while a match on the whole string still works. This obviously means that the operators in the visual corpus builder should differ.
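To illustrate why a tokenized field calls for different operators, here is a toy model of a field indexed both as one string and as a set of tokens. It is a simplified sketch, not Krill's FieldDocument: whitespace tokenization and lower-casing are assumptions standing in for the real analysis, the operator names are assumed to mirror KoralQuery's match:eq and match:contains, and the title value is hypothetical.

```python
"""Toy model of a type:text field: indexed both as one string and as tokens.

Assumptions: whitespace tokenization and lower-casing stand in for whatever
analysis Krill actually applies; this is not Krill's implementation.
"""
from dataclasses import dataclass, field


@dataclass
class TextField:
    value: str
    tokens: set[str] = field(init=False)

    def __post_init__(self) -> None:
        # Tokenized view: lets single words be searched inside the field.
        self.tokens = {t.lower() for t in self.value.split()}

    def matches(self, op: str, query: str) -> bool:
        if op == "match:eq":
            # Whole-string view: the complete value must match exactly.
            return self.value == query
        if op == "match:contains":
            # Token view: a single word of the field is enough.
            return query.lower() in self.tokens
        raise ValueError(f"unsupported operator for this sketch: {op}")


title = TextField("Der Zauberberg")  # hypothetical title value
print(title.matches("match:eq", "Der Zauberberg"))    # True
print(title.matches("match:eq", "Zauberberg"))        # False
print(title.matches("match:contains", "Zauberberg"))  # True
```

A plain string field would only offer the first kind of comparison, which is why a builder would list a different operator set for each type.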

That query doesn't show results for me. The request is: https://korap.ids-mannheim.de/instance/test/api/v1.0/search?context=40-t%2C40-t&count=25&cq=availability+%3D+%2FCC-BY.*%2F+%26+docTitle+%3D+%22gingko%22&cutoff=true&offset=0&q=ich&ql=poliqarp
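One way to check what differs between the direct API request and the request Kalamar sends is to decode both query strings and compare them parameter by parameter. This sketch uses only Python's standard urllib.parse; both URLs are copied verbatim from this thread, and the params helper is a hypothetical name.

```python
"""Compare the query parameters of the direct API request and Kalamar's request.

Both URLs are copied from this thread; only the standard library is used.
"""
from urllib.parse import parse_qsl, urlsplit


def params(url: str) -> dict[str, str]:
    # keep_blank_values keeps empty parameters such as "state=" visible.
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


direct = params(
    "https://korap.ids-mannheim.de/instance/test/api/v1.0/search?q=ich&cq=availability+%3D+%2FCC-BY.*%2F+%26+docTitle+%3D+%22gingko%22&ql=poliqarp&cutoff=1&state=&pipe="
)
kalamar = params(
    "https://korap.ids-mannheim.de/instance/test/api/v1.0/search?context=40-t%2C40-t&count=25&cq=availability+%3D+%2FCC-BY.*%2F+%26+docTitle+%3D+%22gingko%22&cutoff=true&offset=0&q=ich&ql=poliqarp"
)

# Print only the parameters whose decoded values differ.
for key in sorted(direct.keys() | kalamar.keys()):
    a, b = direct.get(key), kalamar.get(key)
    if a != b:
        print(f"{key}: direct={a!r} kalamar={b!r}")
```

Decoded, the q, ql and cq values (availability = /CC-BY.*/ & docTitle = "gingko") are identical in both requests; only context, count, offset, cutoff and the empty state/pipe parameters differ.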

margaretha commented 1 year ago

Thanks for your explanation.

The query should show results with an OAuth2 token and VPN, since the Gingko corpus is restricted.

Akron commented 1 year ago

But the VC is limited to CC-BY.*

margaretha commented 1 year ago

Sorry, you are right. The request shouldn't be restricted to CC-BY.*. Besides, I made a mistake with the URL encoding of diacritics etc.

For the following query

https://korap.ids-mannheim.de/instance/test?q=Z%C3%BCndkerze&cq=corpusTitle+%3D+%22gingko%22&ql=poliqarp&cutoff=1&state=&pipe=

Kalamar would send the query below to Kustvakt, right?

curl -v -H "Authorization: Bearer token" 'https://korap.ids-mannheim.de/instance/test/api/v1.0/search?q=Z%C3%BCndkerze&cq=corpusTitle+%3D+%22gingko%22&ql=poliqarp&cutoff=1&state=&pipe='
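To avoid hand-encoding mistakes with diacritics, the query string can be generated by a library. This is a minimal sketch using Python's standard urlencode(), which percent-encodes UTF-8 (ü becomes %C3%BC) and quotes spaces as "+"; the endpoint and parameter names are taken from the URLs above, and the API and query variable names are illustrative.

```python
"""Build the search request URL without hand-encoding diacritics.

Endpoint and parameter names come from the URLs in this thread.
"""
from urllib.parse import urlencode

API = "https://korap.ids-mannheim.de/instance/test/api/v1.0/search"

query = {
    "q": "Zündkerze",                 # raw UTF-8, encoded by urlencode()
    "cq": 'corpusTitle = "gingko"',   # quotes and "=" are percent-encoded
    "ql": "poliqarp",
    "cutoff": "1",
    "state": "",
    "pipe": "",
}

url = f"{API}?{urlencode(query)}"
print(url)
```

The printed URL matches the one in the curl command: ...search?q=Z%C3%BCndkerze&cq=corpusTitle+%3D+%22gingko%22&ql=poliqarp&cutoff=1&state=&pipe=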

This doesn't seem to be a problem with Kalamar and isn't related to type:text, so I suppose we should discuss it in https://github.com/KorAP/Krill/issues/86 instead.

Akron commented 1 year ago

Yes, this is unrelated. Regarding this topic: I think the corpus assistant shouldn't alter the query serialized by the KoralQuery helper. But I think that's the only problem there is, and it's a minor one that doesn't affect any functionality of the platform.
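Keeping the builder's index-type hint out of the serialized KoralQuery could look roughly like this: the builder keeps type:text internally for choosing operators, and only a cleaned copy is shown in the KQ-Viewer. This is a minimal sketch, assuming the hint is always exactly "type": "type:text" on koral:doc nodes; strip_index_type_hints is a hypothetical name, not Kalamar's API, and the example VC is illustrative.

```python
"""Drop the builder-only "type": "type:text" hint before showing KoralQuery.

Assumption: the hint only appears as "type": "type:text" on koral:doc nodes.
The function name is hypothetical and not part of Kalamar.
"""
import copy
import json
from typing import Any


def strip_index_type_hints(node: Any) -> Any:
    """Return a deep copy with type:text removed from every koral:doc."""
    node = copy.deepcopy(node)  # leave the builder's own state untouched
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            if current.get("@type") == "koral:doc" and current.get("type") == "type:text":
                del current["type"]
            stack.extend(current.values())
        elif isinstance(current, list):
            stack.extend(current)
    return node


# Illustrative VC: a regex constraint plus the corpusTitle doc from this issue.
vc = {
    "@type": "koral:docGroup",
    "operation": "operation:and",
    "operands": [
        {"@type": "koral:doc", "key": "availability", "match": "match:eq",
         "type": "type:regex", "value": "CC-BY.*"},
        {"@type": "koral:doc", "key": "corpusTitle", "match": "match:eq",
         "value": "gingko", "type": "type:text"},
    ],
}

print(json.dumps(strip_index_type_hints(vc), indent=4))
```

Working on a copy means the hint stays available to the builder for selecting operators, while the serialization shown to the user no longer contains it.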