art-institute-of-chicago / data-aggregator

An API of public data collected from several different systems at the Art Institute of Chicago
GNU Affero General Public License v3.0

[Question] How to set FieldData=true when I make a sort query? #47

Closed ClickHere0521 closed 1 year ago

ClickHere0521 commented 2 years ago

Hi. How are you?

So, I want to sort the data with query parameters.

This GET request works fine:

https://api.artic.edu/api/v1/exhibitions/search?sort[id][order]=desc

But when I sort by title, it fails:

https://api.artic.edu/api/v1/exhibitions/search?sort[title][order]=desc

What should I do now?

ClickHere0521 commented 2 years ago

@fkleon @nikhiltri @IllyaMoskvin Could you help me with this issue?

IllyaMoskvin commented 2 years ago

@ClickHere0521 Thanks for the ping. You'll want to target the title.keyword subfield:

https://api.artic.edu/api/v1/exhibitions/search?sort[title.keyword][order]=desc

Generally, all fields that are called title or end with *_title or *_titles will have a subfield called keyword. This subfield is optimized for sorting, filtering, and aggregation, while the parent field is optimized for full-text search.
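To illustrate the naming convention described above, here is a minimal Python sketch that builds a sort URL and appends `.keyword` when the field name matches the rule. The helper names `sortable_field` and `build_sort_url` are made up for this example and are not part of the API; the sketch only builds the URL and does not check whether a given subfield actually exists in the index.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.artic.edu/api/v1"


def sortable_field(field: str) -> str:
    """Append `.keyword` to fields named `title` or ending in `_title`/`_titles`.

    Hypothetical helper: it only applies the naming convention and cannot
    tell whether the subfield is really present in the search index.
    """
    if field == "title" or field.endswith(("_title", "_titles")):
        return f"{field}.keyword"
    return field


def build_sort_url(resource: str, field: str, order: str = "desc") -> str:
    """Build a search URL that sorts `resource` by `field`."""
    params = {f"sort[{sortable_field(field)}][order]": order}
    # Leave the square brackets unescaped so the URL stays readable.
    return f"{BASE_URL}/{resource}/search?{urlencode(params, safe='[]')}"


print(build_sort_url("exhibitions", "title"))
print(build_sort_url("exhibitions", "id"))
```

Fields such as `id` are passed through unchanged, since the convention only applies to title-like fields.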

ClickHere0521 commented 2 years ago

@IllyaMoskvin Thanks. That works!!!

But I tried with other fields, like short_description and gallery_title:

https://api.artic.edu/api/v1/exhibitions/search?sort[short_description.keyword][order]=desc

https://api.artic.edu/api/v1/exhibitions/search?sort[short_description][order]=desc

These are all failing.

Can I see any link to subfields?

PS: Is it not supported for the time being?

IllyaMoskvin commented 2 years ago

@ClickHere0521 Good questions. So... regarding short_description, for long-form string fields like that, it usually doesn't make sense to add keyword subfields. The keyword type is meant for situations where we need "exact" matches, and it usually doesn't make sense to want exact matches on long-form fields. I'll show a few examples.

If you want to filter by an exact match, you can target the artist_title.keyword subfield:

https://api.artic.edu/api/v1/artworks/search?query[term][artist_title.keyword]=Vincent%20van%20Gogh

However, you can't filter on artist_title directly, because it was indexed into Elasticsearch as a text type:

https://api.artic.edu/api/v1/artworks/search?query[term][artist_title]=Vincent%20van%20Gogh

Indexing as text allows us to do full-text search, where exact matching might actually be detrimental.

For example, if you wanted to do full-text search on just the artist_title field specifically, you can do this:

https://api.artic.edu/api/v1/artworks/search?query[match][artist_title]=Vincent%20van%20Gogh

...but unlike the previous term query on artist_title.keyword, if you misspell "Vincent" as "Vinvent", it'll still match:

https://api.artic.edu/api/v1/artworks/search?query[match][artist_title]=Vinvent%20van%20Gogh
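The term and match URLs above differ only in the query type and in whether they target the keyword subfield. A small Python sketch, using a hypothetical helper named `build_query_url`, shows the two side by side. It only builds the URLs and makes no request.

```python
from urllib.parse import quote

BASE_URL = "https://api.artic.edu/api/v1"


def build_query_url(resource: str, query_type: str, field: str, value: str) -> str:
    """Build a search URL of the form query[<type>][<field>]=<value>.

    Hypothetical helper for illustration; `quote` encodes spaces as %20.
    """
    return f"{BASE_URL}/{resource}/search?query[{query_type}][{field}]={quote(value)}"


# Exact match: term query against the keyword subfield.
term_url = build_query_url("artworks", "term", "artist_title.keyword", "Vincent van Gogh")

# Full-text search: match query against the parent text field.
match_url = build_query_url("artworks", "match", "artist_title", "Vincent van Gogh")

print(term_url)
print(match_url)
```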

You can read up on the term and match queries in the Elasticsearch documentation.

As an aside, if you want to do full-text search on our collection, we usually recommend just using the q parameter:

https://api.artic.edu/api/v1/artworks/search?q=Vinvent%20van%20Gogh

Behind the scenes, we optimize that q query to target specific fields with specific weights that we've found to return good results. We also optimize it to do phrase matching and a few similar tricks.
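As a rough idea of what "specific fields with specific weights" plus phrase matching could look like in Elasticsearch query DSL, here is a Python sketch that assembles such a request body. This is an assumption for illustration only: the field names, the weights, and the overall query shape are hypothetical and are not the API's actual implementation.

```python
import json


def build_weighted_query(text: str, field_weights: dict) -> dict:
    """Combine a boosted multi_match with a phrase-type multi_match.

    Illustrative only: the real query behind the q parameter is not
    documented in this thread.
    """
    # Elasticsearch boosts a field with the "name^weight" syntax.
    fields = [f"{name}^{weight}" for name, weight in field_weights.items()]
    return {
        "query": {
            "bool": {
                "should": [
                    # Ordinary full-text match across the weighted fields.
                    {"multi_match": {"query": text, "fields": fields}},
                    # Extra score when the words appear together as a phrase.
                    {"multi_match": {"query": text, "fields": fields, "type": "phrase"}},
                ]
            }
        }
    }


# Hypothetical weights, chosen only for the example.
HYPOTHETICAL_WEIGHTS = {"title": 3, "artist_title": 2, "description": 1}

print(json.dumps(build_weighted_query("Vincent van Gogh", HYPOTHETICAL_WEIGHTS), indent=2))
```

In practice, the q parameter handles all of this for you, so there is usually no need to build such a query by hand.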

On exhibitions, short_description is a text type. Adding a keyword subfield to short_description would be useful only if we want to do some sort of exact matching on the full contents of that field, which we don't see a need for. That's why you can't sort by short_description and why we aren't planning to add short_description.keyword.

However! The fact that you can't sort by gallery_title.keyword is a bug. For our reference:

https://api.artic.edu/api/v1/exhibitions/search?sort[gallery_title.keyword][order]=desc

Per our API conventions, gallery_title should have a keyword subfield. This issue should be fixed the next time we update our search index. We don't currently have a set date for the next update, so unfortunately, I can't give a time estimate on that fix at this time.

> Can I see any link to subfields?

Not at the moment, but I've started work on an endpoint that will allow you to do so. Stay tuned.

ClickHere0521 commented 2 years ago

Thank you for that information, I appreciate that. 👍