Open karindalziel opened 2 years ago
Note that this would probably come with updating the API to the latest version of ES
A bit more detail:
I want to add data that looks like this to the API:
"person": [
{
"id": "per.001",
"name": "Smith, Emily",
"role": "Sender"
},
{
"id": "per.002",
"name": "Thomas, Frank",
"role": "Recipient"
},
{
"id": "per.003",
"name": "Franklin, Gina",
"role": "Sender"
},
{
"id": "per.004",
"name": "Bell, James",
"role": "Recipient"
}
]
And then on the browse page or search facets, I would like to have the option to browse by "Sender" or "Recipient".
In Orchid currently, you can only select, for instance, person.name, which returns all the "name" keys from the person field; you can't select only the names of people with role = X.
Will had pointed out that choosing one facet limits the others: for instance, if you add person.role and facet by that, the resulting name list contains only people with that person.role. That doesn't quite work, because ALL the facets would be limited, and I want an initial list with no faceting applied other than the split into Sender and Recipient.
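To make the goal concrete, here is a minimal Ruby sketch of the two lists I'd like back, computed directly from the sample "person" array rather than from the API. The helper name names_for_role is hypothetical and only illustrates the desired result shape; it is not how Orchid or the API currently work.

```ruby
# Sample data copied from the "person" array in this issue.
PEOPLE = [
  { "id" => "per.001", "name" => "Smith, Emily",   "role" => "Sender" },
  { "id" => "per.002", "name" => "Thomas, Frank",  "role" => "Recipient" },
  { "id" => "per.003", "name" => "Franklin, Gina", "role" => "Sender" },
  { "id" => "per.004", "name" => "Bell, James",    "role" => "Recipient" }
].freeze

# Hypothetical helper: count names for a single role only, without
# narrowing any other facet.
def names_for_role(people, role)
  people
    .select { |person| person["role"] == role }
    .each_with_object(Hash.new(0)) { |person, counts| counts[person["name"]] += 1 }
end

puts names_for_role(PEOPLE, "Sender").inspect
puts names_for_role(PEOPLE, "Recipient").inspect
```

Each call returns a name-to-count hash for one role, which is the shape a "Sender" or "Recipient" facet list would need.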
As an example, here is an API facet return for person.role and person.name
{
"req": {
"query_string": "/collection/test/items?num=0&sort[]=title_sort|asc&facet_limit=20&facet_sort=count|desc&browse_sort=term|asc&hl_fl=annotations_text%2C+transcription_t%2C+text&hl_num=5&facet[]=person.name&facet[]=person.role"
},
"res": {
"code": 200,
"count": 4,
"facets": {
"person.role": {
"": 4,
"recipient": 3,
"sender": 3
},
"person.name": {
"Chesnutt, Charles W., (Charles Waddell)": 4,
"Washington, Booker T., 1856-1915": 3,
"Bruce, Blanche Kelso": 1,
"Green, John Patterson": 1,
"Smith, Harry C.": 1
}
},
"items": [
]
}
}
But what I want to show on the search facets is:
I don't think there is currently a way to get that info from the API.
NEXT STEP: Determine if the API can return facets as indicated.
I created a test repository to post the kind of data I want to look at: https://github.com/CDRH/data_test
and posted it. The CDRH dev API call is here: https://cdrhdev1.unl.edu/api/v1/collection/test/items?num=0&sort[]=title_sort|asc&facet_limit=20&facet_sort=count|desc&browse_sort=term|asc&hl_fl=annotations_text%2C+transcription_t%2C+text&hl_num=5&facet[]=person.name&facet[]=person.role
Elasticsearch can handle the necessary queries, but we need to modify Orchid and the API to handle them as facets and as query strings in the API GET request. To find subcategory="manuscripts", use either
{"aggs": {"marginalia": {"terms": {"field":"subcategory", "include":"manuscripts"}}}}
or
{"query": {"term":{"subcategory":"marginalia"}},"aggs": {"subcategory": {"terms": {"field":"subcategory"}}}}
The former method creates a new aggregation; the latter stores the result in "hits".
For matching a nested value, i.e. creator.name="Walt Whitman":
{"aggs": {"creator.name":{"nested":{"path":"creator"}, "aggs":{"creator.name":{"terms":{"field":"creator.name", "order":{"_count":"desc"}, "size":"20","include":"Walt Whitman"} }}}}}
Though I realize your query above is more complicated. (But can the above queries be faceted with the current API?)
This may also be useful: https://discuss.elastic.co/t/nested-filter-aggregation/82639
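As a rough illustration of how the API might assemble the nested creator.name aggregation from a field name, here is a small Ruby sketch. The helper nested_terms_agg and its keyword arguments are hypothetical, and it assumes the nested path is simply the part of the field name before the first dot.

```ruby
require "json"

# Hypothetical helper: build a nested terms aggregation for one field,
# optionally limited to a single value through the "include" option.
# Assumes the nested path is the first dotted segment of the field.
def nested_terms_agg(field, only: nil, size: 20)
  path = field.split(".").first
  terms = { "field" => field, "order" => { "_count" => "desc" }, "size" => size }
  terms["include"] = only if only

  {
    "aggs" => {
      field => {
        "nested" => { "path" => path },
        "aggs" => { field => { "terms" => terms } }
      }
    }
  }
end

puts JSON.generate(nested_terms_agg("creator.name", only: "Walt Whitman"))
```

This mirrors the structure of the creator.name query in this comment, except that size is emitted as a number rather than a string.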
The encode_param method in app/services/api_bridge/query.rb cannot handle an equal sign in a facet name (i.e. facet[]=person.name[person.role=judge]), since it splits on '='. I think equal signs are confusing in the request URI anyway, so we should come up with another symbol.
Maybe the pipe symbol, like person.name[person.role|judge]?
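A quick Ruby sketch of the problem and of the pipe idea. Both helpers are hypothetical: naive_split only stands in for splitting a parameter on '=' (the real encode_param may do more), and parse_facet is one possible way to read the pipe syntax.

```ruby
# Hypothetical stand-in for splitting a "key=value" parameter on '='.
# With an '=' inside the facet name, the value is cut into two pieces.
def naive_split(param)
  param.split("=")
end

# Hypothetical pattern for the proposed syntax: field[filter_field|filter_value]
FILTERED_FACET = /\A(?<field>[\w.]+)\[(?<filter_field>[\w.]+)\|(?<filter_value>[^\]]+)\]\z/

# Returns the facet field plus the filter parts when the pipe syntax is
# used, or just the field for an ordinary facet name.
def parse_facet(facet)
  match = FILTERED_FACET.match(facet)
  return { field: facet } unless match

  {
    field: match[:field],
    filter_field: match[:filter_field],
    filter_value: match[:filter_value]
  }
end

puts naive_split("facet[]=person.name[person.role=judge]").inspect
puts parse_facet("person.name[person.role|judge]").inspect
```

Ordinary facet names such as person.name pass through parse_facet unchanged, so existing requests would not need to change.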
Here is a bigger problem: a facet name like that gives the following elasticsearch error:
Invalid aggregation name [person.name[person.role|judge]]. Aggregation names must be alpha-numeric and can only contain '_' and '-'
One possible solution: create "alternate" keys in the YAML file allowing elasticsearch to store the aggregation under a different name. Then Orchid will have to be changed to use this alternate key to display the facets.
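As a variation on the YAML idea, the alternate key could also be derived from the facet name instead of being written by hand. This is only a sketch: agg_name_for and alternate_keys are hypothetical helpers, and two different facet names could in principle collapse to the same derived key, which hand-written YAML keys would avoid.

```ruby
# Hypothetical helper: derive an aggregation name made only of letters,
# digits, '_' and '-', which is what the error message asks for.
def agg_name_for(facet)
  facet.gsub(/[^A-Za-z0-9_-]+/, "_").gsub(/\A_+|_+\z/, "")
end

# Hypothetical lookup from the safe aggregation name back to the original
# facet name, so Orchid could label the returned facet correctly.
def alternate_keys(facets)
  facets.each_with_object({}) { |facet, map| map[agg_name_for(facet)] = facet }
end

puts alternate_keys(["person.name[person.role|judge]", "person.name"]).inspect
```

Orchid would then request the aggregation under the derived key and use the lookup to display it under the original facet name.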
Here is a query that apparently works:
{"aggs":{"people":{"nested":{"path":"person"},"aggs":{"includes_judge":{"filter": {"term": {"person.role": "judge"}}, "aggs": { "judges": {"terms": {"field": "person.name"}}}}}}}}
Do we want to restrict this to nested fields? I think the pattern of filter aggregations is broader.
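To show what "broader" could mean, here is a Ruby sketch in which the nested wrapper is optional. The helper names and the inner aggregation names "filtered" and "values" are my own assumptions (the working query uses "people", "includes_judge" and "judges"), and facet_counts assumes the usual aggregations/buckets response layout.

```ruby
require "json"

# Hypothetical builder: a terms aggregation restricted by a term filter.
# When nested_path is given, the filter is wrapped in a nested aggregation,
# matching the structure of the query that apparently works.
def filtered_terms_agg(name, field, filter_field, filter_value, nested_path: nil)
  filtered = {
    "filter" => { "term" => { filter_field => filter_value } },
    "aggs" => { "values" => { "terms" => { "field" => field } } }
  }
  return { "aggs" => { name => filtered } } unless nested_path

  {
    "aggs" => {
      name => {
        "nested" => { "path" => nested_path },
        "aggs" => { "filtered" => filtered }
      }
    }
  }
end

# Hypothetical reader: flatten the buckets for one aggregation into a
# term-to-count hash, whether or not the nested wrapper was used.
def facet_counts(response, name)
  agg = response.dig("aggregations", name) || {}
  agg = agg["filtered"] if agg.key?("filtered")
  buckets = agg.dig("values", "buckets") || []
  buckets.each_with_object({}) { |bucket, counts| counts[bucket["key"]] = bucket["doc_count"] }
end

puts JSON.generate(filtered_terms_agg("judges", "person.name", "person.role", "judge", nested_path: "person"))
```

Leaving out nested_path gives a plain filter aggregation, which would cover non-nested fields with the same request syntax.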
Elasticsearch 6.8 and up (at least) can create a nested aggregation based on another nested value. Details here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html
In Orchid, this would be very useful for returning a list of facets (aggregations) by role, for instance
person.role = attorney
This would necessitate changes to both the API, to handle the query, and Orchid, to handle the results (if they change) and the query setup in public.yml.
Thinking through this a bit, the API query could look something like this:
facet[]=person.name[by(person.role=attorney)]
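One way the '=' inside by(...) could survive the request is to percent-encode the facet value, so the only literal '=' left is the one separating key and value. The Ruby sketch below assumes that approach; BY_FACET, encode_facet and decode_facet are hypothetical names, not existing API code.

```ruby
require "uri"

# Hypothetical pattern for the proposed syntax: field[by(filter_field=filter_value)]
BY_FACET = /\A(?<field>[\w.]+)\[by\((?<filter_field>[\w.]+)=(?<filter_value>[^)]+)\)\]\z/

# Percent-encode the facet value so brackets, parentheses and '=' are escaped.
def encode_facet(facet)
  "facet[]=" + URI.encode_www_form_component(facet)
end

# Split on the first '=' only, decode the value, then read the by(...) part.
# Plain facet names are returned as just a field.
def decode_facet(param)
  _key, _separator, value = param.partition("=")
  facet = URI.decode_www_form_component(value)
  match = BY_FACET.match(facet)
  return { field: facet } unless match

  {
    field: match[:field],
    filter_field: match[:filter_field],
    filter_value: match[:filter_value]
  }
end

encoded = encode_facet("person.name[by(person.role=attorney)]")
puts encoded
puts decode_facet(encoded).inspect
```

The decoded parts could then feed whatever builds the filtered aggregation on the Elasticsearch side.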
With a query like that, the API would probably return a list of facets much like the current one, with the only change to the facet return being changing
to something like
And then in Orchid, we'd have to add the same functionality to detect in public.yml, though it may be handled by the search term in the API
becomes