IIIF / api

Source for API and model specifications documents (api and model)
http://iiif.io/api

Limit search of annotation content by language #513

Open azaroth42 opened 9 years ago

azaroth42 commented 9 years ago

Should be able to limit the language of the annotation content in the search. "and" for example is "duck" in Danish. (Donald Duck is Anders And)

Currently can't do this (unless languages had URIs that could be filtered on ... which they don't).

azaroth42 commented 9 years ago

Could implement this with server-specific extensions to the q param (e.g. and@en vs and@dk or whatever). Given the lack of overlap in tokens between most languages (e.g. English and Welsh, @glenrobson), in practice it's unlikely to cause many difficulties?

On the other hand, if it were a separate optional parameter, one could search for annotations in Latin without searching for a particular word.

Otherwise, the editors agree to defer until there is more experience.
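For illustration only, the two options floated in this comment (a language suffix inside q versus a separate optional parameter) could be parsed roughly as below. This is a sketch, not anything in the Search API: the `@lang` suffix syntax, the `lang` parameter name, and the `example.org` URLs are all assumptions made up for the example.

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urlsplit


class SearchRequest(NamedTuple):
    terms: Optional[str]  # search terms, or None when only a language is given
    lang: Optional[str]   # requested language, or None when unrestricted


def parse_suffix_style(url: str) -> SearchRequest:
    """Option 1 (hypothetical): language carried inside q, e.g. q=and@en."""
    params = parse_qs(urlsplit(url).query)
    q = params.get("q", [None])[0]
    if q is None:
        return SearchRequest(None, None)
    terms, sep, lang = q.rpartition("@")
    if not sep:
        # No "@" present: the whole value is the search term.
        return SearchRequest(q, None)
    return SearchRequest(terms or None, lang or None)


def parse_param_style(url: str) -> SearchRequest:
    """Option 2 (hypothetical): separate optional parameter, here named lang."""
    params = parse_qs(urlsplit(url).query)
    q = params.get("q", [None])[0]
    lang = params.get("lang", [None])[0]
    return SearchRequest(q, lang)
```

With the separate parameter, a request such as `?lang=la` carries a language and no terms, which is the "annotations in Latin, no particular word" case. The suffix form cannot express that cleanly, since q would have to be empty apart from the suffix.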

glenrobson commented 9 years ago

We don't generally offer search by language and assume that if someone wants Welsh they will search for a Welsh word, as you suggest. The only exception is a WW1 project for which we had some of the Welsh Newspapers OCR machine translated from Welsh to English. We haven't put this live and probably won't be looking at it until next year.

We've struggled a bit with how to make this search functionality intuitive to the user. We don't plan to show the machine-generated English, as it's not awfully accurate, so a user would search using an English word like 'war' and be shown newspaper pages that contain 'rhyfel'. We'd probably want to express the fact that the English was machine translated rather than treat it as equal to source English material with an @en tag (not sure how we would do this).

eroux commented 6 years ago

I'm designing our API to search annotations, and this feature would be crucial for us. Our use cases involve searching for terms in the Latin alphabet that could be English or a romanization of Pali, Sanskrit or Tibetan. These different possibilities trigger completely different paths in our search engine (different Lucene analyzers).

So for us it's necessary to include a language for the searched term, not only to filter results, but to make sure the search works properly.
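To make the point above concrete, here is a rough sketch of language-driven analyzer selection. It does not use Lucene: the two analyzer functions are toy stand-ins, and the language tags (`en`, `sa-Latn`, `pi-Latn`, `bo-Latn`) and the `ANALYZERS` table are assumptions for illustration, not our actual configuration.

```python
import unicodedata
from typing import Callable, Dict, List

Analyzer = Callable[[str], List[str]]


def english_analyzer(text: str) -> List[str]:
    # Toy stand-in: lowercase and split on whitespace.
    return text.lower().split()


def romanized_analyzer(text: str) -> List[str]:
    # Toy stand-in: additionally strip diacritics so that plain-ASCII
    # input can match text written with transliteration marks.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.split()


# Hypothetical mapping from the language of the searched term to an analyzer.
ANALYZERS: Dict[str, Analyzer] = {
    "en": english_analyzer,
    "sa-Latn": romanized_analyzer,
    "pi-Latn": romanized_analyzer,
    "bo-Latn": romanized_analyzer,
}


def analyze(term: str, lang: str) -> List[str]:
    """Pick the analyzer from the declared language, not from the term itself."""
    try:
        analyzer = ANALYZERS[lang]
    except KeyError:
        raise ValueError(f"no analyzer configured for language {lang!r}")
    return analyzer(term)
```

The same Latin-alphabet string produces different tokens depending on the declared language, which is why the language has to arrive with the query rather than be applied afterwards as a filter on results.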

mikeapp commented 2 years ago

We have had DH projects that produced manifests with annotation sets in multiple languages, but agree with the discussion above that the ambiguous cases are likely to be infrequent. This would potentially be more useful if content search were implemented at the Collection level, but I'm not aware of real-world examples of this.

azaroth42 commented 2 years ago

Should this be included in Search 3.0? (thumbs up/down on the comment please)