digirati-co-uk / dlcs-search-service

Search service for IIIF Content Search and annotation indexing.
MIT License

Indexing: IIIF Presentation API Manifests #18

Open mattmcgrattan opened 5 years ago

mattmcgrattan commented 5 years ago

Search use cases beyond IIIF Content Search "search within", for example:

"I want to see all the pre-20th century archival records that contain 'Navajo'"

or

"Which documents from this archival series have been tagged with 'Paris'?"

or

"Find 'John Smith' in records from 1929"

require indexing of IIIF Presentation API content, not just annotations.

Potentially index IIIF Presentation API content:

stephenwf commented 5 years ago

Could this be paired with a new return format (Collections of manifests?)

mattmcgrattan commented 5 years ago

I think potentially yes, the functions of things like the IDA Topic Collection service (and others) could be met by a search service that could return a manifest, or collection of manifests, or potentially a IIIF Change Discovery Activity Stream in response to some query or other.
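As a rough illustration of the "collection of manifests" return format, here is a minimal sketch, assuming IIIF Presentation 3 JSON. The function name `results_collection` and the example URLs are hypothetical and not part of dlcs-search-service. It wraps already-matched manifests as references (`id`, `type`, `label`) inside a Collection. A Change Discovery Activity Stream response is not covered here.

```python
from typing import Iterable


def results_collection(collection_id: str, query: str, manifests: Iterable[dict]) -> dict:
    """Wrap matching manifests as a IIIF Presentation 3 Collection (hypothetical helper).

    Members are references (id, type, label) rather than full manifests,
    as Collections normally reference the Manifests they contain.
    """
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": collection_id,
        "type": "Collection",
        "label": {"en": [f"Search results for '{query}'"]},
        "items": [
            {
                "id": m["id"],
                "type": "Manifest",
                # Fall back to a language-neutral placeholder if the manifest has no label.
                "label": m.get("label", {"none": ["(untitled)"]}),
            }
            for m in manifests
        ],
    }
```

Giving the collection a stable `id` derived from the query (e.g. `https://example.org/search/collection?q=Paris`) would let clients bookmark or re-request a result set like any other Collection.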

mattmcgrattan commented 5 years ago

See:

https://github.com/digirati-co-uk/dlcs-search-service/issues/23
https://github.com/digirati-co-uk/dlcs-search-service/issues/24

tomcrane commented 5 years ago

Ideally, the search server only indexes IIIF metadata strings as a last resort. I would prefer never to do this. We should be following seeAlso links, recognising vocabularies and profiles, and indexing semantic data for any discovery purposes.

But... that's pie in the sky for now, until we start seeing the adoption of useful seeAlso links. I would really like to limit it to label and description/summary, which at least have enough semantics to be useful. We don't want to get into maintaining a code base with code for working out which of the metadata fields might represent the author/creator/etc. of some work, or which fields might represent some temporal information. That will just lead to pain. We should lead by example here and encourage good discovery practices from publishers.
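A minimal sketch of the "label and description/summary only" approach, assuming manifests arrive as parsed JSON. The names `flatten_text` and `discovery_fields` are hypothetical. It reads Presentation 3 language maps (`{"en": ["..."]}`) and, as a simplification, Presentation 2 plain strings and `{"@value": ...}` objects. Everything in `metadata` is deliberately ignored, so no guessing about which field holds a creator or a date.

```python
from typing import Any


def flatten_text(value: Any) -> list[str]:
    """Flatten a IIIF label/summary value into plain strings (hypothetical helper)."""
    out: list[str] = []
    if isinstance(value, str):
        out.append(value)  # Presentation 2 plain string
    elif isinstance(value, list):
        for item in value:
            out.extend(flatten_text(item))
    elif isinstance(value, dict):
        if "@value" in value:
            out.append(str(value["@value"]))  # Presentation 2 value object
        else:
            # Presentation 3 language map: each language key maps to a list of strings.
            for strings in value.values():
                out.extend(flatten_text(strings))
    return out


def discovery_fields(manifest: dict) -> dict[str, list[str]]:
    """Return only the label and summary text; the metadata block is not indexed."""
    return {
        "label": flatten_text(manifest.get("label")),
        # "summary" is the Presentation 3 property; "description" is the Presentation 2 name.
        "summary": flatten_text(manifest.get("summary", manifest.get("description"))),
    }
```

For a Presentation 3 manifest with `"label": {"en": ["Letter from John Smith"]}` and `"summary": {"en": ["Written in 1929"]}`, this yields `{"label": ["Letter from John Smith"], "summary": ["Written in 1929"]}`, regardless of what the `metadata` pairs contain.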

mattmcgrattan commented 5 years ago

I was thinking of pure string search across whatever textual fields exist in the manifest, with no semantics at all, with a view to servicing search cases like Ghent's. But that could potentially be dropped.
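For comparison, a minimal sketch of semantics-free string search across a manifest, assuming parsed Presentation 3 JSON. `NON_TEXT_KEYS`, `text_strings` and `matches` are hypothetical names, and the key list is an illustrative guess at which properties hold identifiers or structural vocabulary rather than human-readable text. Every other string, including `metadata` labels and values, is searched with a case-insensitive substring match, with no interpretation of which field matched.

```python
from typing import Any, Iterator

# Hypothetical choice: properties whose values are identifiers, media details
# or structural vocabulary rather than text a person would search for.
NON_TEXT_KEYS = {
    "id", "@id", "type", "@type", "@context", "format", "profile",
    "service", "width", "height", "duration", "motivation",
}


def text_strings(node: Any) -> Iterator[str]:
    """Yield every string in the manifest that is not under a structural key."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, list):
        for item in node:
            yield from text_strings(item)
    elif isinstance(node, dict):
        for key, value in node.items():
            if key not in NON_TEXT_KEYS:
                yield from text_strings(value)


def matches(manifest: dict, query: str) -> bool:
    """Case-insensitive substring match over all textual fields, no semantics."""
    q = query.casefold()
    return any(q in s.casefold() for s in text_strings(manifest))
```

The trade-off is the one discussed above: a query for "Paris" will hit a `metadata` value such as `{"en": ["Paris"]}`, but the service cannot tell whether that string is a place of publication, a subject, or a person's name.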

tomcrane commented 5 years ago

Yeah, I'm happy with that. I don't want to get into what pontiiiiiiiifffff was trying to do.