esmero / archipelago-deployment

Archipelago Commons Docker Deployment Repository

Research vespa.io as an alternative to Solr #122

Open DiegoPino opened 3 years ago

DiegoPino commented 3 years ago

What?

Vespa is a different fast search index with AI integration. It's OSS, it scales very well, and it has a very formal and stable API. I would like to take on this kind of exploration as a project. We are very comfortable with Solr, but it's good not to depend on a single piece of the stack and to look straight into the future's eyes.

What is needed?

Collaborate/help/code-ask-learn with/from the @dbmdz team. We might as well (respectfully) have a chat with @jbaiter, so we can slowly start porting (or running in parallel) their amazing, and core to us, Solr highlighting, and also hear his general impressions of vespa.io.

If all works fine this far, and we as a group feel there is a benefit for the community,

Marking this as Future Tasks, but I will keep an eye on this after the next release. Thanks!

@giancarlobi @mbennett-uoe @alliomeria @dmer

jbaiter commented 3 years ago

We've looked into using Vespa for image similarity search; I'll refer you to @stefan-it, who did the research. I've not looked into Vespa's highlighting implementation or the general implementation yet, so I can't tell you whether a port of the OCR highlighting stuff would be feasible.

DiegoPino commented 3 years ago

@jbaiter thanks. We will have to do some internal testing. Integration into the ecosystem will probably start at the IIIF Search API level (as a wrapper) before going deeper into our code. The developer documentation looks promising (shame on me, I only spent 10 minutes reading it, but it looked clear), and the processors and query plugins are well documented. I appreciate your comments on this.
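To make the "wrapper at the IIIF Search API level" idea a bit more concrete, here is a minimal sketch, not an actual Archipelago or Vespa implementation. It only shows the response shaping: turning backend-agnostic hits into an IIIF Content Search API 1.0 annotation list. The function name `hits_to_iiif_search` and the hit shape (`canvas`, `text`, `box`) are assumptions made up for illustration; whatever a Solr or Vespa query really returns would need to be mapped into that shape first.

```python
from typing import Any

IIIF_SEARCH_CONTEXT = "http://iiif.io/api/search/1/context.json"


def hits_to_iiif_search(request_url: str, hits: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap backend-agnostic hits in an IIIF Search API 1.0 response.

    Each hit is a hypothetical dict with:
      canvas: URI of the IIIF canvas the match is on
      text:   the matched text
      box:    (x, y, w, h) of the match in canvas pixels
    """
    resources = []
    for index, hit in enumerate(hits):
        x, y, w, h = hit["box"]
        resources.append(
            {
                "@id": f"{request_url}#annotation-{index}",
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "resource": {"@type": "cnt:ContentAsText", "chars": hit["text"]},
                # The xywh fragment tells the viewer where to draw the highlight.
                "on": f"{hit['canvas']}#xywh={x},{y},{w},{h}",
            }
        )
    return {
        "@context": IIIF_SEARCH_CONTEXT,
        "@id": request_url,
        "@type": "sc:AnnotationList",
        "resources": resources,
    }
```

With a wrapper like this in front, the search backend could be swapped or run in parallel without the IIIF viewers noticing, which is the reason for starting at this level.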

DiegoPino commented 3 years ago

Interestingly, Vespa has a special field type with an even more interesting API: annotations. Vespa also has structured data and maps as types (not indexable, but still usable in results; I can see how a JSON snippet may fit there).

Specifically, annotations allow a "label": for a given structured piece of content, e.g. XML (the example is HTML, so pretty close), you can "tag" parts of it and its content. Now the fun part: each annotation can also carry variable values (who said x,y in an OCR? Or IIIF Annotations?)

https://docs.vespa.ai/en/annotations.html
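A rough sketch of the idea in plain Python, just to illustrate the data model of "a label over a span of text, plus values". This is not Vespa's API (as far as I understand the linked docs, annotations there are declared in the schema and attached in document processors); the names `Span`, `Annotation`, `annotate_ocr_words` and the `ocr_word` label are all made up here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    start: int   # character offset into the text
    length: int  # number of characters covered


@dataclass
class Annotation:
    label: str
    span: Span
    values: dict[str, int] = field(default_factory=dict)


def annotate_ocr_words(text: str, boxes: list[tuple[int, int, int, int]]) -> list[Annotation]:
    """Tag each whitespace-separated word with its OCR box (x, y, w, h).

    Assumes boxes are given in the same order as the words in the text.
    """
    annotations = []
    cursor = 0
    for word, (x, y, w, h) in zip(text.split(), boxes):
        # Search from the cursor so repeated words map to successive positions.
        start = text.index(word, cursor)
        cursor = start + len(word)
        annotations.append(
            Annotation("ocr_word", Span(start, len(word)), {"x": x, "y": y, "w": w, "h": h})
        )
    return annotations
```

For example, `annotate_ocr_words("Vespa  search", [(10, 20, 50, 12), (70, 20, 60, 12)])` gives two annotations whose spans point back into the text and whose values hold the coordinates, which is roughly the information an OCR highlight or an IIIF Annotation would need.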