aryn-ai / remote-processor-service

Service for hosting remote processors
Apache License 2.0
0 stars 0 forks source link

Remote Processor Service

This project aims to enable "Remote Search Processors". OpenSearch allows post-processing of search responses and requests via an object called a "Search Processor". We've found that implementing these in java and building and installing a plugin every time we want to change the behavior of such a search processor produces a very long iteration cycle. Our solution is to simply pull the search processors out of OpenSearch entirely; instead we install a single plugin that makes RPCs to an external service which contains the processors. This is the external service; the plugin is over here.

In order to keep as much parity with the OpenSearch Search Processor interface, the plugin and external service communicate through protocol-buffered forms of the OpenSearch-internal SearchResponse and SearchRequest objects. The protobuf definitions can be found over here.

We have a little bit of a nomenclature clash. There are two levels of processors in the service: processors and pipelines. A processor is a single unit of processing; whereas a pipeline is made up of a string of processors. You can build a one-processor pipeline. The trick is that the service exposes each pipeline as an RPC to OpenSearch, and in OpenSearch the Remote Search Processor calls out to one of these pipelines. So you will have an OpenSearch search pipeline with a remote search processor that calls out to a remote search pipeline endpoint made up of search processors. Try not to think about it too much. Instead, look at the picture:

untitled

Instructions

Setting this up takes a couple steps. First, get the protocols submodule with

git submodule update --init --recursive

Also install the poetry packages and stuff

poetry install --no-root

Next, generate the grpc/protobuf code. This will create a subdirectory called proto_remote_processor that contains the generated code. Once the grpc code is generated you can install the package itself.

make build_proto
poetry install

Now, build an opensearch image with the remote-processor-plugin installed.

docker build -t rps-os -f docker/Dockerfile .

Also build a docker image for the remote processor service itself

docker build -t rps .

Finally, docker compose up

docker compose -f docker/compose.yml up

Alternately, one can start the OpenSearch container and run RPS locally. Be sure to change rps to localhost in the endpoint below.

docker run -d --rm --network=host -e discovery.type=single-node rps-os
poetry run server config/pipelines.yml

And you should have an opensearch with the remote processor plugin installed and a remote processor service (running the config at configs/cfg1.yml - just the debug processor atm)

Now, to create a remote processor

curl -X PUT http://localhost:9200/_search/pipeline/remote_pipeline --json '
{
    "response_processors": [
        {
            "remote_processor": {
                "endpoint": "rps:2796/RemoteProcessorService/ProcessResponse",
                "processor_name": "debug"
            }
        }
    ]
}'

Test the processor server in pure python with:

from gen.response_processor_service_pb2_grpc import RemoteProcessorServiceStub
import grpc
from gen.response_processor_service_pb2 import ProcessResponseRequest

chan = grpc.insecure_channel('localhost:2796')
stub = RemoteProcessorServiceStub(chan)
req = ProcessResponseRequest()
req.processor_name = "debug"
res = stub.ProcessResponse(req)
print(res)
search_response {
}

Or, test the processor via OpenSearch:

curl 'http://localhost:9200/demoindex0/_search?search_pipeline=remote_pipeline&pretty' --json '
{
  "query": {
    "match": {
      "text_representation": "armadillo"
    }
  }
}'