elastic / elasticsearch-clients-tests

Common tests for Elasticsearch Clients
Apache License 2.0

Inference tests #83

Open picandocodigo opened 5 days ago

picandocodigo commented 5 days ago

In case it's helpful for others using these tests, I solved an error in the Serverless client with our Inference test. When running this part:

  - do:
      inference.inference:
        inference_id: elser_model_test
        body:
          input: 'The sky above the port was the color of television tuned to a dead channel.'

The server needs to download and start ELSER. So I get this error:

{
  "error": {
    "root_cause": [
      {
        "type": "status_exception",
        "reason": "Trained model deployment [elser_model_test] is not allocated to any nodes"
      }
    ],
    "type": "status_exception",
    "reason": "Trained model deployment [elser_model_test] is not allocated to any nodes"
  },
  "status": 409
}

My understanding is that no matter how large a timeout we add to any of the operations in the test, it won't help, because nothing actually times out: the APIs return a response right away. In the case of inference, timeout is a query parameter: "Controls the amount of time to wait for the inference to complete. Defaults to 30 seconds." But a bigger timeout doesn't make the request wait, because the model is not allocated yet. We just need to wait...

I think all the clients implement these parameters, but I'm not 100% sure. What worked for me was instantiating my test client with the following:

{
  retry_on_status: [409],
  retry_on_failure: 10,
  delay_on_retry: 60_000,
  request_timeout: 120
}

The request_timeout is in seconds and probably not even necessary for this particular case. As for the rest, I'm retrying on 409 errors, up to 10 times, with a delay of 60s between retries (delay_on_retry is in milliseconds). The build is passing, and watching it run, it definitely takes longer, but most of the wait is in inference.inference waiting for ELSER to be ready.
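
For anyone whose client doesn't expose these options, here is a rough sketch of what the retry settings above amount to. It is not the client's actual implementation: `call_with_retries` and `fake_inference` are hypothetical names made up for illustration, and the fake call just pretends the model is unallocated for the first two attempts.

```ruby
# Hypothetical helper that mimics retry-on-status behaviour.
# This is an illustration only, not the real client code.
def call_with_retries(retry_on_status:, max_retries:, delay_ms:)
  attempts = 0
  loop do
    attempts += 1
    status, body = yield(attempts)
    # Stop on any status we were not asked to retry, or once retries run out.
    retryable = retry_on_status.include?(status) && attempts <= max_retries
    return [status, body, attempts] unless retryable

    sleep(delay_ms / 1000.0) # delay is given in milliseconds
  end
end

# Fake inference call (assumption): returns 409 until the third attempt.
fake_inference = lambda do |attempt|
  attempt < 3 ? [409, 'not allocated to any nodes'] : [200, 'ok']
end

# Short delay here so the sketch runs quickly; the real setup used 60_000 ms.
status, _body, attempts = call_with_retries(
  retry_on_status: [409], max_retries: 10, delay_ms: 10
) { |attempt| fake_inference.call(attempt) }

puts "status=#{status} attempts=#{attempts}"
```

The point is that the wait happens between requests rather than inside a single request, which is why a larger per-request timeout makes no difference here.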

ezimuel commented 5 days ago

I don't know if we can trigger the "loading" of this elser_model_test through a specific API to prevent this timeout. With a local ES this error doesn't seem to occur. I think this is specific to serverless.