[ML] Inference API request hangs when passing an invalid field

jonathan-buttner commented 3 months ago

Description

The inference API supports text embedding and rerank task types. If an inference endpoint is created for text embedding and an inference request to that endpoint contains a query field, an exception is thrown and the request hangs until it eventually times out.

Steps to reproduce:

Create an inference endpoint

PUT _inference/text_embedding/cohere
{
    "service": "cohere",
    "service_settings": {
        "api_key": "<api key>",
        "model_id": "embed-english-v3.0"
    }
}

Perform inference including an invalid query field

POST _inference/text_embedding/cohere
{
    "input": ["ice cream", "some really bad food", "fruit"],
    "query": "What is the best food?"
}

Exception

```
[2024-07-17T14:33:55,509][WARN ][o.e.x.i.e.h.s.R.RateLimitingEndpointHandler] [runTask-0] Executor service grouping [235880600] failed to execute request
java.lang.IllegalArgumentException: Unsupported inference inputs type: [class org.elasticsearch.xpack.inference.external.http.sender.QueryAndDocsInputs]
    at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.InferenceInputs.createUnsupportedTypeException(InferenceInputs.java:14)
    at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.DocumentsOnlyInput.of(DocumentsOnlyInput.java:17)
    at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.CohereEmbeddingsRequestManager.execute(CohereEmbeddingsRequestManager.java:52)
    at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService$RateLimitingEndpointHandler.executeEnqueuedTaskInternal(RequestExecutorService.java:416)
    at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService$RateLimitingEndpointHandler.executeEnqueuedTask(RequestExecutorService.java:388)
    at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService.handleTasks(RequestExecutorService.java:237)
    at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService.start(RequestExecutorService.java:192)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1570)
[2024-07-17T14:34:25,468][WARN ][r.suppressed ] [runTask-0] path: /_inference/text_embedding/cohere, params: {inference_id=cohere, task_type_or_id=text_embedding}, status: 500
org.elasticsearch.ElasticsearchTimeoutException: Request timed out waiting to be sent after [30s]
    at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestTask.lambda$getListener$2(RequestTask.java:67)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.action.support.ListenerTimeouts$TimeoutableListener.run(ListenerTimeouts.java:102)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1570)
```

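The issue is that this code (DocumentsOnlyInput.of, called from CohereEmbeddingsRequestManager.execute in the stack trace above) throws an IllegalArgumentException because the query field is not valid for a text embedding request. The exception is only logged; the request's listener is never notified, so the request waits until it times out.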

One fix would be to wrap this logic in a try/catch.
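A minimal sketch of the try/catch idea follows. It is not the actual Elasticsearch code: Listener, Inputs, DocsOnly, QueryAndDocs and toDocsOnly are hypothetical stand-ins for ActionListener, the inputs types named in the stack trace, and the failing conversion, so that the example compiles on its own.

```java
import java.util.Arrays;
import java.util.List;

final class TryCatchSketch {
    /** Hypothetical stand-in for Elasticsearch's ActionListener. */
    interface Listener<T> {
        void onResponse(T result);
        void onFailure(Exception e);
    }

    /** Hypothetical stand-ins for the inputs types named in the stack trace. */
    interface Inputs {}

    static final class DocsOnly implements Inputs {
        final List<String> docs;
        DocsOnly(List<String> docs) { this.docs = docs; }
    }

    static final class QueryAndDocs implements Inputs {
        final String query;
        final List<String> docs;
        QueryAndDocs(String query, List<String> docs) { this.query = query; this.docs = docs; }
    }

    /** Mirrors the failing conversion: rejects anything that is not documents-only. */
    static DocsOnly toDocsOnly(Inputs inputs) {
        if (inputs instanceof DocsOnly) {
            return (DocsOnly) inputs;
        }
        throw new IllegalArgumentException("Unsupported inference inputs type: [" + inputs.getClass() + "]");
    }

    /** Completes the listener on every path, including when the conversion throws. */
    static void execute(Inputs inputs, Listener<Integer> listener) {
        final DocsOnly docs;
        try {
            docs = toDocsOnly(inputs);
        } catch (IllegalArgumentException e) {
            listener.onFailure(e); // fail fast instead of leaving the caller to time out
            return;
        }
        listener.onResponse(docs.docs.size()); // placeholder for building and sending the real request
    }

    public static void main(String[] args) {
        Listener<Integer> printing = new Listener<Integer>() {
            @Override public void onResponse(Integer count) { System.out.println("embedded " + count + " documents"); }
            @Override public void onFailure(Exception e) { System.out.println("failed: " + e.getMessage()); }
        };
        execute(new DocsOnly(Arrays.asList("ice cream", "fruit")), printing);
        execute(new QueryAndDocs("What is the best food?", Arrays.asList("ice cream")), printing);
    }
}
```

The point of the sketch is only that the failure path reaches the listener immediately; in the real request manager the same catch would need to surround whatever call currently throws.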

A more complete solution might be to have the BaseRequestManager implement this execute method and then have the implementation call an abstract method that each of the subclassed request managers implements. The BaseRequestManager can implement the try/catch logic and check for an IllegalArgumentException and wrap it in an ElasticsearchStatusException. It will also need to call listener.onFailure.
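The base-class approach could look roughly like the sketch below. Everything here is illustrative: Listener and StatusException are hypothetical stand-ins for ActionListener and ElasticsearchStatusException, doExecute is an invented name for the abstract method, and the 400 status is an assumption (the issue does not say which status to use).

```java
import java.util.Arrays;
import java.util.List;

final class BaseRequestManagerSketch {
    /** Hypothetical stand-in for Elasticsearch's ActionListener. */
    interface Listener<T> {
        void onResponse(T result);
        void onFailure(Exception e);
    }

    /** Hypothetical stand-in for ElasticsearchStatusException; carries an HTTP status. */
    static final class StatusException extends RuntimeException {
        final int status;
        StatusException(String message, int status, Throwable cause) {
            super(message, cause);
            this.status = status;
        }
    }

    /** The base class owns the error handling; subclasses only supply the request-specific work. */
    abstract static class BaseRequestManager {
        final void execute(Object inputs, Listener<String> listener) {
            try {
                doExecute(inputs, listener);
            } catch (IllegalArgumentException e) {
                // Assumed mapping: invalid inputs are the caller's fault, so report 400.
                listener.onFailure(new StatusException(e.getMessage(), 400, e));
            } catch (Exception e) {
                listener.onFailure(e);
            }
        }

        /** Hypothetical abstract hook implemented by each request manager. */
        protected abstract void doExecute(Object inputs, Listener<String> listener);
    }

    /** Example subclass that accepts only a list of documents. */
    static final class EmbeddingsRequestManager extends BaseRequestManager {
        @Override
        protected void doExecute(Object inputs, Listener<String> listener) {
            if (!(inputs instanceof List)) {
                throw new IllegalArgumentException("Unsupported inference inputs type: [" + inputs.getClass() + "]");
            }
            listener.onResponse("embedded " + ((List<?>) inputs).size() + " documents");
        }
    }

    public static void main(String[] args) {
        Listener<String> printing = new Listener<String>() {
            @Override public void onResponse(String r) { System.out.println(r); }
            @Override public void onFailure(Exception e) { System.out.println("failed: " + e.getMessage()); }
        };
        BaseRequestManager manager = new EmbeddingsRequestManager();
        manager.execute(Arrays.asList("ice cream", "fruit"), printing);
        manager.execute("a query where documents were expected", printing);
    }
}
```

One caveat with this shape: a subclass should throw before it completes the listener, otherwise the base class could report a failure for a request that was already answered.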

Another improvement could be to have the subclassed request managers return a Runnable or even an ActionRunnable that could handle some of the try/catch logic.
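As a rough illustration of the Runnable idea, the sketch below has the request manager hand back a task whose run method routes any exception to the listener. ListenerRunnable is a hypothetical, simplified analogue written for this example, not Elasticsearch's ActionRunnable, and createRequest and Listener are likewise invented names.

```java
import java.util.Arrays;
import java.util.List;

final class RunnableSketch {
    /** Hypothetical stand-in for Elasticsearch's ActionListener. */
    interface Listener<T> {
        void onResponse(T result);
        void onFailure(Exception e);
    }

    /** Hypothetical simplified analogue of ActionRunnable: any exception from doRun goes to the listener. */
    abstract static class ListenerRunnable<T> implements Runnable {
        private final Listener<T> listener;

        ListenerRunnable(Listener<T> listener) {
            this.listener = listener;
        }

        protected abstract T doRun() throws Exception;

        @Override
        public final void run() {
            final T result;
            try {
                result = doRun();
            } catch (Exception e) {
                listener.onFailure(e); // the executor never sees the exception
                return;
            }
            listener.onResponse(result);
        }
    }

    /** A request manager returns the task instead of executing it, so validation errors stay inside run(). */
    static Runnable createRequest(final Object inputs, Listener<Integer> listener) {
        return new ListenerRunnable<Integer>(listener) {
            @Override
            protected Integer doRun() {
                if (!(inputs instanceof List)) {
                    throw new IllegalArgumentException("Unsupported inference inputs type: [" + inputs.getClass() + "]");
                }
                return ((List<?>) inputs).size(); // placeholder for sending the real request
            }
        };
    }

    public static void main(String[] args) {
        Listener<Integer> printing = new Listener<Integer>() {
            @Override public void onResponse(Integer count) { System.out.println("embedded " + count + " documents"); }
            @Override public void onFailure(Exception e) { System.out.println("failed: " + e.getMessage()); }
        };
        createRequest(Arrays.asList("ice cream", "fruit"), printing).run();
        createRequest("a query where documents were expected", printing).run();
    }
}
```

With this design the executor service only ever calls run(), and the try/catch lives in one place rather than in every request manager.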

elasticsearchmachine commented 3 months ago

Pinging @elastic/ml-core (Team:ML)

dimkots commented 2 months ago

@davidkyle @maxhniebergall adding this to up next since we are working towards GA, so this might be a bug we want to solve

elastic / elasticsearch

[ML] Inference API request hangs when passing an invalid field #110992
