elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[ML] Inference API request hangs when passing an invalid field #110992

Open jonathan-buttner opened 1 month ago

jonathan-buttner commented 1 month ago

Description

The inference API supports the text embedding and rerank task types. If an inference endpoint is created for text embedding and an inference request against it contains a `query` field (which only applies to rerank), an exception is thrown internally but the request does not fail. Instead it hangs until it eventually times out.

Steps to reproduce:

1. Create an inference endpoint for the text embedding task type:

```
PUT _inference/text_embedding/cohere
{
    "service": "cohere",
    "service_settings": {
        "api_key": "<api key>",
        "model_id": "embed-english-v3.0"
    }
}
```
2. Perform inference with a request that includes the invalid `query` field:

```
POST _inference/text_embedding/cohere
{
    "input": ["ice cream", "some really bad food", "fruit"],
    "query": "What is the best food?"
}
```
Exception:

```
[2024-07-17T14:33:55,509][WARN ][o.e.x.i.e.h.s.R.RateLimitingEndpointHandler] [runTask-0] Executor service grouping [235880600] failed to execute request
java.lang.IllegalArgumentException: Unsupported inference inputs type: [class org.elasticsearch.xpack.inference.external.http.sender.QueryAndDocsInputs]
	at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.InferenceInputs.createUnsupportedTypeException(InferenceInputs.java:14)
	at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.DocumentsOnlyInput.of(DocumentsOnlyInput.java:17)
	at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.CohereEmbeddingsRequestManager.execute(CohereEmbeddingsRequestManager.java:52)
	at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService$RateLimitingEndpointHandler.executeEnqueuedTaskInternal(RequestExecutorService.java:416)
	at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService$RateLimitingEndpointHandler.executeEnqueuedTask(RequestExecutorService.java:388)
	at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService.handleTasks(RequestExecutorService.java:237)
	at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService.start(RequestExecutorService.java:192)
	at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1570)
[2024-07-17T14:34:25,468][WARN ][r.suppressed ] [runTask-0] path: /_inference/text_embedding/cohere, params: {inference_id=cohere, task_type_or_id=text_embedding}, status: 500
org.elasticsearch.ElasticsearchTimeoutException: Request timed out waiting to be sent after [30s]
	at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestTask.lambda$getListener$2(RequestTask.java:67)
	at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.action.support.ListenerTimeouts$TimeoutableListener.run(ListenerTimeouts.java:102)
	at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1570)
```

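The issue is that the request manager code throws an exception because the `query` field is not valid for a text embedding request. As the first stack trace shows, `DocumentsOnlyInput.of` (called from `CohereEmbeddingsRequestManager.execute`) throws an `IllegalArgumentException` on the executor thread. That exception is only logged; the request's listener is never notified, so the caller waits until the request times out after 30s (the second stack trace).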
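To make the failure mode concrete, here is a minimal, self-contained Java sketch of why the request hangs rather than fails. It does not use Elasticsearch classes: `Inputs`, `HangingRequestManager`, and `submit` are hypothetical, a `CompletableFuture` stands in for the request listener, and `orTimeout` plays the role of the 30s request timeout. The validation error is thrown and logged on the executor thread, but nothing ever completes the future, so the caller only sees a `TimeoutException`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical request inputs: a null query means "documents only". */
record Inputs(String query, List<String> docs) {}

class HangingRequestManager {
    /** Mirrors the bug: validation throws before the listener is ever completed. */
    static void execute(Inputs inputs, CompletableFuture<String> listener) {
        if (inputs.query() != null) {
            throw new IllegalArgumentException("Unsupported inference inputs type");
        }
        listener.complete("embedded " + inputs.docs().size() + " document(s)");
    }

    /** Queues a request the way a caller would and returns what the caller waits on. */
    static CompletableFuture<String> submit(ExecutorService executor, Inputs inputs, long timeoutMillis) {
        CompletableFuture<String> listener = new CompletableFuture<>();
        // Stand-in for the request timeout: the only thing that ends the wait on a bad request.
        listener.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
        executor.execute(() -> {
            try {
                execute(inputs, listener);
            } catch (RuntimeException e) {
                // The executor only logs the failure; the listener is never told,
                // so the caller keeps waiting until the timeout fires.
                System.err.println("failed to execute request: " + e);
            }
        });
        return listener;
    }
}
```

A valid request completes normally, while a request with a `query` only ends when `orTimeout` completes the future exceptionally.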

One fix would be to wrap this logic in a try/catch and report the caught exception to the request's listener.
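A sketch of the try/catch approach, assuming a simplified model of the code: `Listener` is a hypothetical stand-in with the same two methods as Elasticsearch's `ActionListener` (`onResponse`, `onFailure`), `Inputs` is a hypothetical record in which a non-null `query` plays the role of `QueryAndDocsInputs`, and `documentsOnly` only approximates `DocumentsOnlyInput.of`. The point it illustrates is that the exception is routed to `listener.onFailure` instead of escaping to the executor thread.

```java
import java.util.List;

/** Minimal stand-in for Elasticsearch's ActionListener (onResponse / onFailure). */
interface Listener<T> {
    void onResponse(T response);
    void onFailure(Exception e);
}

/** Hypothetical request inputs: a null query means "documents only". */
record Inputs(String query, List<String> docs) {}

class EmbeddingsRequestManager {
    /** Hypothetical analogue of DocumentsOnlyInput.of: rejects inputs that carry a query. */
    static List<String> documentsOnly(Inputs inputs) {
        if (inputs.query() != null) {
            throw new IllegalArgumentException("query is not valid for a text_embedding request");
        }
        return inputs.docs();
    }

    static void execute(Inputs inputs, Listener<String> listener) {
        List<String> docs;
        try {
            docs = documentsOnly(inputs);
        } catch (IllegalArgumentException e) {
            // Fail the request right away instead of letting the exception escape
            // to the executor thread, where nothing would notify the caller.
            listener.onFailure(e);
            return;
        }
        // Stand-in for sending the real embeddings request.
        listener.onResponse("embedded " + docs.size() + " document(s)");
    }
}
```

Only the input conversion sits inside the `try`; calling `onResponse` outside it avoids notifying the listener twice if `onResponse` itself throws.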

A more complete solution might be to have `BaseRequestManager` implement the `execute` method and have that implementation call an abstract method that each request manager subclass implements. `BaseRequestManager` can then own the try/catch logic: catch an `IllegalArgumentException`, wrap it in an `ElasticsearchStatusException`, and pass it to `listener.onFailure` so the caller receives an error instead of waiting for the timeout.
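The `BaseRequestManager` idea is a template method. The sketch below is hypothetical and simplified: `doExecute` is an invented name for the abstract method, `StatusException` stands in for `ElasticsearchStatusException`, and mapping an invalid field to HTTP 400 is an assumption, not something stated in this issue. `Listener` mirrors `ActionListener`'s `onResponse`/`onFailure`, and `Inputs` is a hypothetical record in which a non-null `query` means query-and-documents input.

```java
import java.util.List;

/** Minimal stand-in for Elasticsearch's ActionListener (onResponse / onFailure). */
interface Listener<T> {
    void onResponse(T response);
    void onFailure(Exception e);
}

/** Hypothetical request inputs: a null query means "documents only". */
record Inputs(String query, List<String> docs) {}

/** Hypothetical stand-in for ElasticsearchStatusException: message, HTTP status, cause. */
class StatusException extends RuntimeException {
    final int status;

    StatusException(String message, int status, Throwable cause) {
        super(message, cause);
        this.status = status;
    }
}

abstract class BaseRequestManager {
    /** Template method: the try/catch lives here once instead of in every subclass. */
    final void execute(Inputs inputs, Listener<String> listener) {
        try {
            doExecute(inputs, listener);
        } catch (IllegalArgumentException e) {
            // Assumption: an invalid field is a client error, so report 400 Bad Request.
            listener.onFailure(new StatusException(e.getMessage(), 400, e));
        } catch (Exception e) {
            // Assumption: any other failure is passed through unchanged.
            listener.onFailure(e);
        }
    }

    /** Each concrete request manager validates its inputs and sends its request here. */
    protected abstract void doExecute(Inputs inputs, Listener<String> listener) throws Exception;
}

class EmbeddingsRequestManager extends BaseRequestManager {
    @Override
    protected void doExecute(Inputs inputs, Listener<String> listener) {
        if (inputs.query() != null) {
            throw new IllegalArgumentException("query is not valid for a text_embedding request");
        }
        // Stand-in for the real HTTP call to the embeddings service.
        listener.onResponse("embedded " + inputs.docs().size() + " document(s)");
    }
}
```

One caveat with this shape: if a subclass has already called `onResponse` and then throws, the catch block notifies the listener a second time, so a real implementation would need to guard against double completion.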

Another improvement could be to have the request manager subclasses return a `Runnable`, or even an `ActionRunnable`, that handles some of the try/catch logic.
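A rough sketch of the `Runnable` variant, under similar assumptions: `ListenerRunnable` is a hypothetical class loosely modeled on the idea behind `ActionRunnable` (run some work and route any exception to a listener), not its actual API, and `create`/`doRun` are invented names. `Listener` and `Inputs` are the same kind of hypothetical stand-ins as before. Subclasses only build the runnable; the base class hands it to an `Executor`, and every exception thrown by the work ends up in `listener.onFailure`.

```java
import java.util.List;
import java.util.concurrent.Executor;

/** Minimal stand-in for Elasticsearch's ActionListener (onResponse / onFailure). */
interface Listener<T> {
    void onResponse(T response);
    void onFailure(Exception e);
}

/** Hypothetical request inputs: a null query means "documents only". */
record Inputs(String query, List<String> docs) {}

/** Hypothetical runnable that routes any exception from doRun() to its listener. */
abstract class ListenerRunnable<T> implements Runnable {
    protected final Listener<T> listener;

    ListenerRunnable(Listener<T> listener) {
        this.listener = listener;
    }

    protected abstract void doRun() throws Exception;

    @Override
    public final void run() {
        try {
            doRun();
        } catch (Exception e) {
            listener.onFailure(e);
        }
    }
}

abstract class BaseRequestManager {
    /** Subclasses only describe the work; the base class decides where it runs. */
    protected abstract Runnable create(Inputs inputs, Listener<String> listener);

    final void execute(Inputs inputs, Listener<String> listener, Executor executor) {
        executor.execute(create(inputs, listener));
    }
}

class EmbeddingsRequestManager extends BaseRequestManager {
    @Override
    protected Runnable create(Inputs inputs, Listener<String> listener) {
        return new ListenerRunnable<String>(listener) {
            @Override
            protected void doRun() {
                if (inputs.query() != null) {
                    throw new IllegalArgumentException("query is not valid for a text_embedding request");
                }
                // Stand-in for the real HTTP call to the embeddings service.
                listener.onResponse("embedded " + inputs.docs().size() + " document(s)");
            }
        };
    }
}
```

Passing `Runnable::run` as the `Executor` runs the work on the calling thread, which is convenient for trying the sketch out.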

elasticsearchmachine commented 1 month ago

Pinging @elastic/ml-core (Team:ML)

dimkots commented 1 week ago

@davidkyle @maxhniebergall adding this to Up Next since we are working towards GA, and this might be a bug we want to solve.