The inference API supports the `text_embedding` and `rerank` task types. Suppose an inference endpoint is created for text embedding and an inference request against it contains a `query` field. An exception is then thrown while the request is being executed, but the request is never failed. Instead it hangs until it times out after 30 seconds.
Steps to reproduce: perform inference with an invalid `query` field.

```
POST _inference/text_embedding/cohere
{
  "input": ["ice cream", "some really bad food", "fruit"],
  "query": "What is the best food?"
}
```

Exception:
```
[2024-07-17T14:33:55,509][WARN ][o.e.x.i.e.h.s.R.RateLimitingEndpointHandler] [runTask-0] Executor service grouping [235880600] failed to execute request java.lang.IllegalArgumentException: Unsupported inference inputs type: [class org.elasticsearch.xpack.inference.external.http.sender.QueryAndDocsInputs]
at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.InferenceInputs.createUnsupportedTypeException(InferenceInputs.java:14)
at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.DocumentsOnlyInput.of(DocumentsOnlyInput.java:17)
at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.CohereEmbeddingsRequestManager.execute(CohereEmbeddingsRequestManager.java:52)
at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService$RateLimitingEndpointHandler.executeEnqueuedTaskInternal(RequestExecutorService.java:416)
at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService$RateLimitingEndpointHandler.executeEnqueuedTask(RequestExecutorService.java:388)
at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService.handleTasks(RequestExecutorService.java:237)
at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestExecutorService.start(RequestExecutorService.java:192)
at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1570)
[2024-07-17T14:34:25,468][WARN ][r.suppressed ] [runTask-0] path: /_inference/text_embedding/cohere, params: {inference_id=cohere, task_type_or_id=text_embedding}, status: 500 org.elasticsearch.ElasticsearchTimeoutException: Request timed out waiting to be sent after [30s]
at org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.RequestTask.lambda$getListener$2(RequestTask.java:67)
at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.action.support.ListenerTimeouts$TimeoutableListener.run(ListenerTimeouts.java:102)
at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1570)
```
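The issue is that `DocumentsOnlyInput.of`, called from `CohereEmbeddingsRequestManager.execute`, throws an `IllegalArgumentException` because the request carries `QueryAndDocsInputs`, and the `query` field is not valid for a text embedding request. The request executor only logs the exception. Nothing notifies the request's listener, so the caller waits until the 30-second timeout fires.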
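The following sketch illustrates why the request hangs instead of failing. Every type in it is a simplified, hypothetical stand-in: `InferenceInputs`, `DocumentsOnlyInput` and `QueryAndDocsInputs` only mimic the plugin classes named in the stack trace, and a `CompletableFuture` plays the role of the request listener. The conversion throws before the listener is touched, and the worker only logs the error, so the waiting caller can only time out.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HangingRequestDemo {

    // Simplified, hypothetical stand-ins for the plugin's input types.
    interface InferenceInputs {}

    record QueryAndDocsInputs(String query, List<String> chunks) implements InferenceInputs {}

    record DocumentsOnlyInput(List<String> input) implements InferenceInputs {
        // Mirrors the behaviour in the stack trace: any other input type is rejected.
        static DocumentsOnlyInput of(InferenceInputs inputs) {
            if (inputs instanceof DocumentsOnlyInput docs) {
                return docs;
            }
            throw new IllegalArgumentException(
                "Unsupported inference inputs type: [" + inputs.getClass().getSimpleName() + "]");
        }
    }

    // Hypothetical embeddings execute(): the conversion throws before the listener is touched.
    static void executeEmbeddings(InferenceInputs inputs, CompletableFuture<String> listener) {
        DocumentsOnlyInput docs = DocumentsOnlyInput.of(inputs);
        listener.complete("embedded " + docs.input().size() + " documents");
    }

    // Runs the task on a worker thread and waits on the listener the way a caller with a timeout would.
    static String simulate(InferenceInputs inputs, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CompletableFuture<String> listener = new CompletableFuture<>();
        executor.execute(() -> {
            try {
                executeEmbeddings(inputs, listener);
            } catch (IllegalArgumentException e) {
                // The failure is only logged; nothing completes the listener.
                System.err.println("failed to execute request: " + e.getMessage());
            }
        });
        try {
            return listener.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "timed out";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = List.of("ice cream", "some really bad food", "fruit");
        System.out.println(simulate(new DocumentsOnlyInput(docs), 500)); // embedded 3 documents
        System.out.println(simulate(new QueryAndDocsInputs("What is the best food?", docs), 500)); // timed out
    }
}
```

With documents only, the listener completes normally. With a query, the caller gets nothing until its deadline expires, which matches the 30-second `ElasticsearchTimeoutException` in the log.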
One fix would be to wrap this logic in a try/catch and pass the exception to `listener.onFailure`.
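The local try/catch fix can be sketched as follows. `Listener` is a minimal, hypothetical stand-in modelled on Elasticsearch's `ActionListener` (`onResponse` / `onFailure`), and the input types are simplified stand-ins as well. Only the input conversion is guarded, so the listener is notified exactly once, either with a response or with the failure.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class TryCatchFixDemo {

    // Minimal stand-in modelled on Elasticsearch's ActionListener (onResponse / onFailure).
    interface Listener<T> {
        void onResponse(T response);

        void onFailure(Exception e);
    }

    // Simplified, hypothetical stand-ins for the plugin's input types.
    interface InferenceInputs {}

    record DocumentsOnlyInput(List<String> input) implements InferenceInputs {}

    record QueryAndDocsInputs(String query, List<String> chunks) implements InferenceInputs {}

    static DocumentsOnlyInput toDocumentsOnly(InferenceInputs inputs) {
        if (inputs instanceof DocumentsOnlyInput docs) {
            return docs;
        }
        throw new IllegalArgumentException(
            "Unsupported inference inputs type: [" + inputs.getClass().getSimpleName() + "]");
    }

    // Hypothetical execute(): the conversion is guarded so the caller is always notified.
    static void execute(InferenceInputs inputs, Listener<String> listener) {
        DocumentsOnlyInput docs;
        try {
            docs = toDocumentsOnly(inputs);
        } catch (IllegalArgumentException e) {
            listener.onFailure(e); // fail fast instead of leaving the request pending
            return;
        }
        listener.onResponse("embedded " + docs.input().size() + " documents");
    }

    // Records whichever listener callback fired, for demonstration.
    static String run(InferenceInputs inputs) {
        AtomicReference<String> outcome = new AtomicReference<>("pending");
        execute(inputs, new Listener<>() {
            @Override
            public void onResponse(String response) {
                outcome.set("response: " + response);
            }

            @Override
            public void onFailure(Exception e) {
                outcome.set("failure: " + e.getMessage());
            }
        });
        return outcome.get();
    }

    public static void main(String[] args) {
        List<String> docs = List.of("ice cream", "fruit");
        System.out.println(run(new DocumentsOnlyInput(docs)));
        System.out.println(run(new QueryAndDocsInputs("What is the best food?", docs)));
    }
}
```

The downside is that every request manager has to repeat this guard, which is what motivates moving it into a shared base class.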
A more complete solution might be to have the `BaseRequestManager` implement this `execute` method and then have the implementation call an abstract method that each of the subclassed request managers implements. The `BaseRequestManager` can implement the try/catch logic, check for an `IllegalArgumentException`, and wrap it in an `ElasticsearchStatusException`. It will also need to call `listener.onFailure`.

Another improvement could be to have the subclassed request managers return a `Runnable` or even an `ActionRunnable` that could handle some of the try/catch logic.
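The `BaseRequestManager` idea is essentially the template-method pattern. The sketch below is illustrative only. `Listener`, `StatusException` (standing in for `ElasticsearchStatusException`), `doExecute` and `EmbeddingsRequestManager` are hypothetical names, and the 400 status is an assumption, since the issue does not say which status should be used.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class BaseRequestManagerSketch {

    // Minimal stand-in modelled on Elasticsearch's ActionListener.
    interface Listener<T> {
        void onResponse(T response);

        void onFailure(Exception e);
    }

    // Hypothetical stand-in for ElasticsearchStatusException; the 400 status is an assumption.
    static final class StatusException extends RuntimeException {
        final int status;

        StatusException(String message, int status, Throwable cause) {
            super(message, cause);
            this.status = status;
        }
    }

    interface InferenceInputs {}

    record DocumentsOnlyInput(List<String> input) implements InferenceInputs {}

    record QueryAndDocsInputs(String query, List<String> chunks) implements InferenceInputs {}

    abstract static class BaseRequestManager {
        // Template method: the error handling lives here, so no subclass can skip it.
        final void execute(InferenceInputs inputs, Listener<String> listener) {
            try {
                doExecute(inputs, listener);
            } catch (IllegalArgumentException e) {
                listener.onFailure(new StatusException(e.getMessage(), 400, e));
            }
        }

        // Each concrete manager implements only its provider-specific logic.
        protected abstract void doExecute(InferenceInputs inputs, Listener<String> listener);
    }

    // Hypothetical embeddings manager: it may throw freely; the base class reports the failure.
    static final class EmbeddingsRequestManager extends BaseRequestManager {
        @Override
        protected void doExecute(InferenceInputs inputs, Listener<String> listener) {
            if (!(inputs instanceof DocumentsOnlyInput docs)) {
                throw new IllegalArgumentException(
                    "Unsupported inference inputs type: [" + inputs.getClass().getSimpleName() + "]");
            }
            listener.onResponse("embedded " + docs.input().size() + " documents");
        }
    }

    // Records whichever listener callback fired, including the status carried by the failure.
    static String run(InferenceInputs inputs) {
        AtomicReference<String> outcome = new AtomicReference<>("pending");
        new EmbeddingsRequestManager().execute(inputs, new Listener<>() {
            @Override
            public void onResponse(String response) {
                outcome.set("response: " + response);
            }

            @Override
            public void onFailure(Exception e) {
                int status = e instanceof StatusException se ? se.status : 500;
                outcome.set("failure(" + status + "): " + e.getMessage());
            }
        });
        return outcome.get();
    }

    public static void main(String[] args) {
        List<String> docs = List.of("ice cream", "fruit");
        System.out.println(run(new DocumentsOnlyInput(docs)));
        System.out.println(run(new QueryAndDocsInputs("What is the best food?", docs)));
    }
}
```

One caveat with this shape: if `onResponse` itself threw an `IllegalArgumentException`, the base class would also call `onFailure`. Narrowing the guarded region to input validation, or having subclasses hand back a `Runnable` that the base class runs separately, would avoid notifying the listener twice.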