Open ChrisHegarty opened 2 months ago
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Pinging @elastic/es-search-relevance (Team:Search Relevance)
I marked this Search relevance to check the question about the timeout and also because the exception is thrown in the inference related code. Given a cause is not provided, the status code will be 503, which is trappy. We have recently made a similar change in some of the search code and moved to 504 instead, see SearchTimeoutException. We had a bit of back and forth and decided not to use 4xx for search timeout errors, not sure what this specific timeout exception means though in this context.
IIUC one place where we set the timeout to 10s is the semantic_text
rewriting (SemanticQueryBuilder#doRewriteGetInferenceResults
InferenceAction.Request inferenceRequest = new InferenceAction.Request(
TaskType.ANY,
inferenceId,
null,
List.of(query),
Map.of(),
InputType.SEARCH,
InferModelAction.Request.DEFAULT_TIMEOUT_FOR_API,
false
);
but I can't trace the call from just the logged message. Maybe the ML/Inference team might be in a better position to take a look at this (and change the exception if needed)
Looking into this a bit, we don't have a cause for this exception because it's directly at the source of the timeout, there isn't any other underlying exception that we've caught at that point. I'm happy to change the exception, but I am not sure what would be better than this. Seems like we should consider just increasing the timeout above 10s?
Pinging @elastic/ml-core (Team:ML)
Timeouts on the inference service should not result in a 503. The rest suppressed logger reported this.
Interestingly the timeout is 10s, while the default is not 30s ?