elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[ML] Inference API chunking large documents #106185

Open jonathan-buttner opened 5 months ago

jonathan-buttner commented 5 months ago

Description

Large documents need to be chunked; otherwise, any tokens beyond the model's limit are not used.
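To illustrate the idea, here is a minimal sketch of word-based chunking with overlap. It is not the Inference API's implementation: the function name `chunk_words`, the parameters `max_words` and `overlap`, and their default values are all hypothetical. It also assumes whitespace-separated words are an acceptable stand-in for model tokens, which is only approximate, since a tokenizer can produce more tokens than words.

```python
def chunk_words(text, max_words=250, overlap=50):
    """Split text into overlapping windows of whitespace-separated words.

    Hypothetical helper for illustration only; the names and defaults
    are assumptions, not Elasticsearch settings.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in the range [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk is made up purely of overlap.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g", max_words=3, overlap=1):
        print(chunk)
```

Each chunk can then be sent to the model separately, so that no part of the document is dropped for exceeding the limit. The overlap keeps some shared context between neighbouring chunks.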

MVP for default word-based chunking strategy:

MVP for configurable chunking settings:

Post-MVP features for configurable chunking settings:

Tasks already completed:

elasticsearchmachine commented 5 months ago

Pinging @elastic/ml-core (Team:ML)