elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[ML] Inference API chunking large documents #106185

Open jonathan-buttner opened 8 months ago

jonathan-buttner commented 8 months ago

Description

Large documents need to be chunked; otherwise, any tokens beyond the model's input limit won't be used.
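
To make the idea concrete, here is a minimal sketch of word-based chunking with overlap. It is only an illustration, not the Inference API implementation: the function name `chunk_words`, its parameters `max_words` and `overlap`, and their default values are all hypothetical, and it counts whitespace-separated words rather than model tokens.

```python
def chunk_words(text, max_words=250, overlap=50):
    """Split text into chunks of at most `max_words` words.

    Consecutive chunks share `overlap` words so that context spanning a
    chunk boundary is not lost. All names and defaults are illustrative
    assumptions, not Elasticsearch settings.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()          # naive whitespace word splitting
    step = max_words - overlap    # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text,
        # so we do not emit a trailing chunk made only of overlap words.
        if start + max_words >= len(words):
            break
    return chunks
```

For example, `chunk_words("a b c d e f g", max_words=3, overlap=1)` returns `["a b c", "c d e", "e f g"]`. Each chunk would then be sent to the model separately, so no part of the document falls outside the model's limit.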

MVP for default word-based chunking strategy:

MVP for configurable chunking settings:

Post-MVP features for configurable chunking settings:

Tasks already completed:

elasticsearchmachine commented 8 months ago

Pinging @elastic/ml-core (Team:ML)