elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.33k stars, 24.87k forks

[ML] Overview of reindex issues with NLP #113948

Open maxhniebergall opened 1 month ago

maxhniebergall commented 1 month ago

Background

Reindex lets users create a new index from data that is already in Elasticsearch. This is especially useful when moving to semantic search: users have often already implemented text search and want to generate embeddings for their existing data in a new index. Unfortunately, reindex has flaws that make it difficult or impossible to use with larger datasets, or when machine learning models are used to produce the embeddings.
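For readers less familiar with this workflow, here is a minimal sketch of the request it involves: a `_reindex` call that routes each source document through an ingest pipeline on the way into the new index. The index names and the pipeline name are hypothetical, and the pipeline is assumed to already exist and contain an inference processor that produces the embeddings. `source.size` (scroll batch size), `dest.pipeline`, and the `wait_for_completion` query parameter are standard reindex API options. The script only builds and prints the request; it does not contact a cluster.

```python
import json

# Hypothetical names: substitute your own indices and ingest pipeline.
SOURCE_INDEX = "my-text-index"
DEST_INDEX = "my-semantic-index"
PIPELINE = "my-embedding-pipeline"  # assumed to contain an inference processor


def build_reindex_request(source_index, dest_index, pipeline, batch_size=1000):
    """Return (path, body) for a _reindex call that sends every document
    through an ingest pipeline before it is written to the destination."""
    # wait_for_completion=false runs the reindex as a background task and
    # returns a task ID instead of blocking the client until it finishes.
    path = "/_reindex?wait_for_completion=false"
    body = {
        "source": {
            "index": source_index,
            # Documents fetched per scroll batch (Elasticsearch default: 1000).
            "size": batch_size,
        },
        "dest": {
            "index": dest_index,
            # Ingest pipeline applied to each document before indexing.
            "pipeline": pipeline,
        },
    }
    return path, body


if __name__ == "__main__":
    path, body = build_reindex_request(SOURCE_INDEX, DEST_INDEX, PIPELINE, batch_size=50)
    print("POST " + path)
    print(json.dumps(body, indent=2))
```

Because the call runs as a task, its progress and any per-document failures have to be checked afterwards through the task management API rather than in the immediate response.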

Problems

Resiliency: issues with failures and errors

Issues with size

Issues with performance

Issues with scroll

Possible solutions in the works?

https://github.com/elastic/elasticsearch/issues/27724#issuecomment-2101539332

elasticsearchmachine commented 1 month ago

Pinging @elastic/ml-core (Team:ML)