elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Ease synthetic source enablement #95799

Open jimczi opened 1 year ago

jimczi commented 1 year ago

Description

Today there are multiple ways to handle the storage/retrieval of the _source field, it can be:

Considering the amount of storage required to store the plain source field, I'd like to discuss how we could ease the use of synthetic source so that it works in all cases.

Currently synthetic source is available only under certain conditions. For instance, a text field needs to have a keyword sub-field to be eligible. Some geo fields are also not covered. That's a barrier to entry that most users won't get past, since it is also not widely known that storing the plain source is very costly. We see this in most of the ingestion benchmarks that mix vectors and text. Since text is not eligible as is for synthetic source (and synthetic is not the default), vectors with large dimensions are left untouched in the source, leading to large disk usage.
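To make the constraint concrete, here is a minimal sketch of a mapping that enables synthetic source, plus a small check for text fields that lack a keyword sub-field. The field names (`message`, `summary`) and the helper `ineligible_text_fields` are hypothetical, and the check is a simplification of the real eligibility rules: it only looks for the keyword sub-field mentioned above.

```python
# Mapping that requests synthetic _source. Field names are illustrative only.
mappings = {
    "_source": {"mode": "synthetic"},
    "properties": {
        # Eligible: text field with a keyword sub-field.
        "message": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword"}},
        },
        # Not eligible under the rule described above: plain text field.
        "summary": {"type": "text"},
    },
}


def ineligible_text_fields(properties, prefix=""):
    """List text fields without a keyword sub-field (hypothetical helper)."""
    names = []
    for name, field in properties.items():
        path = prefix + name
        if "properties" in field:
            # Object field: recurse into its children.
            names += ineligible_text_fields(field["properties"], path + ".")
        elif field.get("type") == "text":
            subfields = field.get("fields", {}).values()
            if not any(sub.get("type") == "keyword" for sub in subfields):
                names.append(path)
    return names


blocked = ineligible_text_fields(mappings["properties"])
```

With this mapping, `blocked` contains only `summary`, which is the kind of field a user currently has to fix by hand.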

I propose that we handle the limitations of synthetic source automatically. For instance, instead of requiring a keyword sub-field for every text field, we could automatically set stored to true for text fields in a mapping that enables synthetic source. We could also allow keeping the source for fields that are not automatically handled by synthetic source and merge the results with the extracted doc value fields. There are lots of options here, but my point is that it should be automatic rather than requiring a manual mapping change.
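As a rough illustration of the first option, the automatic rewrite could look like the sketch below. It is written client-side in Python purely for readability; the function name `auto_store_text_fields` is hypothetical, and the actual change would live inside Elasticsearch's mapping handling, not in user code.

```python
import copy


def auto_store_text_fields(mappings):
    """Return a copy of `mappings` where text fields that have no keyword
    sub-field get store=True, but only if synthetic _source is requested.
    Hypothetical sketch of the proposal, not Elasticsearch code."""
    result = copy.deepcopy(mappings)
    if result.get("_source", {}).get("mode") != "synthetic":
        return result  # nothing to do when synthetic source is not enabled

    def visit(properties):
        for field in properties.values():
            if "properties" in field:
                visit(field["properties"])  # object field: recurse
            elif field.get("type") == "text":
                subfields = field.get("fields", {}).values()
                if not any(sub.get("type") == "keyword" for sub in subfields):
                    field["store"] = True

    visit(result.get("properties", {}))
    return result


# Illustrative mapping; field names are assumptions.
example = {
    "_source": {"mode": "synthetic"},
    "properties": {
        "title": {"type": "text"},
        "body": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
        "user": {"properties": {"bio": {"type": "text"}}},
    },
}
rewritten = auto_store_text_fields(example)
```

Here `title` and `user.bio` gain `store: true`, while `body` is left alone because its keyword sub-field already covers it. The input mapping is not modified.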

In the long run I wonder if synthetic source could be the default for all cases (not only TSDS).

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine commented 4 months ago

Pinging @elastic/es-storage-engine (Team:StorageEngine)

jsmith3763 commented 3 months ago

Are there any updates on this idea? We are running into issues with enabling synthetic _source on just ecs fields.

tylerperk commented 3 months ago

> Are there any updates on this idea? We are running into issues with enabling synthetic _source on just ecs fields.

Hi @jsmith3763, can you please share more details about the issues you are encountering and how this issue could help with them? Thanks.