elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Make total fields limit less of a nuisance to users #89911

Open javanna opened 1 year ago

javanna commented 1 year ago

Since version 5.0, every index created within Elasticsearch has a maximum total number of fields defined in its mappings, which defaults to 1000. Fields can be manually added to the index mappings through the put mappings API or via dynamic mappings by indexing documents through the index API. A request that causes the total fields count to go above the limit is rejected, whether that be a put mappings, a create index or an index call. The total fields limit can be manually increased using the update index settings API.
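For reference, the limit can be raised on an existing index through the update index settings API, roughly like this (the index name and the new value are placeholders, not a recommendation):

PUT /my-index-000001/_settings
{
  "index.mapping.total_fields.limit": 2000
}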

The main reason why the total fields limit was introduced (see #11443) is to prevent a mappings explosion caused by ingesting a bogus document with many JSON keys (see #73460 for an example). A mappings explosion impacts the size of the cluster state, the memory footprint of data nodes, and hinders stability.

While the total fields limit is a safety measure against mappings explosion, it is not an effective way to prevent data nodes from going out of memory due to too many fields being defined: it's an index-based limit, meaning that you can have 10k indices with 990 fields each without hitting the limit (yet possibly running into problems depending on the available resources), while a single index with more than 1000 fields is not allowed. Data nodes load mappings only for the indices that have at least one shard allocated to them, which makes it quite difficult to pick a limit that effectively prevents data nodes from going out of memory.

It is quite common for users to reach the total fields limit, which causes ingestion failures and a consequent need to increase the limit. Our Solutions (e.g. APM) increase the total fields limit too. The fact that many users end up reaching the limit despite not ingesting bogus documents sounds like a bug: ideally the limit would be reached only with a very high number of fields that is very likely to be caused by a bogus document, and no user would have to know about or increase the limit otherwise.

The total fields limit has been around for quite some time, so it may very well be that the 1000 default was reasonably high when it was introduced but turned out to be too low over time. Possibly, all the recent improvements in cluster state handling around many shards and many indices have also helped support more fields in the mappings. An area of improvement is the memory footprint of mappings within data nodes (see #86440), and once we improve that we will be able to support even more fields, yet this is a tangential issue given that the current limit does not prevent data nodes from going out of memory.

I'd propose that we consider making the following changes, with the high-level goal of making the total fields limit less visible to users while still being effective for its original purpose:

Is there any preparation work needed to feel confident that making the total fields limit more permissive does not cause problems? Could we end up allowing for situations that would have previously legitimately hit the limit?

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-search (Team:Search)

mitar commented 1 year ago

apply the total fields limit only to dynamic mappings update

I think this would be a reasonable approach, especially if you could configure the limit on sub-documents as well (some parts of a document might come from users and you may want to limit dynamic mappings there).
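For illustration, the closest existing knob is the object-level dynamic mapping parameter, which disables (rather than caps) dynamic field additions under a given sub-object; this is only a sketch, and the index and field names are made up:

PUT /my-index-000001
{
  "mappings": {
    "dynamic": true,
    "properties": {
      "user_supplied": {
        "type": "object",
        "dynamic": false
      }
    }
  }
}

With this, new keys under user_supplied are still kept in _source but are not added to the mappings.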

javanna commented 1 year ago

We discussed this with the team and agreed on the following:

original-brownbear commented 1 year ago

++ to the above, as discussed on another channel.

Applying the limit only to dynamic mapping updates seems like the way to go to me too. We have a lot of built-in mappings with far more than 1k fields in our own products already.

For dynamic mapping updates I think a higher limit may make sense if this is something that's actually causing trouble for users. The only reservation I had in this regard was that unlike a static mapping of thousands of fields, a dynamic mapping of thousands of fields will have data in every field. This causes higher memory use from Lucene data structures than just having unused fields in a static mapping. As we discussed on another channel, that kind of issue shouldn't be addressed by the default value of this setting though but rather by other means of reducing per-field overhead.

felixbarny commented 1 year ago

Huge +1 on everything that was said here.

Just one addition that wasn't discussed yet in this thread. I think that the biggest pain with the field limit is that it causes data loss. Therefore, there should be a mode where hitting the field limit doesn't lead to rejecting the document, but to not adding additional dynamic fields once the limit has been reached. In other words, it should be possible to index the first 1000 dynamic fields, and after that, additional fields would just be stored but not added to the mapping (similar to dynamic: false).

This would resolve a huge pain point that we have for logging and tracing data, where bogus documents or misuse of an API can lead to data loss. Since multiple services share the same data stream in APM, a single misbehaving service can cause data loss for all services.

felixbarny commented 5 months ago

Update: We've merged

This addresses the document loss when adding dynamic fields beyond the limit. It doesn't cover applying the field limit only to dynamic mapping updates.
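Assuming the new index.mapping.total_fields.ignore_dynamic_beyond_limit setting behaves as its name suggests, opting in at index creation could look roughly like this (the index name and limit value are placeholders):

PUT /my-index-000001
{
  "settings": {
    "index.mapping.total_fields.limit": 1000,
    "index.mapping.total_fields.ignore_dynamic_beyond_limit": true
  }
}

Dynamic fields beyond the limit would then be left out of the mappings (while still being kept in _source) instead of causing the whole document to be rejected.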

jackgray commented 2 months ago

Is there a way to change this setting on all future indices? I don't understand how to prevent document loss if there is no IaC way of setting this rule before the index is created and starts skipping fields. How would I apply this rule to sharded indices, where streams are piped to an index name with a timestamp?

It seems that this setting cannot be defined in any configuration file like most others can, or in the Stack Management advanced settings. This has been a massive usability barrier for our use of Elasticsearch, and there is very little documentation about it. Our ingest data hasn't even exceeded 4GB at this stage.

mitar commented 2 months ago

It is easy to specify this in your index configuration:

{
  "settings": {
    "index.mapping.total_fields.limit": 20000
  },
  "mappings": {}
}

felixbarny commented 2 months ago

Hey @jackgray, you can use index templates to define the mappings for an index pattern before these indices exist.

Elasticsearch also ships with a default index template for logs-*-*, which has the new index.mapping.total_fields.ignore_dynamic_beyond_limit setting preconfigured. So if you send logs to logs-generic-default or logs-myapp-default, you automatically get the recommended default settings, including ignore_dynamic_beyond_limit.

Besides that, the default index template for logs-*-* also creates a data stream, which is recommended over index names with a timestamp.
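For data streams you manage yourself, a sketch of an index template carrying the same settings could look like the following; the template name, pattern, priority, and limit value are made up, and the priority only needs to be higher than that of any overlapping built-in template:

PUT _index_template/my-logs-template
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": {},
  "priority": 200,
  "template": {
    "settings": {
      "index.mapping.total_fields.limit": 1000,
      "index.mapping.total_fields.ignore_dynamic_beyond_limit": true
    }
  }
}

Any data stream matching logs-myapp-* that is created after this template exists picks these settings up automatically, which also covers the "all future indices" part of the question above.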