elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Updating index metadata does an expensive validation of the mapping even if unchanged #89309

Open DaveCTurner opened 2 years ago

DaveCTurner commented 2 years ago

Updating index settings on a large number of indices can take many minutes, and the resulting cluster state update might fail to publish cleanly without warning. A significant part of this slowness is the index metadata validation that is run for each index. This validation deserializes and reserializes the mapping of every index that got updated, which can take many minutes when large mappings are combined with a large number of updated indices.

  100.0% [cpu=98.6%, other=1.4%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-5][masterService#updateTask][T#1]'
     10/10 snapshots sharing following 26 elements
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.index.mapper.ObjectMapper$Builder.buildMappers(ObjectMapper.java:150)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.index.mapper.ObjectMapper$Builder.build(ObjectMapper.java:171)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.index.mapper.ObjectMapper$Builder.build(ObjectMapper.java:64)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.index.mapper.ObjectMapper$Builder.buildMappers(ObjectMapper.java:150)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.index.mapper.RootObjectMapper$Builder.build(RootObjectMapper.java:110)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.index.mapper.MappingParser.parse(MappingParser.java:99)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.index.mapper.MappingParser.parse(MappingParser.java:94)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.index.mapper.MapperService.parseMapping(MapperService.java:370)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:347)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:337)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.indices.IndicesService.verifyIndexMetadata(IndicesService.java:810)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.cluster.metadata.MetadataUpdateSettingsService$1.execute(MetadataUpdateSettingsService.java:247)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.cluster.metadata.MetadataUpdateSettingsService.lambda$new$0(MetadataUpdateSettingsService.java:79)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.cluster.metadata.MetadataUpdateSettingsService$$Lambda$3405/0x00000008014c4400.execute(Unknown Source)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.cluster.service.MasterService.innerExecuteTasks(MasterService.java:908)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:878)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:248)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:156)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:110)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:148)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:709)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:260)
       app/org.elasticsearch.server@8.3.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:223)
       java.base@18.0.1.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       java.base@18.0.1.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       java.base@18.0.1.1/java.lang.Thread.run(Thread.java:833)

We should find a way to skip unnecessary mapping validation when nothing about the mappings has changed, and to make use of mapping deduplication here too.
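To illustrate the deduplication idea, here is a minimal sketch that runs the expensive check once per distinct mapping rather than once per index. The names `MappingDedupSketch`, `IndexMeta`, `mappingDigest` and `validateDistinctMappings` are hypothetical and are not Elasticsearch APIs. It also assumes that the result of validation depends only on the mapping content, which may not hold in practice.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MappingDedupSketch {
    /** Hypothetical stand-in for index metadata: an index name plus a digest of its mapping source. */
    static final class IndexMeta {
        final String name;
        final String mappingDigest;

        IndexMeta(String name, String mappingDigest) {
            this.name = name;
            this.mappingDigest = mappingDigest;
        }
    }

    /**
     * Returns the names of the indices whose mapping was actually validated.
     * Indices whose mapping digest was already seen in this batch are skipped.
     */
    static List<String> validateDistinctMappings(List<IndexMeta> updated) {
        Set<String> seenDigests = new HashSet<>();
        List<String> validated = new ArrayList<>();
        for (IndexMeta index : updated) {
            // Set.add returns false if the digest was already present.
            if (seenDigests.add(index.mappingDigest)) {
                // The expensive parse-and-build step would run here (assumption).
                validated.add(index.name);
            }
        }
        return validated;
    }
}
```

With many indices created from the same template, the number of expensive validations in this sketch is bounded by the number of distinct mappings rather than the number of indices.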

Relates #77466. Extracted from #87120.

elasticsearchmachine commented 2 years ago

Pinging @elastic/es-search (Team:Search)

romseygeek commented 1 year ago

I think we can do this relatively simply by replacing the call to MapperService.merge() with a call to MapperService.updateMapping(), which already has some sanity checks to ensure that we don't needlessly apply updates when the mapping hasn't actually changed.
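The kind of sanity check described here can be sketched as follows. This is an illustration only: `UnchangedMappingSketch`, `MappingHolder` and `updateIfChanged` are hypothetical names, and the sketch does not reproduce what `MapperService.updateMapping()` actually does internally. It assumes the serialized mapping source can be compared for equality.

```java
import java.util.Objects;

class UnchangedMappingSketch {
    /** Hypothetical holder for the mapping currently applied to a single index. */
    static final class MappingHolder {
        private String currentSource; // serialized mapping, null if none applied yet
        int parseCount = 0;           // counts how often the expensive path ran

        /**
         * Applies newSource only if it differs from the current mapping.
         * Returns true if the (simulated) expensive parse happened.
         */
        boolean updateIfChanged(String newSource) {
            if (Objects.equals(currentSource, newSource)) {
                return false; // nothing changed, skip parsing entirely
            }
            parseCount++; // stands in for the expensive parse and build (assumption)
            currentSource = newSource;
            return true;
        }
    }
}
```

A settings-only update would pass the same mapping source again and take the cheap early-return path in this sketch.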

javanna commented 1 year ago

Alan tried to fix this but realized that we do need this validation: analyzers are defined in the index settings rather than in the mappings, so they may have changed even when the mapping itself has not.
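One way to picture the constraint is a skip check that compares the analysis-related settings as well as the mapping. The sketch below is hypothetical throughout (`ValidationKeySketch`, `ValidationKey`, `needsValidation`), and it assumes that settings under the `index.analysis.` prefix are the only non-mapping inputs to validation, which may not be true.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

class ValidationKeySketch {
    /** Hypothetical key capturing everything this sketch assumes validation depends on. */
    static final class ValidationKey {
        final String mappingDigest;
        final Map<String, String> analysisSettings;

        ValidationKey(String mappingDigest, Map<String, String> indexSettings) {
            this.mappingDigest = mappingDigest;
            // Keep only analysis-related settings; other settings are ignored (assumption).
            TreeMap<String, String> analysis = new TreeMap<>();
            for (Map.Entry<String, String> e : indexSettings.entrySet()) {
                if (e.getKey().startsWith("index.analysis.")) {
                    analysis.put(e.getKey(), e.getValue());
                }
            }
            this.analysisSettings = analysis;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ValidationKey)) return false;
            ValidationKey other = (ValidationKey) o;
            return Objects.equals(mappingDigest, other.mappingDigest)
                && analysisSettings.equals(other.analysisSettings);
        }

        @Override
        public int hashCode() {
            return Objects.hash(mappingDigest, analysisSettings);
        }
    }

    /** Validation can be skipped only if neither the mapping nor the analysis settings changed. */
    static boolean needsValidation(ValidationKey before, ValidationKey after) {
        return !before.equals(after);
    }
}
```

Under these assumptions, changing something like a replica count leaves the key unchanged, while changing an analyzer definition forces revalidation even with an identical mapping.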

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)