CrossRef / cayenne

MOVED to https://gitlab.com/crossref/rest_api
MIT License

funder indexing causes elastic to fail with too many fields #36

Open · MikeYalter opened this issue 6 years ago

MikeYalter commented 6 years ago

I know the field limit can be increased, but I'm concerned that the funder index has more than 1000 fields at all, since funders shouldn't need that many. I suspect the fields are being generated incorrectly. This happens with the live RDF data that the default configuration pulls in.

java.lang.IllegalArgumentException: Limit of total fields [1000] in index [funder] has been exceeded
    at org.elasticsearch.index.mapper.MapperService.checkTotalFieldsLimit(MapperService.java:626) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.MapperService.internalMerge(MapperService.java:450) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.MapperService.internalMerge(MapperService.java:353) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:285) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.applyRequest(MetaDataMappingService.java:313) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.execute(MetaDataMappingService.java:230) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:643) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:273) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:198) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:133) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) ~[elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
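For anyone reproducing this locally, a quick way to check whether the mapping is genuinely exploding (rather than funders legitimately needing that many fields) is to pull the funder mapping and count the mapped fields. The sketch below is not part of cayenne; it assumes an Elasticsearch 6.x node at localhost:9200 and the index name `funder` from the stack trace. It also shows, commented out, the standard `index.mapping.total_fields.limit` setting as a stopgap while the root cause is investigated.

```python
# Diagnostic sketch (assumed setup): count fields in the funder mapping
# and optionally raise the per-index field limit on an ES 6.x cluster.
import requests

ES_URL = "http://localhost:9200"  # assumed local dev cluster
INDEX = "funder"


def count_mapping_fields(properties):
    """Approximate the number of mapped fields: object fields, leaf fields,
    and multi-fields (e.g. .keyword) all count toward the ES field limit."""
    total = 0
    for spec in properties.values():
        if "properties" in spec:                   # object / nested field
            total += 1
            total += count_mapping_fields(spec["properties"])
        else:
            total += 1
            total += len(spec.get("fields", {}))   # multi-fields
    return total


def main():
    mapping = requests.get(f"{ES_URL}/{INDEX}/_mapping").json()
    for index_name, body in mapping.items():
        for doc_type, type_mapping in body["mappings"].items():
            n = count_mapping_fields(type_mapping.get("properties", {}))
            print(f"{index_name}/{doc_type}: ~{n} mapped fields")

    # Stopgap only: raise the limit instead of fixing runaway field generation.
    # resp = requests.put(
    #     f"{ES_URL}/{INDEX}/_settings",
    #     json={"index.mapping.total_fields.limit": 2000},
    # )
    # print(resp.json())


if __name__ == "__main__":
    main()
```

If the count is far above what the funder schema should produce, that points to dynamic mapping generating a field per data value from the RDF input rather than a fixed set of fields.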

ppolischuk commented 6 years ago

Triage update: we are holding on this for a week. Mike, Joe, or somebody else from Crossref will perform an initial investigation next week; then, depending on the results, we will pass it to Juxt.