Closed DaveCTurner closed 5 years ago
Pinging @elastic/es-core-infra
@andyb-elastic saw a similar failure some time back and this failure included the Elasticsearch logs which said this:
# [2018-02-27T10:45:38,258][INFO ][o.e.n.Node ] initialized
# [2018-02-27T10:45:38,258][INFO ][o.e.n.Node ] [OpH_qGq] starting ...
# [2018-02-27T10:45:38,409][INFO ][o.e.t.TransportService ] [OpH_qGq] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}
# [2018-02-27T10:45:38,422][WARN ][o.e.b.BootstrapChecks ] [OpH_qGq] max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
# [2018-02-27T10:45:41,463][INFO ][o.e.c.s.MasterService ] [OpH_qGq] zen-disco-elected-as-master ([0] nodes joined), reason: master node changed {previous [], current [{OpH_qGq}{OpH_qGqbSjOWzlsc2tPxfg}{3UZwOqnAT5KXgUcXlqfCxA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8370659328, ml.max_open_jobs=20, ml.enabled=true}]}
# [2018-02-27T10:45:41,467][INFO ][o.e.c.s.ClusterApplierService] [OpH_qGq] master node changed {previous [], current [{OpH_qGq}{OpH_qGqbSjOWzlsc2tPxfg}{3UZwOqnAT5KXgUcXlqfCxA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8370659328, ml.max_open_jobs=20, ml.enabled=true}]}, reason: apply cluster state (from master [master {OpH_qGq}{OpH_qGqbSjOWzlsc2tPxfg}{3UZwOqnAT5KXgUcXlqfCxA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8370659328, ml.max_open_jobs=20, ml.enabled=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
# [2018-02-27T10:45:41,477][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [OpH_qGq] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}
# [2018-02-27T10:45:41,478][INFO ][o.e.n.Node ] [OpH_qGq] started
# [2018-02-27T10:45:41,652][INFO ][o.e.g.GatewayService ] [OpH_qGq] recovered [0] indices into cluster_state
# [2018-02-27T10:46:11,661][DEBUG][o.e.a.a.i.t.p.TransportPutIndexTemplateAction] [OpH_qGq] failed to put template [.monitoring-beats]
# org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (create-index-template [.monitoring-beats], cause [api]) within 30s
# at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:125) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
# at java.util.ArrayList.forEach(ArrayList.java:1249) ~[?:1.8.0_144]
# at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:124) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
# at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
# at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
# at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
# at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
...
Not sure why the Elasticsearch log wasn't included in the build failure above, but it'd be mighty useful if it could appear in future.
If it is the same situation then this is failing to put a bunch of index templates within the 30-second timeout. It's not a network issue, since this is a one-node cluster. It'd be useful to see DEBUG logging from:
org.elasticsearch.action.support.master.TransportMasterNodeAction
org.elasticsearch.action.admin.indices.template.put.TransportPutIndexTemplateAction
(edited - I originally mentioned mappings, but this is not that)
And another: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+packaging-tests/2660/console (build-2660-opensuse.txt)
@DaveCTurner do you have a recommendation for any logging I can add to get more information about why the node is failing to process these cluster state events?
The logs seem somewhat garbled - messages seem to be duplicated in a funny order. Nonetheless, this bit seems unusually slow:
# [2018-05-06T06:46:39,483][INFO ][o.e.c.m.MetaDataIndexTemplateService] [I3r5omj] adding template [.monitoring-es] for index patterns [.monitoring-es-6-*]
# [2018-05-06T06:46:43,843][INFO ][o.e.c.m.MetaDataIndexTemplateService] [I3r5omj] adding template [.monitoring-beats] for index patterns [.monitoring-beats-6-*]
# [2018-05-06T06:46:48,220][INFO ][o.e.c.m.MetaDataIndexTemplateService] [I3r5omj] adding template [.monitoring-alerts] for index patterns [.monitoring-alerts-6]
# [2018-05-06T06:46:52,596][INFO ][o.e.c.m.MetaDataIndexTemplateService] [I3r5omj] adding template [.monitoring-kibana] for index patterns [.monitoring-kibana-6-*]
open_jobs=20, ml.enabled=true}]}, reason: apply cluster state (from master [master {I3r5omj}{I3r5omjKR5evfgfM8ftPbA}{v_6FY1TySDyGEKnupaxjCg}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8370659328, ml.max_open_jobs=20, ml.enabled=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)[, ]]])
# [2018-05-06T06:46:18,995][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [I3r5omj] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}
# [2018-05-06T06:46:18,995][INFO ][o.e.n.Node ] [I3r5omj] started
# [2018-05-06T06:46:23,813][INFO ][o.e.g.GatewayService ] [I3r5omj] recovered [0] indices into cluster_state
# [2018-05-06T06:46:23,931][INFO ][o.e.c.m.MetaDataIndexTemplateService] [I3r5omj] adding template [.watch-history-7] for index patterns [.watcher-history-7*]
# [2018-05-06T06:46:28,449][INFO ][o.e.c.m.MetaDataIndexTemplateService] [I3r5omj] adding template [.watches] for index patterns [.watches*]
# [2018-05-06T06:46:32,919][INFO ][o.e.c.m.MetaDataIndexTemplateService] [I3r5omj] adding template [.triggered_watches] for index patterns [.triggered_watches*]
# [2018-05-06T06:46:35,044][INFO ][o.e.c.m.MetaDataIndexTemplateService] [I3r5omj] adding template [.monitoring-logstash] for index patterns [.monitoring-logstash-6-*]
# [2018-05-06T06:46:39,483][INFO ][o.e.c.m.MetaDataIndexTemplateService] [I3r5omj] adding template [.monitoring-es] for index patterns [.monitoring-es-6-*]
# [2018-05-06T06:46:43,843][INFO ][o.e.c.m.MetaDataIndexTemplateService] [I3r5omj] adding template [.monitoring-beats] for index patterns [.monitoring-beats-6-*]
# [2018-05-06T06:46:48,220][INFO ][o.e.c.m.MetaDataIndexTemplateService] [I3r5omj] adding template [.monitoring-alerts] for index patterns [.monitoring-alerts-6]
# [2018-05-06T06:46:52,596][INFO ][o.e.c.m.MetaDataIndexTemplateService] [I3r5omj] adding template [.monitoring-kibana] for index patterns [.monitoring-kibana-6-*]
# [2018-05-06T06:46:53,834][ERROR][o.e.x.m.e.l.LocalExporter] failed to set monitoring pipeline [xpack_monitoring_6]
# org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-pipeline-xpack_monitoring_6) within 30s
On my laptop it takes <1s to make all the templates:
[2018-05-07T18:25:36,684][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [0MEKNSQ] Failed to clear cache for realms [[]]
[2018-05-07T18:25:36,726][INFO ][o.e.g.GatewayService ] [0MEKNSQ] recovered [0] indices into cluster_state
[2018-05-07T18:25:36,884][INFO ][o.e.c.m.MetaDataIndexTemplateService] [0MEKNSQ] adding template [.watch-history-7] for index patterns [.watcher-history-7*]
[2018-05-07T18:25:36,924][INFO ][o.e.c.m.MetaDataIndexTemplateService] [0MEKNSQ] adding template [.triggered_watches] for index patterns [.triggered_watches*]
[2018-05-07T18:25:36,955][INFO ][o.e.c.m.MetaDataIndexTemplateService] [0MEKNSQ] adding template [.watches] for index patterns [.watches*]
[2018-05-07T18:25:37,020][INFO ][o.e.c.m.MetaDataIndexTemplateService] [0MEKNSQ] adding template [.monitoring-logstash] for index patterns [.monitoring-logstash-6-*]
[2018-05-07T18:25:37,070][INFO ][o.e.c.m.MetaDataIndexTemplateService] [0MEKNSQ] adding template [.monitoring-es] for index patterns [.monitoring-es-6-*]
[2018-05-07T18:25:37,107][INFO ][o.e.c.m.MetaDataIndexTemplateService] [0MEKNSQ] adding template [.monitoring-alerts] for index patterns [.monitoring-alerts-6]
[2018-05-07T18:25:37,148][INFO ][o.e.c.m.MetaDataIndexTemplateService] [0MEKNSQ] adding template [.monitoring-beats] for index patterns [.monitoring-beats-6-*]
[2018-05-07T18:25:37,180][INFO ][o.e.c.m.MetaDataIndexTemplateService] [0MEKNSQ] adding template [.monitoring-kibana] for index patterns [.monitoring-kibana-6-*]
[2018-05-07T18:25:37,319][INFO ][o.e.l.LicenseService ] [0MEKNSQ] license [6aae53f8-5ecf-4fc8-b66a-50f5e31c3ee7] mode [basic] - valid
Perhaps this will shed a little more light into where the time is going:
logger.org.elasticsearch.indices: DEBUG
logger.org.elasticsearch.gateway: TRACE
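For anyone else wiring these in: the two lines above are in elasticsearch.yml form. A minimal sketch of adding them, assuming a scratch config path for illustration (in the packaging tests the real config directory depends on the package type):

```shell
# Sketch: append the suggested logger settings to an elasticsearch.yml.
# ES_CONFIG is a scratch path here, not the real install location.
ES_CONFIG="${ES_CONFIG:-/tmp/es-logging-demo}"
mkdir -p "$ES_CONFIG"
cat >> "$ES_CONFIG/elasticsearch.yml" <<'EOF'
logger.org.elasticsearch.indices: DEBUG
logger.org.elasticsearch.gateway: TRACE
EOF
grep '^logger\.' "$ES_CONFIG/elasticsearch.yml"
```

The same levels can also be applied to a running node through the cluster settings API, which avoids a restart, though for these tests the node is restarted anyway.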
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+packaging-tests/2717/console is another instance of this.
Again the outermost error is:
07:01:21 # Connection failure to: http://127.0.0.1:9200/_xpack/security/user/kibana/_password?pretty failed: Read timed out
and the error in the Elasticsearch log is:
07:01:21 # [2018-05-25T07:01:13,198][ERROR][o.e.x.m.e.l.LocalExporter] failed to set monitoring pipeline [xpack_monitoring_2]
07:01:21 # org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-pipeline-xpack_monitoring_2) within 30s
07:01:21 # at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:124) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
07:01:21 # at java.util.ArrayList.forEach(ArrayList.java:1249) ~[?:1.8.0_144]
07:01:21 # at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:123) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
07:01:21 # at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:625) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
07:01:21 # at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
07:01:21 # at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
07:01:21 # at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Similar story: adding templates is slow:
# [2018-05-25T07:00:43,002][INFO ][o.e.g.GatewayService ] [rWPAvXn] recovered [0] indices into cluster_state
# [2018-05-25T07:00:43,134][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rWPAvXn] adding template [.watches] for index patterns [.watches*]
# [2018-05-25T07:00:43,871][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rWPAvXn] adding template [.watch-history-8] for index patterns [.watcher-history-8*]
# [2018-05-25T07:00:44,744][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rWPAvXn] adding template [.triggered_watches] for index patterns [.triggered_watches*]
# [2018-05-25T07:00:49,884][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rWPAvXn] adding template [.monitoring-logstash] for index patterns [.monitoring-logstash-6-*]
# [2018-05-25T07:00:54,904][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rWPAvXn] adding template [.monitoring-es] for index patterns [.monitoring-es-6-*]
# [2018-05-25T07:00:59,906][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rWPAvXn] adding template [.monitoring-beats] for index patterns [.monitoring-beats-6-*]
# [2018-05-25T07:01:04,968][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rWPAvXn] adding template [.monitoring-alerts] for index patterns [.monitoring-alerts-6]
# [2018-05-25T07:01:09,960][INFO ][o.e.c.m.MetaDataIndexTemplateService] [rWPAvXn] adding template [.monitoring-kibana] for index patterns [.monitoring-kibana-6-*]
# [2018-05-25T07:01:13,010][ERROR][o.e.x.m.e.l.LocalExporter] failed to set monitoring pipeline [xpack_monitoring_6]
@andyb-elastic it doesn't look like we're getting extra detail yet. Could you check these settings are being applied in the right places?
logger.org.elasticsearch.indices: DEBUG
logger.org.elasticsearch.gateway: TRACE
They're not applied because I didn't follow up on this, my bad. I'll add them now
Setting up that logging the right way ended up being a little involved; I'll push it next week in the interest of not breaking the build on a Friday afternoon.
This looks very similar to me: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+packaging-tests/890/console
@andyb-elastic not sure which additional detail you are looking for; maybe this one shows more?
I turned on the logging David suggested when the tests run on opensuse https://github.com/elastic/elasticsearch/commit/ba8bb1d4a15fd5d26357ea0cc7c04b28645b50bf
@cbuescher it's likely the same; it's hard to tell since it looks like that one didn't dump the server logs. The detail we're looking for is more information about why the server sometimes takes a long time to start on opensuse.
@jasontedor can you assign someone to address these test failures?
I think this is caused by internal infrastructure issues. I will follow up internally.
Here's one with the logging turned up. I'm not sure it really revealed anything, or at least I don't see any additional detail on why the template creation is slow.
I turned up org.elasticsearch.indices logging to TRACE and org.elasticsearch.cluster to DEBUG: https://github.com/elastic/elasticsearch/commit/01d64b128b214c2ac5afc1b0b514dd0d45e60c1b
The log is still suggestive of extremely slow IO when writing updated cluster states to disk. More detail from logger.org.elasticsearch.cluster is a good move, thanks, and I also opened #31319 because we currently only log the start of the write, not the end.
# [2018-06-13T07:01:12,159][DEBUG][o.e.i.IndicesService ] [eIhOL9F] creating Index [[Uo4Ce-TTSfWnn5cLyKCReQ/ZwtuWREBSMeWp4v-LQ8hrQ]], shards [1]/[0] - reason [create index]
# [2018-06-13T07:01:12,262][DEBUG][o.e.i.IndicesService ] [eIhOL9F] [Uo4Ce-TTSfWnn5cLyKCReQ] closing ... (reason [NO_LONGER_ASSIGNED])
# [2018-06-13T07:01:12,262][DEBUG][o.e.i.IndicesService ] [eIhOL9F] [Uo4Ce-TTSfWnn5cLyKCReQ/ZwtuWREBSMeWp4v-LQ8hrQ] closing index service (reason [NO_LONGER_ASSIGNED][ created for parsing template mapping])
# [2018-06-13T07:01:12,263][DEBUG][o.e.i.IndicesService ] [eIhOL9F] [Uo4Ce-TTSfWnn5cLyKCReQ/ZwtuWREBSMeWp4v-LQ8hrQ] closed... (reason [NO_LONGER_ASSIGNED][ created for parsing template mapping])
# [2018-06-13T07:01:12,263][INFO ][o.e.c.m.MetaDataIndexTemplateService] [eIhOL9F] adding template [.triggered_watches] for index patterns [.triggered_watches*]
# [2018-06-13T07:01:12,270][TRACE][o.e.g.MetaStateService ] [eIhOL9F] [_global] writing state, reason [changed]
# [2018-06-13T07:01:16,680][DEBUG][o.e.i.IndicesService ] [eIhOL9F] creating Index [[ngefNOwdS4OHzxLC6TD_Gg/hwwgND-gSp6iMN5daYAsHg]], shards [1]/[0] - reason [create index]
# [2018-06-13T07:01:16,684][DEBUG][o.e.i.IndicesService ] [eIhOL9F] [ngefNOwdS4OHzxLC6TD_Gg] closing ... (reason [NO_LONGER_ASSIGNED])
# [2018-06-13T07:01:16,684][DEBUG][o.e.i.IndicesService ] [eIhOL9F] [ngefNOwdS4OHzxLC6TD_Gg/hwwgND-gSp6iMN5daYAsHg] closing index service (reason [NO_LONGER_ASSIGNED][ created for parsing template mapping])
# [2018-06-13T07:01:16,684][DEBUG][o.e.i.IndicesService ] [eIhOL9F] [ngefNOwdS4OHzxLC6TD_Gg/hwwgND-gSp6iMN5daYAsHg] closed... (reason [NO_LONGER_ASSIGNED][ created for parsing template mapping])
# [2018-06-13T07:01:16,684][INFO ][o.e.c.m.MetaDataIndexTemplateService] [eIhOL9F] adding template [.watches] for index patterns [.watches*]
# [2018-06-13T07:01:16,685][TRACE][o.e.g.MetaStateService ] [eIhOL9F] [_global] writing state, reason [changed]
# [2018-06-13T07:01:20,988][DEBUG][o.e.i.IndicesService ] [eIhOL9F] creating Index [[WAeaTyI4TnK3jFmYrKOvAg/En2gqm75SuW_w5EQxYfTog]], shards [1]/[0] - reason [create index]
# [2018-06-13T07:01:21,001][DEBUG][o.e.i.IndicesService ] [eIhOL9F] [WAeaTyI4TnK3jFmYrKOvAg] closing ... (reason [NO_LONGER_ASSIGNED])
# [2018-06-13T07:01:21,001][DEBUG][o.e.i.IndicesService ] [eIhOL9F] [WAeaTyI4TnK3jFmYrKOvAg/En2gqm75SuW_w5EQxYfTog] closing index service (reason [NO_LONGER_ASSIGNED][ created for parsing template mapping])
# [2018-06-13T07:01:21,001][INFO ][o.e.c.m.MetaDataIndexTemplateService] [eIhOL9F] adding template [.watch-history-8] for index patterns [.watcher-history-8*]
# [2018-06-13T07:01:21,002][TRACE][o.e.g.MetaStateService ] [eIhOL9F] [_global] writing state, reason [changed]
# [2018-06-13T07:01:25,044][DEBUG][o.e.i.IndicesService ] [eIhOL9F] creating Index [[opPILoh1TcuBw9CXG-GNPg/ULbJ2LOERo-GoWAEkGbHJA]], shards [1]/[0] - reason [create index]
# [2018-06-13T07:01:25,050][DEBUG][o.e.i.IndicesService ] [eIhOL9F] [opPILoh1TcuBw9CXG-GNPg] closing ... (reason [NO_LONGER_ASSIGNED])
# [2018-06-13T07:01:25,050][DEBUG][o.e.i.IndicesService ] [eIhOL9F] [opPILoh1TcuBw9CXG-GNPg/ULbJ2LOERo-GoWAEkGbHJA] closing index service (reason [NO_LONGER_ASSIGNED][ created for parsing template mapping])
# [2018-06-13T07:01:25,050][DEBUG][o.e.i.IndicesService ] [eIhOL9F] [opPILoh1TcuBw9CXG-GNPg/ULbJ2LOERo-GoWAEkGbHJA] closed... (reason [NO_LONGER_ASSIGNED][ created for parsing template mapping])
# [2018-06-13T07:01:25,050][INFO ][o.e.c.m.MetaDataIndexTemplateService] [eIhOL9F] adding template [.monitoring-logstash] for index patterns [.monitoring-logstash-6-*]
# [2018-06-13T07:01:25,056][TRACE][o.e.g.MetaStateService ] [eIhOL9F] [_global] writing state, reason [changed]
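The timestamps above show roughly 4 seconds between each `writing state` and the next cluster state update, which fits the slow-IO theory. A crude way to sanity-check raw sync-write latency on the suspect VM would be timing a small flushed write; a rough sketch (the target path and size are arbitrary, not what Elasticsearch actually writes):

```shell
# Crude disk-sync latency probe: time a 1 MiB write flushed to disk with
# dd's conv=fsync, loosely analogous to the fsync done when persisting the
# global cluster state. TARGET is an arbitrary scratch directory.
TARGET="${TARGET:-/tmp}"
start=$(date +%s%N)
dd if=/dev/zero of="$TARGET/fsync-probe" bs=1M count=1 conv=fsync 2>/dev/null
end=$(date +%s%N)
echo "synchronous 1MiB write took $(( (end - start) / 1000000 )) ms"
rm -f "$TARGET/fsync-probe"
```

On healthy local disks this should be single-digit milliseconds; multi-second numbers on the opensuse workers would point at the infrastructure rather than Elasticsearch.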
Again! https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+packaging-tests/977/console:
12:59:50 > Task :x-pack:qa:vagrant:vagrantOpensuse42#batsPackagingTest
12:59:50 Task ':x-pack:qa:vagrant:vagrantOpensuse42#batsPackagingTest' is not up-to-date because:
12:59:50 Task has not declared any outputs despite executing actions.
12:59:50 Starting process 'command 'vagrant''. Working directory: /var/lib/jenkins/workspace/elastic+elasticsearch+6.x+packaging-tests/x-pack/qa/vagrant Command: vagrant ssh opensuse-42 --command cd $PACKAGING_ARCHIVES && sudo bats --tap $BATS_TESTS/*.bats
12:59:50 Successfully started process 'command 'vagrant''
12:59:50 1..41
12:59:51 ok 1 [X-PACK] install default distribution
12:59:53 ok 2 [X-PACK] verify x-pack installation
12:59:54 ok 3 [X-PACK] verify croneval works
13:00:01 ok 4 [TAR BOOTSTRAP PASSWORD] add bootstrap.password setting
13:00:11 ok 5 [TAR BOOTSTRAP PASSWORD] test bootstrap.password is in setting list
13:00:13 ok 6 [TAR BOOTSTRAP PASSWORD] test auto generated passwords with modified bootstrap.password
13:00:14 ok 7 [TAR BOOTSTRAP PASSWORD] test elasticsearch-sql-cli
13:00:15 ok 8 [TAR BOOTSTRAP PASSWORD] test elasticsearch-sql-cli when user refuses password
13:00:15 ok 9 [TAR BOOTSTRAP PASSWORD] stop Elasticsearch
13:00:23 ok 10 [RPM BOOTSTRAP PASSWORD] add bootstrap.password setting
13:00:43 ok 11 [RPM BOOTSTRAP PASSWORD] test bootstrap.password is in setting list
13:00:46 ok 12 [RPM BOOTSTRAP PASSWORD] test auto generated passwords with modified bootstrap.password
13:00:47 ok 13 [RPM BOOTSTRAP PASSWORD] test elasticsearch-sql-cli
13:00:48 ok 14 [RPM BOOTSTRAP PASSWORD] test elasticsearch-sql-cli when user refuses password
13:00:49 ok 15 [RPM BOOTSTRAP PASSWORD] stop Elasticsearch
13:02:13 not ok 16 [TAR SETUP PASSWORD] test auto generated passwords
13:02:13 # (in test file /project/build/packaging/bats/tests/30_tar_setup_passwords.bats, line 60)
13:02:13 # `false' failed
13:02:13 # Unpacking tarball to /tmp/elasticsearch
13:02:13 # checking for bin file elasticsearch-certgen
13:02:13 # checking for bin file elasticsearch-certgen.bat
13:02:13 # checking for bin file elasticsearch-certutil
13:02:13 # checking for bin file elasticsearch-certutil.bat
13:02:13 # checking for bin file elasticsearch-croneval
13:02:13 # checking for bin file elasticsearch-croneval.bat
13:02:13 # checking for bin file elasticsearch-migrate
13:02:13 # checking for bin file elasticsearch-migrate.bat
13:02:13 # checking for bin file elasticsearch-saml-metadata
13:02:13 # checking for bin file elasticsearch-saml-metadata.bat
13:02:13 # checking for bin file elasticsearch-setup-passwords
13:02:13 # checking for bin file elasticsearch-setup-passwords.bat
13:02:13 # checking for bin file elasticsearch-sql-cli
13:02:13 # checking for bin file elasticsearch-sql-cli.bat
13:02:13 # checking for bin file elasticsearch-sql-cli-6.4.0-SNAPSHOT.jar
13:02:13 # checking for bin file elasticsearch-syskeygen
13:02:13 # checking for bin file elasticsearch-syskeygen.bat
13:02:13 # checking for bin file elasticsearch-users
13:02:13 # checking for bin file elasticsearch-users.bat
13:02:13 # checking for bin file x-pack-env
13:02:13 # checking for bin file x-pack-env.bat
13:02:13 # checking for bin file x-pack-security-env
13:02:13 # checking for bin file x-pack-security-env.bat
13:02:13 # checking for bin file x-pack-watcher-env
13:02:13 # checking for bin file x-pack-watcher-env.bat
13:02:14 # elasticsearch
13:02:14 # elasticsearch.bat
13:02:14 # elasticsearch-certgen
13:02:14 # elasticsearch-certgen.bat
13:02:14 # elasticsearch-certutil
13:02:14 # elasticsearch-certutil.bat
13:02:14 # elasticsearch-cli
13:02:14 # elasticsearch-cli.bat
13:02:14 # elasticsearch-croneval
13:02:14 # elasticsearch-croneval.bat
13:02:14 # elasticsearch-env
13:02:14 # elasticsearch-env.bat
13:02:14 # elasticsearch-keystore
13:02:14 # elasticsearch-keystore.bat
13:02:14 # elasticsearch-migrate
13:02:14 # elasticsearch-migrate.bat
13:02:14 # elasticsearch-plugin
13:02:14 # elasticsearch-plugin.bat
13:02:14 # elasticsearch-saml-metadata
13:02:14 # elasticsearch-saml-metadata.bat
13:02:14 # elasticsearch-service.bat
13:02:14 # elasticsearch-service-mgr.exe
13:02:14 # elasticsearch-service-x64.exe
13:02:14 # elasticsearch-setup-passwords
13:02:14 # elasticsearch-setup-passwords.bat
13:02:14 # elasticsearch-sql-cli
13:02:14 # elasticsearch-sql-cli-6.4.0-SNAPSHOT.jar
13:02:14 # elasticsearch-sql-cli.bat
13:02:14 # elasticsearch-syskeygen
13:02:14 # elasticsearch-syskeygen.bat
13:02:14 # elasticsearch-translog
13:02:14 # elasticsearch-translog.bat
13:02:14 # elasticsearch-users
13:02:14 # elasticsearch-users.bat
13:02:14 # x-pack
13:02:14 # x-pack-env
13:02:14 # x-pack-env.bat
13:02:14 # x-pack-security-env
13:02:14 # x-pack-security-env.bat
13:02:14 # x-pack-watcher-env
13:02:14 # x-pack-watcher-env.bat
13:02:14 # Expected x-pack elasticsearch-setup-passwords tool exit code to be zero
13:02:14 # Initiating the setup of passwords for reserved users elastic,kibana,logstash_system,beats_system.
13:02:14 # The passwords will be randomly generated and printed to the console.
13:02:14 # Please confirm that you would like to continue [y/N]
13:02:14 #
13:02:14 #
13:02:14 # Connection failure to: http://127.0.0.1:9200/_xpack/security/user/kibana/_password?pretty failed: Read timed out
13:02:14 #
13:02:14 # ERROR: Failed to set password for user [kibana].
Another instance, this time in 25_package_bootstrap_password.bats: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+packaging-tests/2823/console
Full log: bats-packaging-log.txt
@andyb-elastic do you think we should disable these tests while we investigate? I'm not familiar with how to best isolate + mute the bats tests.
cc @polyfractal since he commented on another one of these issues
We're unable to reproduce this outside of CI so we're collecting logging in the CI runs to debug it. I can disable these tests on opensuse, but I'm not sure how else to get insight into what's happening.
I'll at least leave them enabled on master and disable them on the other branches we test so there aren't so many failures - does that sound reasonable, @rjernst @nik9000?
Also just to make it easier for people on triage to find this issue here's another way that this can look in the logs
# /project/build/packaging/bats/utils/xpack.bash: connect: Cannot assign requested address
# /project/build/packaging/bats/utils/xpack.bash: line 100: /dev/tcp/localhost/9200: Cannot assign requested address
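For context on that failure mode: xpack.bash is probing the node's HTTP port through bash's /dev/tcp pseudo-device, and "Cannot assign requested address" means connect() failed at the OS level rather than Elasticsearch refusing the connection. A hypothetical re-creation of that kind of probe (the function name and retry loop are illustrative, not the actual helper code):

```shell
# Hypothetical sketch of a /dev/tcp-style readiness probe like the one in
# the bats helper; wait_for_port and its retry logic are illustrative.
wait_for_port() {
  local host=$1 port=$2 attempts=${3:-30}
  local i
  for ((i = 0; i < attempts; i++)); do
    # the subshell closes fd 3 again as soon as it exits
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# A closed local port fails with a non-zero status instead of hanging.
wait_for_port 127.0.0.1 1 1 || echo "port 1 is closed"
```

When the slow node never finishes processing its startup cluster state events, this kind of loop exhausts its attempts and the test surfaces the connection failure seen above.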
Sounds reasonable to me.
This happened again in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+packaging-tests/997/console
The full log attached in https://github.com/elastic/elasticsearch/issues/30295#issuecomment-401464157 contains the new trace logging from #31319 and seems to confirm @DaveCTurner's theory from https://github.com/elastic/elasticsearch/issues/30295#issuecomment-397203872. For example, storing cluster state version 2 took 6 seconds:
# [2018-06-29T15:31:24,286][DEBUG][o.e.c.s.MasterService ] [fqeOyku] cluster state updated, version [2], source [local-gateway-elected-state]
# [2018-06-29T15:31:24,286][DEBUG][o.e.c.s.MasterService ] [fqeOyku] publishing cluster state version [2]
# [2018-06-29T15:31:24,286][DEBUG][o.e.c.s.ClusterApplierService] [fqeOyku] processing [apply cluster state (from master [master {fqeOyku}{fqeOykunSeWpa5zpqiinkw}{Ospb1N7bSOmKqY9FlOJ6QA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8370659328, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [2] source [local-gateway-elected-state]])]: execute
# [2018-06-29T15:31:24,286][DEBUG][o.e.c.s.ClusterApplierService] [fqeOyku] cluster state updated, version [2], source [apply cluster state (from master [master {fqeOyku}{fqeOykunSeWpa5zpqiinkw}{Ospb1N7bSOmKqY9FlOJ6QA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8370659328, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [2] source [local-gateway-elected-state]])]
# [2018-06-29T15:31:24,286][DEBUG][o.e.c.s.ClusterApplierService] [fqeOyku] applying cluster state version 2
# [2018-06-29T15:31:24,286][DEBUG][o.e.c.s.ClusterApplierService] [fqeOyku] apply cluster state with version 2
# [2018-06-29T15:31:24,287][TRACE][o.e.g.MetaStateService ] [fqeOyku] [_global] writing state, reason [changed]
# [2018-06-29T15:31:28,799][TRACE][o.e.i.IndexingMemoryController] [fqeOyku] total indexing heap bytes used [0b] vs indices.memory.index_buffer_size [99mb], currently writing bytes [0b]
# [2018-06-29T15:31:30,456][TRACE][o.e.g.MetaDataStateFormat] written state to /var/lib/elasticsearch/nodes/0/_state/global-1.st
# [2018-06-29T15:31:30,457][TRACE][o.e.g.MetaDataStateFormat] cleaned up /var/lib/elasticsearch/nodes/0/_state/global-1.st.tmp
# [2018-06-29T15:31:30,457][TRACE][o.e.g.MetaDataStateFormat] cleanupOldFiles: cleaning up /var/lib/elasticsearch/nodes/0
# [2018-06-29T15:31:30,457][TRACE][o.e.g.MetaDataStateFormat] cleanupOldFiles: cleaned up /var/lib/elasticsearch/nodes/0/_state/global-0.st
# [2018-06-29T15:31:30,457][TRACE][o.e.g.MetaStateService ] [fqeOyku] [_global] state written
# [2018-06-29T15:31:30,457][DEBUG][o.e.c.s.ClusterApplierService] [fqeOyku] set locally applied cluster state to version 2
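The 6-second gap is visible directly in the timestamps of the `writing state` / `state written` lines. For anyone eyeballing similar logs, here's a throwaway sketch for turning the HH:MM:SS,mmm part of two timestamps into a duration (assumes both fall on the same day; not part of any existing tooling):

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert the HH:MM:SS,mmm part of two log timestamps
# to milliseconds and print the gap, assuming both are on the same day.
to_millis() {
  local h=${1%%:*} rest=${1#*:}
  local m=${rest%%:*} s_ms=${rest#*:}
  local s=${s_ms%,*} ms=${s_ms#*,}
  # 10# forces base 10 so values like "08" aren't read as octal.
  echo $(( 10#$h * 3600000 + 10#$m * 60000 + 10#$s * 1000 + 10#$ms ))
}

start=$(to_millis "15:31:24,287")  # [_global] writing state, reason [changed]
end=$(to_millis "15:31:30,457")    # [_global] state written
echo "state write took $(( end - start )) ms"
```

For the excerpt above this prints a gap of 6170 ms - over six seconds to write one small state file.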
I found https://www.electricmonk.nl/log/2016/03/14/terrible-virtualbox-disk-performance/, which recommends enabling the "Host I/O Cache" for better SATA disk performance in VirtualBox.
I wonder if this is also relevant: https://forums.virtualbox.org/viewtopic.php?f=1&t=61428
It claims that IDE controllers are much faster than SATA controllers in VirtualBox. However, as pointed out in the comments, the main reason could be that IDE controllers have Host I/O Cache enabled by default and SATA controllers don't (which tallies with what the first link says). The reason for that decision seems to be the risk of data loss in the VM if there's a power failure on the host, but we don't care about that in packaging tests: http://www.virtualbox.org/manual/ch05.html#iocaching
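If anyone wants to test the theory on an existing box by hand, Host I/O Cache can be toggled per controller with VBoxManage while the VM is powered off (the VM and controller names below are just examples):

```shell
# Toggle Host I/O Cache on for an existing VM's SATA controller.
# The VM must be powered off; "opensuse-42" is a placeholder VM name.
VBoxManage storagectl "opensuse-42" \
  --name "SATA Controller" \
  --hostiocache on
```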
Interestingly, the opensuse-42 VM has a SATA controller with Host I/O Cache switched off:
<StorageControllers>
<StorageController name="IDE Controller" type="PIIX4" PortCount="2" useHostIOCache="true" Bootable="true"/>
<StorageController name="SATA Controller" type="AHCI" PortCount="1" useHostIOCache="false" Bootable="true" IDE0MasterEmulationPort="0" IDE0SlaveEmulationPort="1" IDE1MasterEmulationPort="2" IDE1SlaveEmulationPort="3">
<AttachedDevice type="HardDisk" hotpluggable="false" port="0" device="0">
<Image uuid="{cb0669e5-d742-4549-a85b-1be6ab5ca55c}"/>
</AttachedDevice>
</StorageController>
</StorageControllers>
In contrast, the centos-6 VM, which doesn't suffer from slow writes, only has an IDE controller with Host I/O Cache enabled - it has no SATA controller:
<StorageControllers>
<StorageController name="IDE Controller" type="PIIX4" PortCount="2" useHostIOCache="true" Bootable="true">
<AttachedDevice type="HardDisk" hotpluggable="false" port="0" device="0">
<Image uuid="{bda286eb-5d96-4970-b89c-3b7dd570fde8}"/>
</AttachedDevice>
</StorageController>
</StorageControllers>
This is all just an unsubstantiated theory at the moment. I'm not sure how to change these low level VirtualBox settings from within the Vagrant config file.
I disabled these tests on opensuse and sles on 6.x, 6.3, and 5.6 https://github.com/elastic/elasticsearch/pull/31749
Thanks for the info @droberts195 that looks promising, I'll try setting the controller in the Vagrantfile
It seems like configuring the disk to be attached to the ide controller would be more appropriate to do when the image is built; it looks like we do something similar for sles: https://github.com/elastic/infra/blob/a8344443005bf72d8b674e12583e0d5c7a338d6a/packer/packer-sles-vagrantcloud.json#L137-L148
I ran into an issue with the packer build for the opensuse vbox image, so that may need to be resolved first. If it's tricky I think we can also set it by calling storageattach in the Vagrantfile https://github.com/elastic/infra/issues/5852
A similar failure of this today: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+packaging-tests/1024/console
03:45:36 Starting process 'command 'vagrant''. Working directory: /var/lib/jenkins/workspace/elastic+elasticsearch+6.x+packaging-tests/qa/vagrant Command: vagrant ssh opensuse-42 --command sudo bash "$PACKAGING_TESTS/run-tests.sh"
03:45:36 Successfully started process 'command 'vagrant''
03:45:36 JUnit version 4.12
03:49:43 .....E..E.E........................
03:49:43 Time: 247.113
03:49:43 There were 3 failures:
03:49:43 1) test50StartAndStop(org.elasticsearch.packaging.test.DefaultTarTests)
03:49:43 java.net.SocketTimeoutException: Read timed out
03:49:43 at java.net.SocketInputStream.socketRead0(Native Method)
03:49:43 at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
03:49:43 at java.net.SocketInputStream.read(SocketInputStream.java:171)
03:49:43 at java.net.SocketInputStream.read(SocketInputStream.java:141)
03:49:43 at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
03:49:43 at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
03:49:43 at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
03:49:43 at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
03:49:43 at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
03:49:43 at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
03:49:43 at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
03:49:43 at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
03:49:43 at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
03:49:43 at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
03:49:43 at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
03:49:43 at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
03:49:43 at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
03:49:43 at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
03:49:43 at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
03:49:43 at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
03:49:43 at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
03:49:43 at org.apache.http.client.fluent.Request.internalExecute(Request.java:173)
03:49:43 at org.apache.http.client.fluent.Request.execute(Request.java:177)
03:49:43 at org.elasticsearch.packaging.util.ServerUtils.makeRequest(ServerUtils.java:113)
03:49:43 at org.elasticsearch.packaging.util.ServerUtils.runElasticsearchTests(ServerUtils.java:98)
03:49:43 at org.elasticsearch.packaging.test.ArchiveTestCase.test50StartAndStop(ArchiveTestCase.java:196)
03:49:43 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
03:49:43 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
03:49:43 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
03:49:43 at java.lang.reflect.Method.invoke(Method.java:498)
03:49:43 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
03:49:43 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
03:49:43 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
03:49:43 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
03:49:43 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
03:49:43 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
03:49:43 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
03:49:43 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
03:49:43 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
03:49:43 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
03:49:43 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
03:49:43 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
03:49:43 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
03:49:43 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
03:49:43 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
03:49:43 at org.junit.runners.Suite.runChild(Suite.java:128)
03:49:43 at org.junit.runners.Suite.runChild(Suite.java:27)
03:49:43 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
03:49:43 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
03:49:43 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
03:49:43 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
03:49:43 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
03:49:43 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
03:49:43 at org.junit.runners.Suite.runChild(Suite.java:128)
03:49:43 at org.junit.runners.Suite.runChild(Suite.java:27)
03:49:43 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
03:49:43 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
03:49:43 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
03:49:43 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
03:49:43 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
03:49:43 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
03:49:43 at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
03:49:43 at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
03:49:43 at org.junit.runner.JUnitCore.runMain(JUnitCore.java:77)
03:49:43 at org.junit.runner.JUnitCore.main(JUnitCore.java:36)
03:49:43 at org.elasticsearch.packaging.VMTestRunner.main(VMTestRunner.java:34)
03:49:43 2) test70CustomPathConfAndJvmOptions(org.elasticsearch.packaging.test.DefaultTarTests)
03:49:43 java.net.SocketTimeoutException: Read timed out
03:49:43 at java.net.SocketInputStream.socketRead0(Native Method)
03:49:43 at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
03:49:43 at java.net.SocketInputStream.read(SocketInputStream.java:171)
03:49:43 at java.net.SocketInputStream.read(SocketInputStream.java:141)
03:49:43 at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
03:49:43 at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
03:49:43 at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
03:49:43 at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
03:49:43 at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
03:49:43 at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
03:49:43 at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
03:49:43 at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
03:49:43 at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
03:49:43 at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
03:49:43 at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
03:49:43 at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
03:49:43 at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
03:49:43 at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
03:49:43 at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
03:49:43 at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
03:49:43 at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
03:49:43 at org.apache.http.client.fluent.Request.internalExecute(Request.java:173)
03:49:43 at org.apache.http.client.fluent.Request.execute(Request.java:177)
03:49:43 at org.elasticsearch.packaging.util.ServerUtils.makeRequest(ServerUtils.java:113)
03:49:43 at org.elasticsearch.packaging.util.ServerUtils.waitForElasticsearch(ServerUtils.java:93)
03:49:43 at org.elasticsearch.packaging.util.ServerUtils.waitForElasticsearch(ServerUtils.java:45)
03:49:43 at org.elasticsearch.packaging.util.Archives.runElasticsearch(Archives.java:302)
03:49:43 at org.elasticsearch.packaging.test.ArchiveTestCase.test70CustomPathConfAndJvmOptions(ArchiveTestCase.java:258)
03:49:43 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
03:49:43 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
03:49:43 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
03:49:43 at java.lang.reflect.Method.invoke(Method.java:498)
03:49:43 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
03:49:43 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
03:49:43 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
03:49:43 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
03:49:43 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
03:49:43 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
03:49:43 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
03:49:43 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
03:49:43 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
03:49:43 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
03:49:43 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
03:49:43 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
03:49:43 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
03:49:43 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
03:49:43 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
03:49:43 at org.junit.runners.Suite.runChild(Suite.java:128)
03:49:43 at org.junit.runners.Suite.runChild(Suite.java:27)
03:49:43 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
03:49:43 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
03:49:43 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
03:49:43 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
03:49:43 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
03:49:43 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
03:49:43 at org.junit.runners.Suite.runChild(Suite.java:128)
03:49:43 at org.junit.runners.Suite.runChild(Suite.java:27)
03:49:43 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
03:49:43 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
03:49:43 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
03:49:43 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
03:49:43 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
03:49:43 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
03:49:43 at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
03:49:43 at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
03:49:43 at org.junit.runner.JUnitCore.runMain(JUnitCore.java:77)
03:49:43 at org.junit.runner.JUnitCore.main(JUnitCore.java:36)
03:49:43 at org.elasticsearch.packaging.VMTestRunner.main(VMTestRunner.java:34)
03:49:43 3) test80RelativePathConf(org.elasticsearch.packaging.test.DefaultTarTests)
03:49:43 java.net.SocketTimeoutException: Read timed out
03:49:43 at java.net.SocketInputStream.socketRead0(Native Method)
03:49:43 at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
03:49:43 at java.net.SocketInputStream.read(SocketInputStream.java:171)
03:49:43 at java.net.SocketInputStream.read(SocketInputStream.java:141)
03:49:43 at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
03:49:43 at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
03:49:43 at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
03:49:43 at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
03:49:43 at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
03:49:43 at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
03:49:43 at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
03:49:43 at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
03:49:43 at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
03:49:43 at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
03:49:43 at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
03:49:43 at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
03:49:43 at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
03:49:43 at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
03:49:43 at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
03:49:43 at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
03:49:43 at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
03:49:43 at org.apache.http.client.fluent.Request.internalExecute(Request.java:173)
03:49:43 at org.apache.http.client.fluent.Request.execute(Request.java:177)
03:49:43 at org.elasticsearch.packaging.util.ServerUtils.makeRequest(ServerUtils.java:113)
03:49:43 at org.elasticsearch.packaging.util.ServerUtils.waitForElasticsearch(ServerUtils.java:93)
03:49:43 at org.elasticsearch.packaging.util.ServerUtils.waitForElasticsearch(ServerUtils.java:45)
03:49:43 at org.elasticsearch.packaging.util.Archives.runElasticsearch(Archives.java:302)
03:49:43 at org.elasticsearch.packaging.test.ArchiveTestCase.test80RelativePathConf(ArchiveTestCase.java:303)
03:49:43 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
03:49:43 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
03:49:43 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
03:49:43 at java.lang.reflect.Method.invoke(Method.java:498)
03:49:43 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
03:49:43 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
03:49:43 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
03:49:43 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
03:49:43 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
03:49:43 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
03:49:43 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
03:49:43 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
03:49:43 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
03:49:43 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
03:49:43 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
03:49:43 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
03:49:43 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
03:49:43 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
03:49:43 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
03:49:43 at org.junit.runners.Suite.runChild(Suite.java:128)
03:49:43 at org.junit.runners.Suite.runChild(Suite.java:27)
03:49:43 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
03:49:43 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
03:49:43 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
03:49:43 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
03:49:43 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
03:49:43 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
03:49:43 at org.junit.runners.Suite.runChild(Suite.java:128)
03:49:43 at org.junit.runners.Suite.runChild(Suite.java:27)
03:49:43 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
03:49:43 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
03:49:43 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
03:49:43 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
03:49:43 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
03:49:43 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
03:49:43 at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
03:49:43 at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
03:49:43 at org.junit.runner.JUnitCore.runMain(JUnitCore.java:77)
03:49:43 at org.junit.runner.JUnitCore.main(JUnitCore.java:36)
03:49:43 at org.elasticsearch.packaging.VMTestRunner.main(VMTestRunner.java:34)
03:49:43
This build failure failed differently than the others reported here, but looking in the logs a lot of things time out, so I assume it is caused by the same issue (super slow IO): https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+packaging-tests/2860/console
23:47:43 # Testing if ES ports are open:
23:47:43 # port 9200 on host 127.0.0.1 is open
23:47:43 # /project/build/packaging/bats/utils/utils.bash: connect: Connection refused
23:47:43 # /project/build/packaging/bats/utils/utils.bash: line 428: /dev/tcp/127.0.0.1/9201: Connection refused
23:47:43 # port 9201 on host 127.0.0.1 is not open
23:47:57 not ok 13 [RPM BOOTSTRAP PASSWORD] test elasticsearch-sql-cli
23:47:57 # (in test file /project/build/packaging/bats/tests/25_package_bootstrap_password.bats, line 146)
23:47:57 # `false' failed
23:47:57 # {
23:47:57 # "error" : {
23:47:57 # "root_cause" : [
23:47:57 # {
23:47:57 # "type" : "security_exception",
23:47:57 # "reason" : "failed to authenticate user [elastic]",
23:47:57 # "header" : {
23:47:57 # "WWW-Authenticate" : "Basic realm=\"security\" charset=\"UTF-8\""
23:47:57 # }
23:47:57 # }
23:47:57 # ],
23:47:57 # "type" : "security_exception",
23:47:57 # "reason" : "failed to authenticate user [elastic]",
23:47:57 # "header" : {
23:47:57 # "WWW-Authenticate" : "Basic realm=\"security\" charset=\"UTF-8\""
23:47:57 # }
23:47:57 # },
23:47:57 # "status" : 401
23:47:57 # }
23:47:57 # SQL cli failed:\nJul 11, 2018 11:47:51 PM org.jline.utils.Log logr
23:47:57 # WARNING: Unable to create a system terminal, creating a dumb terminal (enable debug logging for more information)
23:47:57 # password: Client Exception [java.sql.SQLInvalidAuthorizationSpecException: failed to authenticate user [elastic]]
23:47:57 # org.elasticsearch.xpack.sql.client.ClientException: java.sql.SQLInvalidAuthorizationSpecException: failed to authenticate user [elastic]
23:47:57 # at org.elasticsearch.xpack.sql.cli.command.CliSession.checkConnection(CliSession.java:65)
23:47:57 # at org.elasticsearch.xpack.sql.cli.Cli.checkConnection(Cli.java:130)
23:47:57 # at org.elasticsearch.xpack.sql.cli.Cli.execute(Cli.java:120)
23:47:57 # at org.elasticsearch.xpack.sql.cli.Cli.execute(Cli.java:102)
23:47:57 # at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124)
23:47:57 # at org.elasticsearch.cli.Command.main(Command.java:90)
23:47:57 # at org.elasticsearch.xpack.sql.cli.Cli.main(Cli.java:50)
23:47:57 # Caused by: java.sql.SQLInvalidAuthorizationSpecException: failed to authenticate user [elastic]
23:47:57 # at org.elasticsearch.xpack.sql.client.JreHttpUrlConnection$SqlExceptionType.asException(JreHttpUrlConnection.java:306)
23:47:57 # at org.elasticsearch.xpack.sql.client.JreHttpUrlConnection.parserError(JreHttpUrlConnection.java:183)
23:47:57 # at org.elasticsearch.xpack.sql.client.JreHttpUrlConnection.request(JreHttpUrlConnection.java:158)
23:47:57 # at org.elasticsearch.xpack.sql.client.HttpClient.lambda$get$4(HttpClient.java:122)
23:47:57 # at org.elasticsearch.xpack.sql.client.JreHttpUrlConnection.http(JreHttpUrlConnection.java:62)
23:47:57 # at org.elasticsearch.xpack.sql.client.HttpClient.lambda$get$5(HttpClient.java:121)
23:47:57 # at java.security.AccessController.doPrivileged(Native Method)
23:47:57 # at org.elasticsearch.xpack.sql.client.HttpClient.get(HttpClient.java:120)
23:47:57 # at org.elasticsearch.xpack.sql.client.HttpClient.serverInfo(HttpClient.java:60)
23:47:57 # at org.elasticsearch.xpack.sql.cli.command.CliSession.checkConnection(CliSession.java:63)
23:47:57 # ... 6 more
23:47:57 # ERROR: Cannot communicate with the server http://elastic@127.0.0.1:9200. This version of CLI only works with Elasticsearch version v7.0.0-alpha1 [e955ffc]
Been at this for a while - I'm not sure swapping the disk to the other controller will work in the Vagrantfile. The commands to do it are basically
# disconnect the disk from the sata controller
vboxmanage storageattach [machine id] \
--storagectl 'SATA Controller' \
--device 0 \
--port 0 \
--type hdd \
--medium none
# attach the disk to the ide controller
vboxmanage storageattach [machine id] \
--storagectl 'IDE Controller' \
--device 0 \
--port 0 \
--type hdd \
--medium [disk path or uuid]
You need to know the disk path or uuid, and to get that from vboxmanage showvminfo you need to know the machine id, which doesn't exist at the point where you can run vboxmanage showvminfo, because the machine hasn't been created yet.
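For reference, once the machine does exist, the disk uuid can be pulled out of the machine-readable showvminfo output. A minimal sketch — the sample output and uuid below are hypothetical placeholders, and the exact key name depends on the controller name (here assumed to be 'SATA Controller'); on a real host you would pipe `vboxmanage showvminfo "$machine_id" --machinereadable` instead of the hard-coded string:

```shell
# Hard-coded sample of `vboxmanage showvminfo --machinereadable` output,
# for illustration only (real output has many more key="value" lines).
showvminfo='name="opensuse-42"
"SATA Controller-ImageUUID-0-0"="12345678-9abc-def0-1234-56789abcdef0"
"SATA Controller-0-0"="/path/to/disk.vmdk"'

# Extract the uuid of the disk on port 0, device 0 of the SATA controller.
disk_uuid=$(printf '%s\n' "$showvminfo" | \
  sed -n 's/^"SATA Controller-ImageUUID-0-0"="\(.*\)"$/\1/p')
echo "$disk_uuid"
```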
I'm going to take another shot at doing this in the packer build
I merged #32053 to turn on host io caching for the opensuse box and re-enabled the tests for the suse boxes on master 14d7e2c7b2d3761a361edd720f98353f856936a4
We'll see how they do on the opensuse box - I'm not sure where to start on the sles box, as it has its disk on the ide controller, so it's possible its failures are unrelated to what we're seeing here
Infra merged https://github.com/elastic/infra/pull/5975 and a new version of the box with the disk attached to the ide controller has landed in vagrant cloud
I reverted enabling host io caching for the sata controller (https://github.com/elastic/elasticsearch/commit/16fe22047e4ce243e9858da568b4e7a8eb73ef35) and turned tests back on for the suse boxes (https://github.com/elastic/elasticsearch/commit/418540e630280d7e4f899ebff2f5202295ac0de0)
This one timed out when waiting for the server to start - while we don't have the es logs to confirm it's this same slow io problem, it's not encouraging
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+packaging-tests/2936/console
Have logs from a bats run now, we're definitely still seeing the issue. I disabled the tests for these boxes again https://github.com/elastic/elasticsearch/commit/0b601c663178dcdd9a7361dc193317c3817251eb
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+packaging-tests/2954/console build-2954.txt
I'm re-enabling tests on these boxes on master to confirm we're still seeing this issue in CI, while I dig into it a little more. It did not occur in three PR CI runs in #38864, and I still have not been able to reproduce it locally
To re-disable tests on these boxes, you can either revert 94a4c3227bff5aa19fd859c1541dfab1790b066d or cherry-pick 0b601c663178dcdd9a7361dc193317c3817251eb and push it to the branch again, they should have the same effect
Failed again in the periodic job https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+packaging-tests/454/console
build-master-packaging-454.txt.zip
Disabled again b88760b704ba7a80ecb550b6c360d2b340210e4c
Same symptom, 4 seconds to publish cluster state
# {"type": "server", "timestamp": "2019-02-15T13:46:28,387+0000", "level": "DEBUG", "component": "o.e.c.s.MasterService", "cluster.name": "elasticsearch", "node.name": "opensuse-leap", "cluster.uuid": "7t2z5K-BRU6g62p6qkuX3Q", "node.id": "3FfHA4InSKasjqnUYx4oMw", "message": "publishing cluster state version [2]" }
# {"type": "server", "timestamp": "2019-02-15T13:46:28,388+0000", "level": "TRACE", "component": "o.e.g.MetaStateService", "cluster.name": "elasticsearch", "node.name": "opensuse-leap", "cluster.uuid": "7t2z5K-BRU6g62p6qkuX3Q", "node.id": "3FfHA4InSKasjqnUYx4oMw", "message": "[_meta] writing state, reason [changed]" }
# {"type": "server", "timestamp": "2019-02-15T13:46:28,388+0000", "level": "TRACE", "component": "o.e.g.MetaDataStateFormat", "cluster.name": "elasticsearch", "node.name": "opensuse-leap", "cluster.uuid": "7t2z5K-BRU6g62p6qkuX3Q", "node.id": "3FfHA4InSKasjqnUYx4oMw", "message": "cleaned up /var/lib/elasticsearch/nodes/0/manifest-6.st.tmp" }
# {"type": "server", "timestamp": "2019-02-15T13:46:31,013+0000", "level": "TRACE", "component": "o.e.i.IndexingMemoryController", "cluster.name": "elasticsearch", "node.name": "opensuse-leap", "cluster.uuid": "7t2z5K-BRU6g62p6qkuX3Q", "node.id": "3FfHA4InSKasjqnUYx4oMw", "message": "total indexing heap bytes used [0b] vs indices.memory.index_buffer_size [99mb], currently writing bytes [0b]" }
# {"type": "server", "timestamp": "2019-02-15T13:46:32,912+0000", "level": "TRACE", "component": "o.e.g.MetaDataStateFormat", "cluster.name": "elasticsearch", "node.name": "opensuse-leap", "cluster.uuid": "7t2z5K-BRU6g62p6qkuX3Q", "node.id": "3FfHA4InSKasjqnUYx4oMw", "message": "cleaned up /var/lib/elasticsearch/nodes/0/manifest-6.st.tmp" }
# {"type": "server", "timestamp": "2019-02-15T13:46:32,912+0000", "level": "TRACE", "component": "o.e.g.MetaDataStateFormat", "cluster.name": "elasticsearch", "node.name": "opensuse-leap", "cluster.uuid": "7t2z5K-BRU6g62p6qkuX3Q", "node.id": "3FfHA4InSKasjqnUYx4oMw", "message": "cleanupOldFiles: cleaning up /var/lib/elasticsearch/nodes/0" }
# {"type": "server", "timestamp": "2019-02-15T13:46:32,912+0000", "level": "TRACE", "component": "o.e.g.MetaDataStateFormat", "cluster.name": "elasticsearch", "node.name": "opensuse-leap", "cluster.uuid": "7t2z5K-BRU6g62p6qkuX3Q", "node.id": "3FfHA4InSKasjqnUYx4oMw", "message": "cleaned up /var/lib/elasticsearch/nodes/0/_state/manifest-5.st" }
# {"type": "server", "timestamp": "2019-02-15T13:46:32,912+0000", "level": "TRACE", "component": "o.e.g.MetaStateService", "cluster.name": "elasticsearch", "node.name": "opensuse-leap", "cluster.uuid": "7t2z5K-BRU6g62p6qkuX3Q", "node.id": "3FfHA4InSKasjqnUYx4oMw", "message": "[_meta] state written (generation: 6)" }
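The ~4 second gap can be read straight off the log timestamps, from "[_meta] writing state" at 13:46:28,388 to "[_meta] state written" at 13:46:32,912. A minimal sketch of that arithmetic (timestamps hard-coded from the excerpt; assumes both lines fall on the same day, and drops the milliseconds):

```shell
start="2019-02-15T13:46:28,388+0000"   # from the "[_meta] writing state" line
end="2019-02-15T13:46:32,912+0000"     # from the "[_meta] state written" line

to_secs() {
  # strip the date prefix and the ,ms+tz suffix, then convert HH:MM:SS
  # to seconds since midnight
  echo "$1" | sed -e 's/^[^T]*T//' -e 's/,.*//' | \
    awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

delta=$(( $(to_secs "$end") - $(to_secs "$start") ))
echo "state write took ${delta} seconds"
```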
I did some testing to look at the disk usage of the opensuse image, while I'm not sure it's conclusive it does seem consistent with our theory here that the disk is causing an issue
This was running the geonames rally track on a single node and monitoring with iostat at a 1 second interval, with both the node and rally running inside a vm image. The node was started the way we start it in the packaging tests, i.e. rally did not manage it. The conditions I tested were the centos-7 image vs the opensuse-42 image, with or without added stress on the host machine (where the stress was an elasticsearch gradle check build running at the same time). This was hosted on my macbook, not a CI machine, using 6.6 (there was an issue with rally's use of typeless APIs on master at the time)
(these plots are truncated to an x range where most of the activity is; the time range after this doesn't have much going on)
So the opensuse runs had a lot more merged read and write requests queued to the disk per second than the centos runs. Strangely though the average queue size looks similar for both
Looking at average service time, the opensuse runs seem spikier but again not super qualitatively different
The merged r/w requests queued per second bit seems the most interesting, but I'm not very familiar with how that queueing works so I'll look into it more
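For anyone re-running this comparison, the merged-request figures are the rrqm/s and wrqm/s columns of `iostat -x`. A small sketch of pulling them out by header name rather than by position, since column order varies between sysstat versions — the sample output below is hard-coded for illustration (the numbers are made up); on a vm you would pipe `iostat -x 1` in instead:

```shell
# Two lines of mocked-up `iostat -x` output: a header row and one device row.
sample='Device            rrqm/s   wrqm/s     r/s     w/s
sda                 0.10   250.30    5.00  120.00'

# Build a name->column index from the header, then print the merged-request
# columns for each device row.
result=$(printf '%s\n' "$sample" | awk '
  NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
  { print $1, "rrqm/s=" $col["rrqm/s"], "wrqm/s=" $col["wrqm/s"] }')
echo "$result"
```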
Data these plots are from
opensuse-nostress-iostat-docs-tagged.json.txt centos-stress-iostat-docs-tagged.json.txt opensuse-stress-iostat-docs-tagged.json.txt
Both opensuse and sles were re-enabled with a recent refactoring of the packaging tests, and we haven't (yet) seen this issue crop up. Additionally, we are moving to each OS running within GCP instead of vagrant (at least for CI). I'm going to optimistically close this again.
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+packaging-tests/2645/console
REPRODUCE WITH line says:
./gradlew :x-pack:qa:vagrant:vagrantOpensuse42#batsPackagingTest -Dtests.seed=7E7AE0563BF3B4A3
The main problem seems to be
Unfortunately I can't find the server logs so can't see any more detail than this. Here's the relevant bit of the CI log: