Open cbuescher opened 2 years ago
Pinging @elastic/es-security (Team:Security)
I'm marking this as FIPS just because of this line in the logs:
» [2022-05-25T10:05:36,675][ERROR][o.e.x.c.s.SSLService ] [yamlRestTestV7CompatTest-0] unsupported ciphers [[TLS_AES_256_GCM_SHA384, TLS_AES_128_GCM_SHA256, TLS_CHACHA20_POLY1305_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256, TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384]] were requested but cannot be used in this JVM, however there are supported ciphers that will be used [[TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_GCM_SHA384, TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_256_CBC_SHA256, TLS_RSA_WITH_AES_128_CBC_SHA256, TLS_RSA_WITH_AES_256_CBC_SHA, TLS_RSA_WITH_AES_128_CBC_SHA]]. If you are trying to use ciphers with a key length greater than 128 bits on an Oracle JVM, you will need to install the unlimited strength JCE policy files.
I don't know if that is enough to explain the cluster not booting up under 120000ms.
I don't think that the logged error is the actual cause. It looks like the node just took longer to start.
» [2022-05-25T10:03:37,564][INFO ][o.e.n.Node ] [yamlRestTestV7CompatTest-0] node name [yamlRestTestV7CompatTest-0], node ID [6ctzuUuKRYqC6USwGww_Rw], cluster name [yamlRestTestV7CompatTest], roles [ingest, data_frozen, ml, data_hot, transform, data_content, data_warm, master, remote_cluster_client, data, data_cold]
» [2022-05-25T10:05:37,689][INFO ][o.e.x.s.Security ] [yamlRestTestV7CompatTest-0] Security is enabled
» [2022-05-25T10:05:37,968][INFO ][o.e.x.s.a.s.FileRolesStore] [yamlRestTestV7CompatTest-0] parsed [1] roles from file [/dev/shm/elastic+elasticsearch+main+periodic+java-fips-matrix/x-pack/plugin/stack/qa/rest/build/testclusters/yamlRestTestV7CompatTest-0/config/roles.yml]
» [2022-05-25T10:05:39,406][INFO ][o.e.t.n.NettyAllocator ] [yamlRestTestV7CompatTest-0] creating NettyAllocator with the following configs: [name=unpooled, suggested_max_allocation_size=1mb, factors={es.unsafe.use_unpooled_allocator=null, g1gc_enabled=true, g1gc_region_size=4mb, heap_size=512mb}]
» [2022-05-25T10:05:39,457][INFO ][o.e.i.r.RecoverySettings ] [yamlRestTestV7CompatTest-0] using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]
» [2022-05-25T10:05:39,522][INFO ][o.e.d.DiscoveryModule ] [yamlRestTestV7CompatTest-0] using discovery type [multi-node] and seed hosts providers [settings, file]
» [2022-05-25T10:05:40.730410371Z] [BUILD] Stopping node
» [2022-05-25T10:05:40,752][INFO ][o.e.n.Node ] [yamlRestTestV7CompatTest-0] initialized
» [2022-05-25T10:05:40,753][INFO ][o.e.n.Node ] [yamlRestTestV7CompatTest-0] starting ...
There are 2 minutes delay between node info message and security enabled message.
[2022-05-25T10:03:37,564][INFO ][o.e.n.Node ]
[2022-05-25T10:05:37,689][INFO ][o.e.x.s.Security ]
The node started immediately after the build initiated stop. Looking at the history of failures it does seem that it mostly happens in FIPS mode. My best guess is that it depends on a number of loaded modules which caused the initialization to sometime take longer.
CI Link
https://gradle-enterprise.elastic.co/s/32omqddh3kule
Repro line
none
Does it reproduce?
No
Applicable branches
main
Failure history
-
Failure excerpt