apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0

CouchDB fails to start due to unacceptable_rsa_key after upgrading the erlang version #5140

Closed IBMRob closed 3 months ago

IBMRob commented 3 months ago

Description

We had a working pair of builds that produced CouchDB images without issue. Because https://repo.hex.pm/builds/elixir/v1.17.2-otp-24.zip was removed, we had to upgrade to erlangversion=25.3.2.13. After this, our builder image still builds successfully, and our CouchDB image then builds successfully too, but when we try to start our CouchDB container it throws the following errors on startup:

[error] 2024-07-17T16:45:02.403256Z couchdb@c-couchdb-m-0.c-couchdb-m <0.370.0> -------- application: mochiweb, "Accept failed error", "{error,{tls_alert,{handshake_failure,\"TLS server: In state hello at ssl_handshake.erl:2121 generated SERVER ALERT: Fatal - Handshake Failure\\n unacceptable_rsa_key\"}}}"
[error] 2024-07-17T16:45:02.403340Z couchdb@c-couchdb-m-0.c-couchdb-m <0.370.0> -------- application: mochiweb, "Accept failed error", "{error,{tls_alert,{handshake_failure,\"TLS server: In state hello at ssl_handshake.erl:2121 generated SERVER ALERT: Fatal - Handshake Failure\\n unacceptable_rsa_key\"}}}"
[error] 2024-07-17T16:45:02.403880Z couchdb@c-couchdb-m-0.c-couchdb-m <0.370.0> -------- CRASH REPORT Process  (<0.370.0>) with 0 neighbors exited with reason: {error,accept_failed} at mochiweb_acceptor:init/4(line:71) <= proc_lib:init_p_do_apply/3(line:240); initial_call: {mochiweb_acceptor,init,['Argument__1','Argument__2',...]}, ancestors: [https,couch_secondary_services,couch_sup,<0.254.0>], message_queue_len: 0, links: [<0.367.0>], dictionary: [], trap_exit: false, status: running, heap_size: 2586, stack_size: 28, reductions: 2741
[error] 2024-07-17T16:45:02.403978Z couchdb@c-couchdb-m-0.c-couchdb-m <0.370.0> -------- CRASH REPORT Process  (<0.370.0>) with 0 neighbors exited with reason: {error,accept_failed} at mochiweb_acceptor:init/4(line:71) <= proc_lib:init_p_do_apply/3(line:240); initial_call: {mochiweb_acceptor,init,['Argument__1','Argument__2',...]}, ancestors: [https,couch_secondary_services,couch_sup,<0.254.0>], message_queue_len: 0, links: [<0.367.0>], dictionary: [], trap_exit: false, status: running, heap_size: 2586, stack_size: 28, reductions: 2741
[error] 2024-07-17T16:45:04.439098Z couchdb@c-couchdb-m-0.c-couchdb-m <0.371.0> -------- application: mochiweb, "Accept failed error", "{error,{tls_alert,{handshake_failure,\"TLS server: In state hello at ssl_handshake.erl:2121 generated SERVER ALERT: Fatal - Handshake Failure\\n unacceptable_rsa_key\"}}}"
[error] 2024-07-17T16:45:04.439145Z couchdb@c-couchdb-m-0.c-couchdb-m <0.371.0> -------- application: mochiweb, "Accept failed error", "{error,{tls_alert,{handshake_failure,\"TLS server: In state hello at ssl_handshake.erl:2121 generated SERVER ALERT: Fatal - Handshake Failure\\n unacceptable_rsa_key\"}}}"
[error] 2024-07-17T16:45:04.439523Z couchdb@c-couchdb-m-0.c-couchdb-m <0.371.0> -------- CRASH REPORT Process  (<0.371.0>) with 0 neighbors exited with reason: {error,accept_failed} at mochiweb_acceptor:init/4(line:71) <= proc_lib:init_p_do_apply/3(line:240); initial_call: {mochiweb_acceptor,init,['Argument__1','Argument__2',...]}, ancestors: [https,couch_secondary_services,couch_sup,<0.254.0>], message_queue_len: 0, links: [<0.367.0>], dictionary: [], trap_exit: false, status: running, heap_size: 1598, stack_size: 28, reductions: 2782
[error] 2024-07-17T16:45:04.439678Z couchdb@c-couchdb-m-0.c-couchdb-m <0.371.0> -------- CRASH REPORT Process  (<0.371.0>) with 0 neighbors exited with reason: {error,accept_failed} at mochiweb_acceptor:init/4(line:71) <= proc_lib:init_p_do_apply/3(line:240); initial_call: {mochiweb_acceptor,init,['Argument__1','Argument__2',...]}, ancestors: [https,couch_secondary_services,couch_sup,<0.254.0>], message_queue_len: 0, links: [<0.367.0>], dictionary: [], trap_exit: false, status: running, heap_size: 1598, stack_size: 28, reductions: 2782
[error] 2024-07-17T16:45:06.325934Z couchdb@c-couchdb-m-0.c-couchdb-m <0.372.0> -------- application: mochiweb, "Accept failed error", "{error,{tls_alert,{handshake_failure,\"TLS server: In state hello at ssl_handshake.erl:2121 generated SERVER ALERT: Fatal - Handshake Failure\\n unacceptable_rsa_key\"}}}"
[error] 2024-07-17T16:45:06.325981Z couchdb@c-couchdb-m-0.c-couchdb-m <0.372.0> -------- application: mochiweb, "Accept failed error", "{error,{tls_alert,{handshake_failure,\"TLS server: In state hello at ssl_handshake.erl:2121 generated SERVER ALERT: Fatal - Handshake Failure\\n unacceptable_rsa_key\"}}}"
[error] 2024-07-17T16:45:06.326326Z couchdb@c-couchdb-m-0.c-couchdb-m <0.372.0> -------- CRASH REPORT Process  (<0.372.0>) with 0 neighbors exited with reason: {error,accept_failed} at mochiweb_acceptor:init/4(line:71) <= proc_lib:init_p_do_apply/3(line:240); initial_call: {mochiweb_acceptor,init,['Argument__1','Argument__2',...]}, ancestors: [https,couch_secondary_services,couch_sup,<0.254.0>], message_queue_len: 0, links: [<0.367.0>], dictionary: [], trap_exit: false, status: running, heap_size: 1598, stack_size: 28, reductions: 2782
[error] 2024-07-17T16:45:06.326447Z couchdb@c-couchdb-m-0.c-couchdb-m <0.372.0> -------- CRASH REPORT Process  (<0.372.0>) with 0 neighbors exited with reason: {error,accept_failed} at mochiweb_acceptor:init/4(line:71) <= proc_lib:init_p_do_apply/3(line:240); initial_call: {mochiweb_acceptor,init,['Argument__1','Argument__2',...]}, ancestors: [https,couch_secondary_services,couch_sup,<0.254.0>], message_queue_len: 0, links: [<0.367.0>], dictionary: [], trap_exit: false, status: running, heap_size: 1598, stack_size: 28, reductions: 2782
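
To confirm that every failure in a longer log is the same alert, the lines can be tallied by alert and reason. This is a minimal Python sketch, assuming the log format shown above; `summarize_tls_alerts` and `ALERT` are hypothetical names, not part of CouchDB or OTP.

```python
import re
from collections import Counter

# Matches the alert description and the reason token that follows the
# escaped newline (a literal backslash sequence followed by "n") in the log.
ALERT = re.compile(r"generated SERVER ALERT: (?P<alert>.+?)\\+n\s*(?P<reason>\w+)")

# One line copied from the log above, kept as a raw string so the
# backslashes stay exactly as logged.
SAMPLE = r'''[error] 2024-07-17T16:45:02.403256Z couchdb@c-couchdb-m-0.c-couchdb-m <0.370.0> -------- application: mochiweb, "Accept failed error", "{error,{tls_alert,{handshake_failure,\"TLS server: In state hello at ssl_handshake.erl:2121 generated SERVER ALERT: Fatal - Handshake Failure\\n unacceptable_rsa_key\"}}}"'''


def summarize_tls_alerts(log_text):
    """Count (alert, reason) pairs found in CouchDB log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ALERT.search(line)
        if match:
            counts[(match.group("alert").strip(), match.group("reason"))] += 1
    return counts


if __name__ == "__main__":
    for (alert, reason), n in summarize_tls_alerts(SAMPLE).items():
        print(f"{n}x {alert}: {reason}")
```

Lines without a TLS alert, such as the CRASH REPORT entries, are skipped.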

Using an old image works, so we have confirmed it's not due to any other changes.

Steps to Reproduce

This is a custom CouchDB image, so it's hard to provide steps to reproduce, although we are essentially based on the same couchdb-ci and couchdb repos.

The main non-standard configuration is that we have CouchDB configured with TLS:

[ssl]
enable = true
key_file = /cert/tls.key
cert_file = /cert/tls.crt
tls_versions = ['tlsv1.2']
cacert_file = /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
ciphers = ["ECDHE-ECDSA-AES128-GCM-SHA256", "ECDHE-ECDSA-AES256-GCM-SHA384", "ECDHE-RSA-AES128-GCM-SHA256", "ECDHE-RSA-AES256-GCM-SHA384", "DHE-RSA-AES128-GCM-SHA256", "DHE-RSA-AES256-GCM-SHA384"]
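
Since the alert names the RSA key, it may help to see which of the configured suites authenticate with it. This is a minimal Python sketch, assuming OpenSSL-style suite names as in the config above; `split_ciphers_by_auth` and the shortened `SSL_INI` sample are illustrative only and not part of CouchDB.

```python
import ast
import configparser

# Shortened copy of the [ssl] section above (assumption: same value syntax).
SSL_INI = """
[ssl]
enable = true
key_file = /cert/tls.key
cert_file = /cert/tls.crt
tls_versions = ['tlsv1.2']
ciphers = ["ECDHE-ECDSA-AES128-GCM-SHA256", "ECDHE-RSA-AES128-GCM-SHA256", "DHE-RSA-AES256-GCM-SHA384"]
"""


def split_ciphers_by_auth(ini_text):
    """Group configured cipher suites by the server key type named in the suite."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # The value is written as a list literal, so literal_eval can parse it safely.
    ciphers = ast.literal_eval(parser["ssl"]["ciphers"])
    groups = {"rsa": [], "ecdsa": [], "other": []}
    for name in ciphers:
        parts = name.split("-")
        if "RSA" in parts:
            groups["rsa"].append(name)
        elif "ECDSA" in parts:
            groups["ecdsa"].append(name)
        else:
            # No explicit authentication token in the name; inspect manually.
            groups["other"].append(name)
    return groups


if __name__ == "__main__":
    for auth, names in split_ciphers_by_auth(SSL_INI).items():
        print(auth, names)
```

The grouping only reads the suite names. It does not check the key in /cert/tls.key or say anything about why the handshake is rejected.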

The certs we use are generated by OpenShift via the service annotation certificate mechanism: service.alpha.openshift.io/serving-cert-secret-name

Expected Behaviour

The cluster starts and setup can be completed on it.

Your Environment

We see this on our amd64, s390x and ppc64le builds

Output of root

{"couchdb":"Welcome","version":"3.3.3","git_sha":"40afbcfc7","uuid":"b17f4e4e0fed4eecb9725420bdab2e43","features":["access-ready","fips","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"IBM"}}

Config

sh-4.4$ cat vm.args 
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy of
# the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations under
# the License.

# Ensure that the Erlang VM listens on a known port
-kernel inet_dist_listen_min 9100
-kernel inet_dist_listen_max 9100

# Enable FIPS
-crypto fips_mode true

# Tell kernel and SASL not to log anything
-kernel error_logger silent
-sasl sasl_error_logger false

# Use kernel poll functionality if supported by emulator
+K true

# Start a pool of asynchronous IO threads
+A 16

# Comment this line out to enable the interactive Erlang shell on startup
+Bd -noinput

Additional Context

big-r81 commented 3 months ago

Just a first guess: can you try with FIPS mode disabled in vm.args?

IBMRob commented 3 months ago

It looks like it does start up if FIPS is disabled, so the problem is associated with FIPS being enabled.
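
For anyone repeating this check, here is a minimal Python sketch of the change, assuming vm.args contains the `-crypto fips_mode true` line shown in the issue description. `disable_fips` is a hypothetical helper, not a CouchDB tool.

```python
import re

# An active (uncommented) FIPS switch, allowing leading whitespace.
FIPS_LINE = re.compile(r"^(\s*)(-crypto\s+fips_mode\s+true\s*)$")


def disable_fips(vm_args_text):
    """Return vm.args text with an active '-crypto fips_mode true' line commented out."""
    out = []
    for line in vm_args_text.splitlines():
        match = FIPS_LINE.match(line)
        if match:
            # Keep the indentation and prefix the flag with '#', as vm.args comments do.
            out.append(f"{match.group(1)}# {match.group(2).rstrip()}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "# Enable FIPS\n-crypto fips_mode true\n\n+K true\n"
    print(disable_fips(sample), end="")
```

Running the function on already modified text leaves it unchanged, so it is safe to apply more than once. This is only for diagnosis; it turns FIPS mode off rather than fixing the handshake under FIPS.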

big-r81 commented 3 months ago

I think it would be good to report this directly to the Erlang/OTP Team to debug this further regarding fips mode.

nickva commented 3 months ago

There may be a related issue in OTP https://github.com/erlang/otp/issues/8562

IBMRob commented 3 months ago

> There may be a related issue in OTP erlang/otp#8562

That does look very similar

big-r81 commented 3 months ago

Closing this for now; it seems to be an Erlang/OTP error.