elastic / elasticsearch-js

Official Elasticsearch client library for Node.js
https://ela.st/js-client
Apache License 2.0

Inconsistent "Server certificate CA fingerprint does not match the value configured in caFingerprint" (potential race condition?) #2355

Open the-gabe opened 2 months ago

the-gabe commented 2 months ago

🐛 Bug report

We have an application under development that uses self-hosted Elasticsearch with self-signed certificates; clients connect over TLS and verify the server using CA fingerprints. However, we are running into what appears to be a bug in the library, or potentially even in Elasticsearch itself. Over several hours of testing, the issue has not reproduced consistently.

To reproduce

I have uploaded a repo containing a stripped-down proof of concept of the issue, based on our application code:

https://github.com/the-gabe/elastic-failure/tree/main

usage instructions:

curl -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.15.0-linux-x86_64.tar.gz

bsdtar xvf elasticsearch-8.15.0-linux-x86_64.tar.gz

cd elasticsearch-8.15.0

./bin/elasticsearch

Note down the CA fingerprint and password when they are printed in the terminal.

git clone https://github.com/the-gabe/elastic-failure

cd elastic-failure

Edit packages/indexer/vars.bash so that ELASTIC_QUEUE_PASSWORD, ELASTIC_VECTOR_PASSWORD, ELASTIC_QUEUE_FINGERPRINT and ELASTIC_VECTOR_FINGERPRINT reflect the password and CA fingerprint you noted down.

cd packages/indexer

npm ci --no-scripts

npm run build

bash vars.bash

Observe the output in the terminal: both clients are able to obtain the Elasticsearch version just fine, but a caFingerprint failure follows. The output has been included in the root of the repo, in this file: https://github.com/the-gabe/elastic-failure/blob/main/logoutput.txt. The file was created in the Arch Linux environment described below, with Elasticsearch 8.15.0. On Azure App Service, we were using Elasticsearch 8.14.3-1 on RHEL 9.

The issue seems to reproduce around 30-40% of the time, but that figure is a guess and is not backed by systematic testing. Starting with a fresh elasticsearch-8.15.0 folder appears to help, but this may be a coincidence. I suspect, speculatively, that it could be a race condition.

Expected behavior

This should not happen: when the configured caFingerprint is correct, the fingerprint check should pass on every connection.

Node.js version

Node.js v22.6.0 on Arch Linux, v20.11.1 on Azure App Service, v20.16.0 on Debian 12

@elastic/elasticsearch version

8.15.0

Operating system

Arch Linux on WSL2, Debian 11 on Azure App Service, Debian 12

Any other relevant environment information

No response

the-gabe commented 2 months ago

Additionally, we hacked the library to try to see what was going on in the fingerprint comparison. Logs have been attached here:

https://github.com/the-gabe/elastic-failure/blob/main/appservice-hackedlib.txt (Note: for clarity regarding line numbers, this log was produced with our actual application, not the code in the git repo.)

We modified node_modules/@elastic/transport/lib/connection/UndiciConnection.js

Here is a snippet of how it looked.

        if (this[symbols_1.kCaFingerprint] !== null) {
            const caFingerprint = this[symbols_1.kCaFingerprint];
            const connector = (0, undici_1.buildConnector)(((_a = this.tls) !== null && _a !== void 0 ? _a : {}));
            undiciOptions.connect = function (opts, cb) {
                connector(opts, (err, socket) => {
                    if (err != null) {
                        return cb(err, null);
                    }
                    if (caFingerprint !== null && isTlsSocket(opts, socket)) {
                        const issuerCertificate = (0, BaseConnection_1.getIssuerCertificate)(socket);
                        /* istanbul ignore next */
                        if (issuerCertificate == null) {
                            socket.destroy();
                            return cb(new Error('Invalid or malformed certificate'), null);
                        }
                        // Check if fingerprint matches
                        /* istanbul ignore else */
                        console.log("this is what we provided to the lib   " + caFingerprint);
                        console.log("This is what was pulled from socket   " + issuerCertificate.fingerprint256);
                        if (caFingerprint !== issuerCertificate.fingerprint256) {
                            socket.destroy();
                            return cb(new Error("Server certificate CA fingerprint does not match the value configured in caFingerprint"), null);
                        }
                    }
                    return cb(null, socket);
                });
            };
        }

And for the sake of 100% clarity, we triple checked that "4F:57:DA:6A:80:46:C5:9F:BD:9E:49:78:BA:26:A2:FC:39:1D:32:B7:63:6C:7D:96:82:6A:1E:C5:BE:24:26:48" is the valid fingerprint for our CA. We checked it several times with openssl x509 -fingerprint -sha256 -in /etc/elasticsearch/certs/http_ca.crt | grep Fingerprint and we have other applications using the same fingerprint without problems.
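The openssl check can also be cross-checked from Node.js. The following is a minimal sketch, not code from the library: the helper names pemToDer and sha256Fingerprint are hypothetical, and it assumes the CA certificate is available as a PEM string. It hashes the certificate's DER bytes with SHA-256 and formats the digest as colon-separated uppercase hex, the same shape as the fingerprint quoted above.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: decode the first certificate in a PEM string to DER bytes.
function pemToDer(pem) {
  const match = /-----BEGIN CERTIFICATE-----([\s\S]+?)-----END CERTIFICATE-----/.exec(pem);
  if (match === null) {
    throw new Error('No PEM certificate block found');
  }
  // Strip line breaks and spaces before base64-decoding the body.
  return Buffer.from(match[1].replace(/\s+/g, ''), 'base64');
}

// Hypothetical helper: SHA-256 digest formatted as "AA:BB:CC:...".
function sha256Fingerprint(der) {
  const hex = crypto.createHash('sha256').update(der).digest('hex').toUpperCase();
  return hex.match(/.{2}/g).join(':');
}

// Usage (path taken from the openssl command above):
//   const fs = require('node:fs');
//   const pem = fs.readFileSync('/etc/elasticsearch/certs/http_ca.crt', 'utf8');
//   console.log(sha256Fingerprint(pemToDer(pem)));
```

On recent Node.js versions, new crypto.X509Certificate(pem).fingerprint256 should give the same value without the manual decoding.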

JoshMock commented 2 months ago

Just to rule it out: it wouldn't have anything to do with this change, would it?

the-gabe commented 1 month ago

Hi @JoshMock , I don't think so, the actual fingerprints taken from the socket are returning undefined.
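Because the same error message is raised whether the socket-side value is undefined or genuinely different, it may help to separate the two cases while debugging. This is only a sketch: classifyFingerprint and describeTlsSocket are hypothetical helpers, not part of @elastic/transport. They rely on the standard tls.TLSSocket methods getPeerCertificate(true) and isSessionReused(). Whether TLS session reuse plays any role here is an assumption to rule in or out, not a confirmed cause.

```javascript
// Hypothetical debugging helpers; not part of @elastic/transport.

// Distinguish "no fingerprint available on the socket" from a real mismatch.
function classifyFingerprint(expected, issuerCertificate) {
  const actual = issuerCertificate == null ? undefined : issuerCertificate.fingerprint256;
  if (typeof actual !== 'string' || actual.length === 0) {
    return 'missing';
  }
  return actual === expected ? 'match' : 'mismatch';
}

// Summarize the state of a tls.TLSSocket at the time of the check,
// so intermittent failures can be correlated with socket state in the logs.
function describeTlsSocket(socket) {
  const leaf = socket.getPeerCertificate(true) || {};
  return {
    authorized: socket.authorized,
    sessionReused: socket.isSessionReused(),
    leafFingerprint: leaf.fingerprint256,
    hasIssuer: leaf.issuerCertificate != null,
  };
}
```

Logging describeTlsSocket(socket) next to the two console.log calls in the patched connector would show whether the undefined values coincide with a particular socket state.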

the-gabe commented 1 month ago

@JoshMock We confirmed that this is not related: we tested with @elastic/transport 8.7.0 instead of 8.7.1 and the issue still occurs.

JoshMock commented 1 month ago

Got it, didn't look at the logs closely enough to see that it was undefined. Definitely not related. 👍

alimoezzi commented 1 month ago

Hi, I'm also having a similar issue. I'm getting this error: Unhandled Rejection at: Promise [object Promise] reason ConnectionError: Invalid or malformed certificate. The caFingerprint is valid and works with the Python client, but with the JS client it results in the error. I'm using 8.15.0 and Node 22.

the-gabe commented 3 weeks ago

Hi @JoshMock, have you managed to look into this? This is now impacting our production environments for this application, and it's not a situation we are comfortable with. This is quite literally a mission-critical function of the library (being able to connect to Elasticsearch securely, in an encrypted and authenticated fashion). Is any progress being made on this bug in a private capacity?

JoshMock commented 2 weeks ago

No action has been taken yet, @the-gabe. I'm Elastic's only active maintainer of this project, and I've been either on PTO or occupied with higher priorities for the last few weeks. I will take a look as soon as I have time.

If you need a fix more urgently, pull requests are always welcome. I am typically able to review and merge a PR within a couple of working days if it has tests and all CI checks are passing.