elastic / elasticsearch-js

Official Elasticsearch client library for Node.js
https://ela.st/js-client
Apache License 2.0
5.24k stars 727 forks source link

Stack traces are not informative when using `HttpConnection` #2235

Open JoshMock opened 4 months ago

JoshMock commented 4 months ago

🐛 Bug Report

When the client is configured to use HttpConnection instead of the default UndiciConnection class, error stack traces are often too brief:

ConnectionError: socket hang up - Local: 127.0.0.1:61564, Remote: 127.0.0.1:9200
    at KibanaTransport.request (/Users/afharo/Developer/elastic/kibana/node_modules/@elastic/transport/src/Transport.ts:585:17)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)

Likely due to http swallowing the stack during async operations.

See https://github.com/elastic/kibana/issues/60237

Expected behavior

A full stack trace back to the original API call that triggered the error.

rayafratkina commented 3 months ago

@JoshMock any update on this?

JoshMock commented 3 months ago

@rayafratkina I haven't had a chance to work on it yet. Hopefully in time for 8.15.

rayafratkina commented 3 months ago

Thanks Josh. We had a production incident a while back where lack of these traces impeded our ability to diagnose the issue, so addressing this as soon as possible is important for the serverless operations

JoshMock commented 3 months ago

Absolutely, I totally understand. I've assigned it to myself so it doesn't get put off much longer.

JoshMock commented 2 months ago

Note to self: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Error/cause

rayafratkina commented 1 month ago

@JoshMock did this change make it in time for 8.15?

JoshMock commented 1 month ago

I haven't started on this yet, but we don't feature freeze the clients and it's next up on my list of priorities. So there's a decent chance it will make it in by 8.15.

JoshMock commented 1 month ago

@rayafratkina Picking this up now, hoping to have a fix in the next few working days.

JoshMock commented 1 month ago

@pgayvallet @rudolf can either of you provide a code snippet to reproduce the stack trace output in the description? I've been trying to create an isolated example that does that, but so far the stack trace always comes out with more context.

JoshMock commented 1 month ago

@afharo You might also be able to help with :point_up:

JoshMock commented 1 month ago

Just to show an example: created a fake server that destroys connections immediately:

const http = require('http')

const host = 'localhost'
const port = 9200

const server = http.createServer((req, res) => {
  req.socket.destroy()
})
server.listen(port, host, () => {
  console.log(`Server is running: http://${host}:${port}`)
})

Then I connect to it from my client:

const client = new Client({
  node: 'http://localhost:9200',
  auth: {
    username: 'elastic',
    password: 'changeme',
  },
  Connection: HttpConnection,
  maxRetries: 0
})
await client.info()

And I get the following stack trace, which goes all the way back to the source:

/home/joshmock/Code/elastic-transport-js/lib/Transport.js:604
                            : new errors_1.ConnectionError(error.message, result, errorOptions);
                              ^

ConnectionError: socket hang up - Local: ::1:60642, Remote: ::1:9200
    at SniffingTransport._request (/home/joshmock/Code/elastic-transport-js/lib/Transport.js:604:31)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /home/joshmock/Code/elastic-transport-js/lib/Transport.js:636:32
    at async SniffingTransport.request (/home/joshmock/Code/elastic-transport-js/lib/Transport.js:632:20)
    at async Client.InfoApi [as info] (/home/joshmock/Code/elasticsearch-js/lib/api/api/info.js:40:12)
    at async main (/home/joshmock/Code/js-test/issue-2235/index.js:32:3) {
  options: { redaction: { type: 'replace', additionalKeys: [] } },
  meta: {
    body: undefined,
....
afharo commented 1 month ago

can either of you provide a code snippet to reproduce the stack trace output in the description?

Q: Maybe it's caused by RxJS?

I tried the following code:

const Rxjs = require('rxjs');
const {Client, HttpConnection} = require('@elastic/elasticsearch');

async function main() {
  const client = new Client({
    node: 'http://localhost:9200',
    auth: {
      username: 'elastic',
      password: 'changeme',
    },
    Connection: HttpConnection,
    maxRetries: 0
  })
  await client.info()
  // await client.info({local: true}) // index.js:15 shows in the stack trace

  await Rxjs.firstValueFrom(Rxjs.defer(() => client.info()))
  await Rxjs.firstValueFrom(Rxjs.defer(() => client.info({local: true}))) // it doesn't show the origin of the call
}

main()

And I can confirm that the direct call with the invalid local parameter shows the line in the stack trace, while the call inside the RxJS observable doesn't give any hint. 😢

@rudolf, @pgayvallet. Do you think this could be the reason instead of ES?

rudolf commented 1 month ago

I'm not aware of us wrapping the client or any client methods in rxjs for all of Kibana. So even though this could swallow some call stacks it doesn't explain why we see none.

Some parts that's unique about Kibana's client that you wouldn't get if you used the default client config:

  1. We use Connection: HttpConnection and ConnectionPool: ClusterConnectionPool https://github.com/elastic/kibana/blob/main/packages/core/elasticsearch/core-elasticsearch-client-server-internal/src/configure_client.ts#L46-L49
  2. We set a custom transport which uses try {await ...} catch () https://github.com/elastic/kibana/blob/main/packages/core/elasticsearch/core-elasticsearch-client-server-internal/src/create_transport.ts#L70-L94
rudolf commented 1 month ago

I tried to disable all the customizations we do with:

diff --git a/packages/core/elasticsearch/core-elasticsearch-client-server-internal/src/client_config.ts b/packages/core/elasticsearch/core-elasticsearch-client-server-internal/src/client_config.ts
index 7310cdc2da7..041c8d57e3f 100644
--- a/packages/core/elasticsearch/core-elasticsearch-client-server-internal/src/client_config.ts
+++ b/packages/core/elasticsearch/core-elasticsearch-client-server-internal/src/client_config.ts
@@ -38,12 +38,6 @@ export function parseClientOptions(
     // do not make assumption on user-supplied data content
     // fixes https://github.com/elastic/kibana/issues/101944
     disablePrototypePoisoningProtection: true,
-    agent: {
-      maxTotalSockets: config.maxSockets,
-      keepAlive: config.keepAlive ?? true,
-      timeout: getDurationAsMs(config.idleSocketTimeout),
-      maxFreeSockets: config.maxIdleSockets,
-    },
     compression: config.compression,
   };

diff --git a/packages/core/elasticsearch/core-elasticsearch-client-server-internal/src/configure_client.ts b/packages/core/elasticsearch/core-elasti
csearch-client-server-internal/src/configure_client.ts
index 287f3a670ce..7bd3ab7f3a1 100644
--- a/packages/core/elasticsearch/core-elasticsearch-client-server-internal/src/configure_client.ts
+++ b/packages/core/elasticsearch/core-elasticsearch-client-server-internal/src/configure_client.ts
@@ -42,11 +42,15 @@ export const configureClient = (
   const KibanaTransport = createTransport({ getExecutionContext });
   const client = new Client({
     ...clientOptions,
-    agent: agentFactoryProvider.getAgentFactory(clientOptions.agent),
-    Transport: KibanaTransport,
-    Connection: HttpConnection,
+    // agent: agentFactoryProvider.getAgentFactory(clientOptions.agent),
+    // Transport: KibanaTransport,
+    // Connection: HttpConnection,
     // using ClusterConnectionPool until https://github.com/elastic/elasticsearch-js/issues/1714 is addressed
-    ConnectionPool: ClusterConnectionPool,
+    // ConnectionPool: ClusterConnectionPool,
+  });
+
+  client.indices.get({ index: '.kibana', allow_no_indices: 'blah' }).catch((error) => {
+    console.log('a stack', error.stack);
   });

But still get:

a stack ResponseError: illegal_argument_exception
        Caused by:
                illegal_argument_exception: Failed to parse value [blah] as only [true] or [false] are allowed.
        Root causes:
                illegal_argument_exception: Could not convert [allow_no_indices] to boolean
    at SniffingTransport.request (/Users/rudolf/dev/kibana/node_modules/@elastic/transport/src/Transport.ts:564:17)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
JoshMock commented 1 month ago

Thanks @rudolf. Will build some test cases with those examples and see if I can figure this out.

JoshMock commented 1 month ago

I'm still having trouble consistently reproducing this in a way that makes it clear what needs to be fixed.

My main suspicion is that, because HttpConnection's request implementation still returns a Promise explicitly rather than using async/await, uncaught async exceptions may not capture a full stack trace the same way.

I'm going to spend some time trying to convert the function to async/await. If it goes smoothly, I can push a minor release of the transport and you all can test and verify whether or not it helps.

JoshMock commented 1 month ago

I keep running into dead ends with this. There isn't really a good way to rewrite that function to better leverage async/await, especially since http.request is so event-based and our solution needs to listen to several of those events.

So, I'm at a point where, unless you can give me an isolated code example that reproduces the problem, I'm not going to be able to confidently solve this right now. I tried reproducing examples similar to the code you linked to in Kibana, but none of them worked. It makes me wonder if the issue lies somewhere in Kibana's code and how it uses our client, rather than in the client. Handling uncaught exceptions is tricky when your code requires mixing async/await, Promises, and event listeners.

I'll keep this open, and come back to it when time allows, but some extra help from you all will definitely speed up the process.