So, setting the scheduling option to fifo seems to work but kind of kicks the can down the road. IE, if we had a background process that slept or only did work sporadically, it is possible to go > 5 minutes without making a database request, thus bringing us back to ArangoDB hanging up on us.
For now we're using the following hack to keep each connection alive while our background services are running:
```js
// Prevent every keep-alive connection from going idle.
// Setting ConnectionOptions.agentOptions.scheduling to `fifo` ensures this check cycles through every connection in the pool.
// 32 sockets * 5s interval = 160s. ArangoDB's default keep-alive timeout is 300s.
const checkDatabaseTimer = setIntervalAsync(async () => {
  await db.exists().catch(e => {
    logger.warn(`ArangoDB connection check failed: ${e.message}`);
  });
}, 5 * 1000);
```
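The arithmetic in that comment can be checked with a small helper. This is only a sketch: `worstCaseIdleSeconds` and `isKeepAliveSafe` are hypothetical names, not part of arangojs, and the model assumes `fifo` scheduling rotates through the pooled sockets one check at a time, so each socket is touched once every `maxSockets * intervalSeconds` seconds.

```javascript
// Hypothetical helpers, not part of arangojs or Node.js.
// Assumption: with fifo scheduling, each periodic check uses the next socket
// in the pool, so a given socket is reused once per full rotation.
function worstCaseIdleSeconds(maxSockets, intervalSeconds) {
  return maxSockets * intervalSeconds;
}

// True when every socket is reused before the server-side keep-alive timeout.
function isKeepAliveSafe(maxSockets, intervalSeconds, serverTimeoutSeconds) {
  return worstCaseIdleSeconds(maxSockets, intervalSeconds) < serverTimeoutSeconds;
}

console.log(worstCaseIdleSeconds(32, 5)); // 160
console.log(isKeepAliveSafe(32, 5, 300)); // true
console.log(isKeepAliveSafe(32, 10, 300)); // false (320s would exceed 300s)
```

With 32 sockets, the interval would have to stay under roughly 9 seconds for the rotation to finish inside the 300s window, which is why 5 seconds leaves a comfortable margin.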
The upcoming 9.0.0 release replaces `http.request`/`xhr` with native `fetch`. This changes how network requests are issued, which may solve this issue. Can you please try the pre-release version by installing `arangojs@next` and see if that fixes your problem?
Sorry for the late reply.
We rely on some settings native to the node.js HttpAgent that don't seem to have an analogue in the node.js fetch implementation (undici), so I can't really test this.
We'll probably be staying on a pre-9.x version of arangojs for as long as we use ArangoDB.
@radicaled There's a workaround to modify the `agent` used by Node.js fetch:
https://github.com/arangodb/arangojs?tab=readme-ov-file#nodejs-with-self-signed-https-certificates
I'm closing this issue then. Feel free to reopen this if the problem occurs in v9.
We've had a long-running issue of seeing "socket hang up" errors when using arangojs. They didn't happen all the time; it was pretty sporadic. It was eventually tracked down to a combination of factors:
- `agentOptions.keepAlive`
- `agentOptions.scheduling` being `lifo`
- `--http.keep-alive-timeout` being set to 300 seconds (see https://www.arangodb.com/docs/stable/programs-arangod-options.html#--httpkeep-alive-timeout)
- `agentOptions.maxSockets` set to 32 (we're using arangojs in the context of a GraphQL server)

This combination meant that if we didn't use a socket for 300 seconds (a realistic possibility with a 32-socket pool), the server would disconnect it and arangojs wouldn't know until it tried to make a request using it.
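To illustrate why the scheduling mode matters here, below is a simplified model of a free-socket pool. It is a sketch under stated assumptions, not Node's actual implementation: `findStaleSockets` is a hypothetical function, all sockets are assumed to be open and free at time 0, requests arrive one at a time and finish instantly, and `fifo` is included only for contrast with the default `lifo`.

```javascript
// Simplified model of an agent's free-socket list (not Node's real code).
// lifo reuses the most recently freed socket; fifo reuses the least recently freed one.
function findStaleSockets({ poolSize, scheduling, requestIntervalSec, durationSec, keepAliveTimeoutSec }) {
  // Assumption: every socket is already open and free at time 0.
  const free = Array.from({ length: poolSize }, (_, id) => ({ id, lastUsed: 0 }));

  for (let now = requestIntervalSec; now <= durationSec; now += requestIntervalSec) {
    const socket = scheduling === 'fifo' ? free.shift() : free.pop();
    socket.lastUsed = now; // the request completes instantly in this model
    free.push(socket); // and the socket is returned to the free list
  }

  // Sockets whose idle time at the end of the run has reached the server timeout.
  return free
    .filter((s) => durationSec - s.lastUsed >= keepAliveTimeoutSec)
    .map((s) => s.id);
}

const scenario = { poolSize: 32, requestIntervalSec: 5, durationSec: 600, keepAliveTimeoutSec: 300 };

console.log(findStaleSockets({ ...scenario, scheduling: 'lifo' }).length); // 31
console.log(findStaleSockets({ ...scenario, scheduling: 'fifo' }).length); // 0
```

In this model, `lifo` keeps reusing a single socket while the other 31 sit idle past the 300-second timeout; those are the ones that would produce a "socket hang up" when a burst of traffic finally reaches them.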
The quick fix was to disable HTTP keep-alive via `agentOptions.keepAlive = false`, however that increased response latency significantly (sometimes doubling it). The second thing we investigated was trying to use HTTP/2, but arangojs basically expects the `http` module's `Agent`, so that would have been too much surgery and would have had questionable forward compatibility.
What we've done for now is set
`agentOptions.scheduling` to `fifo`. According to the documentation (https://nodejs.org/api/http.html#http_new_agent_options), this defaults to `lifo`, which means that some sockets may not be used. Thus, during a period of low activity, these connections can be dropped by ArangoDB. And then, during a period of higher activity, the agent will try to use one of these dropped sockets, then bang: Socket hang up! But, with `fifo`, even during periods of low activity, most of our use-cases have us making enough requests to ArangoDB so that a socket is never considered idle, so it never gets terminated by ArangoDB's keep-alive timeout.
So, setting the scheduling option to
`fifo` seems to work but kind of kicks the can down the road. IE, if we had a background process that slept or only did work sporadically, it is possible to go > 5 minutes without making a database request, thus bringing us back to ArangoDB hanging up on us.
I'm not too familiar with arangojs or the internals of the `http` module's `Agent` class, but is there a better way of handling this type of error? Can we manually terminate idle sockets after a period of time (matching ArangoDB's `--http.keep-alive-timeout`), or do a NO-OP request to the server using one of those idle sockets to make sure there are no disconnections?
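One possible direction for the "terminate idle sockets" idea is the `timeout` option of Node's `http.Agent`. The following is a hedged sketch, not a confirmed fix: as far as I can tell, recent Node.js versions destroy a pooled (free) socket when its inactivity timeout fires, but that is worth verifying on your Node version. Whether arangojs forwards `timeout` from `agentOptions` to the agent is also an assumption, and the 30-second margin is an arbitrary illustrative value.

```javascript
const http = require('node:http');

// Server-side keep-alive timeout cited in this thread (ArangoDB default: 300s).
const SERVER_KEEP_ALIVE_TIMEOUT_MS = 300 * 1000;
// Hypothetical safety margin so the client gives up on an idle socket first.
const SAFETY_MARGIN_MS = 30 * 1000;

// Assumption: these are the same options that would be passed as agentOptions.
const agentOptions = {
  keepAlive: true,
  maxSockets: 32,
  scheduling: 'fifo',
  // Socket inactivity timeout in milliseconds, kept below the server's timeout.
  timeout: SERVER_KEEP_ALIVE_TIMEOUT_MS - SAFETY_MARGIN_MS,
};

const agent = new http.Agent(agentOptions);

console.log(agent.options.timeout); // 270000
```

One caveat: the same timeout also applies to sockets with a request in flight, so queries that legitimately run longer than the configured value may see a `timeout` event on the request.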