boxyhq / jackson

🔥 Streamline your web application's authentication with Jackson, an SSO service supporting SAML and OpenID Connect protocols. Beyond enterprise-grade Single Sign-On, it also supports Directory Sync via the SCIM 2.0 protocol for automatic user and group provisioning/de-provisioning. 🤩
https://boxyhq.com/docs/jackson/overview
Apache License 2.0
1.84k stars 163 forks

MongoDB database engine connection count goes up and up #1109

Closed kottkrig closed 1 year ago

kottkrig commented 1 year ago

Issue Summary

We're using the MongoDB database engine. Over time, Boxy seems to saturate all available connections to our MongoDB cluster. We're on Boxy version 1.9.5.

Steps to Reproduce

  1. Use the "mongo" database engine
  2. Initiate a GET request to /api/health
  3. Observe the connection count of the database cluster increase with every GET request

The connection count will increase during regular usage and never seems to drop until the service is restarted.

niwsa commented 1 year ago

Are you running Jackson as a service or embedding the npm module within your app?

kottkrig commented 1 year ago

We're running it as a service from an unaltered official Docker image.

niwsa commented 1 year ago

@kottkrig The connections are internally managed by the MongoDB Node driver. I did a quick test (from a locally running instance of Jackson) against an Atlas cluster. Sending 1000 health checks in quick succession, the number of connections spiked to around 18 and came down quickly (while requests were still ongoing) to 5.

[Screenshot: Atlas connection count during the test, 2023-04-29]

kottkrig commented 1 year ago

@niwsa thank you for the quick response. From your screenshots it seems to behave as expected.

It's peculiar that it differs so much from our environment. Here is the resulting graph when I manually refresh /api/health a number of times until the route stops responding: the connection count climbs until it hits the maximum. As soon as I restart/stop the service, the connection count drops.

This is running in a test environment using the free tier of MongoDB Atlas which maxes out at 500 connections. And I'm seeing the same behaviour in our production environment.

[Screenshot: connection count climbing to the Atlas limit, 2023-05-02]

At first I thought it might be related to our AWS environment but running the same docker image locally, while connected to MongoDB Atlas, exhibits the same behaviour. The connection count maxes out and then drops down when I stop the container.

[Screenshot: connection count maxing out, then dropping when the container is stopped, 2023-05-02]

kottkrig commented 1 year ago

I've had a stroll through the code to get some understanding of what might be going on, and I've found a potential problem that could trigger behaviour like the one we are seeing, even though I can't quite explain why you aren't seeing the same issue on your end.

As you mention, the MongoDB connection pool is confined inside the MongoClient instance, which means the MongoClient is responsible for reusing and terminating connections.

https://github.com/boxyhq/jackson/blob/ec3db48c87af2caf2ec45508d6dee7522a22eed6/npm/src/db/mongo.ts#L22-L24

However, since Jackson creates a new internal Mongo instance (and a new MongoClient instance) on each call to Mongo.new, wouldn't that result in a new internal connection pool for each invocation?

https://github.com/boxyhq/jackson/blob/ec3db48c87af2caf2ec45508d6dee7522a22eed6/npm/src/db/mongo.ts#L141-L145

Looking at it from the point of view of the /api/health handler, it calls the jackson() function to instantiate all the controllers and then destructures them.

https://github.com/boxyhq/jackson/blob/36deea674dfd217f7e691a8ee2567aaf2333b0d8/pages/api/health.ts#L12

Forgive me if I'm missing some nuance inside @lib/jackson, but I can't find whether this instantiation is cached anywhere. Since jackson() is called on every request, and that function calls DB.new, wouldn't that create a new Mongo (and MongoClient) instance for each request, and thus a new connection pool per request, eventually saturating all available connections?

https://github.com/boxyhq/jackson/blob/ec3db48c87af2caf2ec45508d6dee7522a22eed6/npm/src/index.ts#L75
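The hypothesis above can be sketched with a toy model (all names here are made up for illustration; the real driver's pooling is far more involved). A stand-in class plays the role of MongoClient, and an instance counter shows what happens if every request builds a fresh client without closing it:

```typescript
// Toy stand-in for the MongoDB driver's client: each instance owns its own
// connection pool, which stays open until close() is called.
class FakeMongoClient {
  static openPools = 0;
  private closed = false;
  connect(): FakeMongoClient {
    FakeMongoClient.openPools++;
    return this;
  }
  close(): void {
    if (!this.closed) {
      this.closed = true;
      FakeMongoClient.openPools--;
    }
  }
}

// Hypothetical per-request initialization, mirroring the suspected pattern:
// each call builds a fresh client, and therefore a fresh pool.
function handleHealthCheckNaively(): FakeMongoClient {
  return new FakeMongoClient().connect();
}

// Simulate 100 health checks without ever closing the clients.
for (let i = 0; i < 100; i++) {
  handleHealthCheckNaively();
}
console.log(FakeMongoClient.openPools); // 100 pools still open
```

If this were what happened per request, the connection graph would look exactly like the one posted above: monotonically increasing until the cluster limit, then dropping to zero when the process exits.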

niwsa commented 1 year ago

We do cache the controllers returned by jackson() here: https://github.com/boxyhq/jackson/blob/main/lib/jackson.ts#L48. This means the connection is established only once.
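The caching described here can be approximated by a module-level memoized initializer (a minimal sketch; names like initControllers are hypothetical, not Jackson's actual API):

```typescript
// Counts how many times the expensive initialization actually runs.
let initCount = 0;

type Controllers = { healthCheck: () => string };

// Stand-in for the controller setup that opens the DB connection.
async function initControllers(): Promise<Controllers> {
  initCount++;
  return { healthCheck: () => "ok" };
}

// Module-level cache: the first call kicks off initialization, and every
// later call reuses the same pending or resolved promise, so the DB
// client (and its connection pool) is created exactly once per process.
let cached: Promise<Controllers> | undefined;

function jackson(): Promise<Controllers> {
  if (!cached) {
    cached = initControllers();
  }
  return cached;
}

// Simulate many concurrent health-check requests.
async function main() {
  await Promise.all(Array.from({ length: 50 }, () => jackson()));
  console.log(initCount); // 1: initialization ran once despite 50 calls
}
main();
```

Caching the promise rather than the resolved value also means concurrent first requests share one in-flight initialization instead of racing to create several clients.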

Also, I tested with both the 1.9.5 and 1.9.6 Docker images. The results look good to me.

[Screenshot: 1.9.6 test results]

Do you happen to run any other client, like MongoDB Compass, simultaneously?

kottkrig commented 1 year ago

Thank you for the pointer to the cached controllers. That clarifies the missing piece.

I really don't have any more clues as to what is wrong in our environment. I can replicate this issue locally 100% of the time with the following docker command and then refreshing /api/health, and we see the same issue running it in our AWS container environment.

Could it be something wrong elsewhere in our environment stack that I'm overlooking?

The command I use to replicate this locally:

docker run \
  -p 5225:5225 \
  -e DB_ENGINE="mongo" \
  -e DB_URL="mongodb+srv://<username>:<password>@testenv.xxxxxx.mongodb.net/jackson?retryWrites=true&w=majority" \
  -e JACKSON_API_KEYS="secret" \
  -e NEXTAUTH_URL="http://localhost:5225" \
  -e EXTERNAL_URL="http://localhost:5225" \
  -e NEXTAUTH_SECRET="super-secret" \
  -e NEXTAUTH_ADMIN_CREDENTIALS="admin@example.com:secretpassword" \
  -d boxyhq/jackson:1.9.6

niwsa commented 1 year ago

What OS/platform are you on? Additionally, can you share the docker engine version too?

Also happy to set up a call to see if we can get to the bottom of the issue. You can ping me in the Discord community: https://discord.gg/WGumS7C2

kottkrig commented 1 year ago

Thank you for the extraordinary help. I'll hop into Discord and see if we can schedule a call. Until then, here are the different environments and their applicable versions.

Local environment

Remote environment

MongoDB Atlas

niwsa commented 1 year ago

Is it possible for you to try with a newer Mongo version? The ones I tried are 4.4.10 (locally) and 6.0.5 (Atlas). Meanwhile, I'll try to spin up MongoDB 3.0.5 locally and see if I can reproduce the issue.

kottkrig commented 1 year ago

I'm sorry, there was a typo in the MongoDB version. It was actually running 6.0.5, not 3.0.5.

niwsa commented 1 year ago

Were you able to get around this issue? Let me know if you need further assistance ...

niwsa commented 1 year ago

@kottkrig Pushed a fix. Could you try with the latest docker image (v1.9.7)?

kottkrig commented 1 year ago

@niwsa it seems to work splendidly! Thank you for the quick turnaround!