Automattic / mongoose

MongoDB object modeling designed to work in an asynchronous environment.
https://mongoosejs.com
MIT License

Random connection errors in Google Cloud Functions #10969

Closed. axelvaindal closed this issue 2 years ago.

axelvaindal commented 2 years ago

Do you want to request a feature or report a bug?

BUG

What is the current behavior?

We are using Mongoose inside our Google Cloud Functions, and we get a lot of connection-related errors; the connection seems very unstable. Our cloud functions receive a lot of traffic, on the order of dozens of calls per second. Most of the time the functions run well, but a non-negligible percentage of our traffic fails because of a connection error I'm unable to troubleshoot. I tried reproducing it with a simple cloud function that connects with Mongoose and runs every minute, and I got the MongooseServerSelectionError 3 times in two days. We also see random, massive spikes of MongoNetworkError: Client network socket disconnected before secure TLS connection was established, which I cannot explain.

If the current behavior is a bug, please provide the steps to reproduce. This script is the reproduction I used; it produced the 3 random MongooseServerSelectionErrors:

import type { Connection, Model } from "mongoose";

import mongoose from "mongoose";
import * as functions from "firebase-functions";

const { db } = functions.config();
const { logger } = functions;

let connection: Connection | null = null;
let model: Model<any> | null = null;

export default async function testMongoose() {
  if (!connection) {
    // In Mongoose 6, createConnection() returns a Connection that is no longer
    // thenable; asPromise() is what waits for the initial connection.
    connection = mongoose.createConnection(db.url);
    await connection
      .asPromise()
      .catch((error) => logger.error("A db error has occurred.", { error })); // This here is never executed in the logs.

    model = connection.model(
      "test_data",
      new mongoose.Schema({ message: String })
    );
  }

  await model!.create({ message: "This is a test data." });
}

What is the expected behavior?

We no longer have unknown network errors.

What are the versions of Node.js, Mongoose, and MongoDB you are using? Note that "latest" is not a version.

Node 14.x, MongoDB v4.2, Mongoose v6.0.12

IslandRhythms commented 2 years ago

What does functions.config() have as the connection string? Is it something you set on your end?

axelvaindal commented 2 years ago

@IslandRhythms functions.config() contains the connection string in the form provided by MongoDB Atlas: mongodb+srv://username:password@cluster.zxxmr.gcp.mongodb.net/db?retryWrites=true&w=majority

Note that we were using the old mongodb:// syntax before and we had the exact same issue.

axelvaindal commented 2 years ago

@IslandRhythms We just hit a similar issue, this time with a new error: MongooseError: Operation video_exports.findOne() buffering timed out after 10000ms. The problem seems to be the following:

  1. Our cloud functions connect correctly to MongoDB most of the time.
  2. At some point, one existing connection or one new connection starts to time out (why would this happen?).
  3. The cloud function is under constant load, so new instances are created to handle it, and these also seem to time out.
  4. The load makes us reach the connection limit on our cluster.
  5. After some time, the system seems to recover and everything goes back to step 1 (normal load, connections working).

The charts go crazy when Mongoose starts timing out and returns the first or the second error. Do you know how we could debug this and find the root cause? Are there any known cases of Mongoose/MongoDB driver connections suddenly timing out for no apparent reason?
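One hedged way to gather evidence for this is to log connection lifecycle events on every function instance, so a timeout can be lined up with disconnects, pool clears, or failed heartbeats. The sketch below assumes Mongoose 6, where a Connection is an EventEmitter that emits events such as "connected", "disconnected", "reconnected", "error" and "close", and the MongoDB Node.js driver 4.x, whose MongoClient emits pool and monitoring events such as "connectionPoolCleared", "connectionCheckOutFailed" and "serverHeartbeatFailed". The helper logEvents and the event lists are illustrative, not part of either library, and the commented usage assumes Connection#getClient() is available in your Mongoose version.

```typescript
import { EventEmitter } from "events";

// Receives an event label plus whatever arguments the emitter passed.
type Log = (label: string, args: unknown[]) => void;

// Mongoose connection lifecycle events (assumed Mongoose 6 event names).
const CONNECTION_EVENTS = ["connected", "disconnected", "reconnected", "error", "close"];

// Pool and server-monitoring events on the driver's MongoClient (assumed driver 4.x names).
const CLIENT_EVENTS = ["connectionPoolCleared", "connectionCheckOutFailed", "serverHeartbeatFailed"];

// Hypothetical helper: attach one listener per event name that forwards to `log`.
// Listening for "error" also keeps an unhandled "error" event from throwing.
function logEvents(emitter: EventEmitter, events: string[], source: string, log: Log): void {
  for (const name of events) {
    emitter.on(name, (...args: unknown[]) => log(`${source}:${name}`, args));
  }
}

// Usage sketch inside a function instance (not executed here):
// const connection = mongoose.createConnection(db.url);
// logEvents(connection, CONNECTION_EVENTS, "mongoose", (label, args) => logger.warn(label, { args }));
// logEvents(connection.getClient(), CLIENT_EVENTS, "driver", (label, args) => logger.warn(label, { args }));
```

Each log line carries a "mongoose:" or "driver:" prefix, which makes it easier to tell whether an instance lost its socket, had its pool cleared, or simply could not check out a connection.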


IslandRhythms commented 2 years ago

Just to double-check: have you whitelisted your IP?

axelvaindal commented 2 years ago

> Just to double check you have whitelisted your IP?

Yes; otherwise the calls would fail all the time, not just randomly.

vkarpov15 commented 2 years ago

The "Client network socket disconnected before secure TLS connection was established" is indicative of network errors, likely transient. See https://github.com/sendgrid/sendgrid-nodejs/issues/891, https://stackoverflow.com/questions/53593182/client-network-socket-disconnected-before-secure-tls-connection-was-established. Do you have a whitelist of allowed outbound IPs?
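If these TLS disconnects are transient network errors, one common mitigation (not something suggested in this thread) is to retry the initial connection a few times with capped exponential backoff instead of failing the invocation outright. This is a generic sketch: connectWithRetry, backoffDelays and isTransient are hypothetical names, and the default delays are arbitrary placeholders.

```typescript
// Capped exponential backoff: retry n (0-based) waits baseMs * 2^n, but never more than capMs.
function backoffDelays(retries: number, baseMs: number, capMs: number): number[] {
  return Array.from({ length: retries }, (_, n) => Math.min(capMs, baseMs * 2 ** n));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `connect`, retrying after each failure that `isTransient` accepts,
// and rethrows once the delay schedule is exhausted or the error is not transient.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  isTransient: (err: unknown) => boolean, // hypothetical predicate supplied by the caller
  retries = 3,
  baseMs = 200,
  capMs = 2000
): Promise<T> {
  const delays = backoffDelays(retries, baseMs, capMs);
  for (let attempt = 0; ; attempt++) {
    try {
      return await connect();
    } catch (err) {
      if (attempt >= delays.length || !isTransient(err)) throw err;
      await sleep(delays[attempt]);
    }
  }
}
```

With Mongoose 6, connect could be () => mongoose.createConnection(db.url).asPromise(), and isTransient could compare err.name against the error names reported above, such as "MongoNetworkError" or "MongooseServerSelectionError". A real version should also close the Connection left behind by a failed attempt; that cleanup is omitted here.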

The "buffering timed out after 10000ms" error should only happen if you're executing operations before Mongoose completes its initial connection in Mongoose 6. Is testMongoose() indicative of how you're connecting to MongoDB in production?

One option if you're running out of connections: try reducing your maxPoolSize, for example connection = mongoose.createConnection(db.url, { maxPoolSize: 2 }). That will prevent one function instance from opening too many connections at once. Just be careful: maxPoolSize: 2 means each function instance can only make progress on 2 operations at once, so if you're relying on a lot of parallel operations, that may slow down your app.
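That trade-off can be put into rough numbers. The sketch below estimates a worst-case connection count for a fleet of function instances under simplified assumptions that are not from this thread: each instance holds a single Mongoose connection (one MongoClient), operations go to the primary so each instance opens at most maxPoolSize pooled connections, and the driver keeps one monitoring connection per replica-set member. Real counts depend on driver and server versions and on read preference; the function names and example numbers are hypothetical.

```typescript
interface FleetConfig {
  instances: number;         // concurrent function instances (hypothetical input)
  maxPoolSize: number;       // per-connection pool limit, e.g. 2 as suggested above
  replicaSetMembers: number; // e.g. 3 for a three-node replica set
}

// Assumed worst case: every instance fills its pool and monitors every member.
function worstCaseConnections(c: FleetConfig): number {
  return c.instances * (c.maxPoolSize + c.replicaSetMembers);
}

// Largest maxPoolSize whose worst case stays within `limit` for the given fleet size.
// A result of 0 means the limit cannot be met at this fleet size;
// do not pass 0 to the driver as maxPoolSize.
function maxPoolSizeWithin(limit: number, instances: number, replicaSetMembers: number): number {
  return Math.max(Math.floor(limit / instances) - replicaSetMembers, 0);
}
```

For example, 200 concurrent instances with maxPoolSize 5 against a three-member replica set gives 200 × (5 + 3) = 1600 connections in the worst case, while the same fleet with maxPoolSize 2 gives 1000.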

kk2491 commented 2 years ago

@IslandRhythms Good day.
We have similar issues in GCP Cloud Run. When Cloud Run scales out and deploys new instances, some of the instances fail to make the initial connection, which leads to failures across the system.
Could you please let me know if you have found the root cause and the fix for this issue?

Thank you,

IslandRhythms commented 2 years ago

@kk2491 please make a new issue describing your problem

axelvaindal commented 2 years ago

@vkarpov15

> The "buffering timed out after 10000ms" error should only happen if you're executing operations before Mongoose completes its initial connection in Mongoose 6.

We have indeed been running out of connections, so new instances fail to connect, and the MongooseTimeoutError is due to that; nothing unusual there. We have 0.0.0.0 whitelisted in MongoDB Atlas and that's pretty much it: no other networking configuration. We just deployed the testMongoose function and that's all. Is there something wrong with how we set up the connections? I suspect the functions are not using the pool correctly: we have 1500-2000 connections open for 1M function invocations a day. Is that number unusual?
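On the question of how the connection is set up: a pattern often used in serverless code is to cache one connection promise per instance and, unlike the reproduction script (where a failed asPromise() is only logged and the Connection stays cached), to drop the cached promise when the initial connection fails, so the next invocation retries instead of reusing a connection that never opened. This is a minimal sketch under that assumption; cachedConnector is a hypothetical helper, and the commented Mongoose 6 usage assumes createConnection(...).asPromise() and the maxPoolSize option discussed in this thread.

```typescript
// Caches one in-flight or resolved connection promise per function instance.
// Concurrent callers share the same promise, so only one connection is opened.
function cachedConnector<T>(connect: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending === null) {
      pending = connect().catch((err: unknown) => {
        pending = null; // forget the failure so the next call reconnects
        throw err;
      });
    }
    return pending;
  };
}

// Usage sketch with Mongoose 6 (not executed here):
// const getConnection = cachedConnector(() =>
//   mongoose.createConnection(db.url, { maxPoolSize: 5 }).asPromise()
// );
// export default async function testMongoose() {
//   const conn = await getConnection();
//   ...
// }
```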

By the way, MongoDB Atlas does not support the default auth mechanism used when we create a Mongoose connection, and this can make the connection fail. We have updated our connection string to add authMechanism=SCRAM-SHA-1, since SCRAM-SHA-256 is unsupported in Atlas.
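The connection-string change described above can also be applied programmatically when the URI comes from configuration, as in the reproduction script. This is a small sketch; withAuthMechanism is a hypothetical helper. It parses only the query part with URLSearchParams, so the mongodb+srv:// scheme and the credentials are left untouched, although URLSearchParams may percent-encode characters in other option values when it re-serializes them.

```typescript
// Hypothetical helper: set authMechanism on a MongoDB connection string while
// keeping the existing query options (retryWrites, w, ...) in place.
function withAuthMechanism(uri: string, mechanism: string): string {
  const q = uri.indexOf("?");
  const base = q === -1 ? uri : uri.slice(0, q);
  const params = new URLSearchParams(q === -1 ? "" : uri.slice(q + 1));
  params.set("authMechanism", mechanism); // replaces an existing value, otherwise appends
  return `${base}?${params.toString()}`;
}
```

For example, withAuthMechanism(db.url, "SCRAM-SHA-1") would turn the Atlas string shown earlier into the same string with &authMechanism=SCRAM-SHA-1 appended.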

vkarpov15 commented 2 years ago

@axelvaindal can you try reducing maxPoolSize to 5? That may help.

It's not really possible for us to say how many connections should be opened for a given number of function invocations without studying your code and access patterns.