firebase / firebase-functions

Firebase SDK for Cloud Functions
https://firebase.google.com/docs/functions/
MIT License
1.01k stars 202 forks source link

Various Firebase functions started reporting crash: Maximum call stack size exceeded #1539

Closed bhr closed 3 months ago

bhr commented 3 months ago

Various Firebase functions started reporting the crash: Maximum call stack size exceeded. The issue started occurring after we deployed Firebase cloud functions to the production environment on March 9 2024 at ~1:30 PM GMT. The last deployment before the issue started appearing was Mar 1 2024 at 12:00 GMT.

Affected Firebase functions are triggered by Cloud Firestore or PubSub, not https.

Related issues

1527 , but related to https functions

[REQUIRED] Version info

node: 20

firebase-functions: firebase-functions@npm:4.6.0 reproducing on firebase-functions@npm:4.8.0 too

firebase-tools: 13.4.1

firebase-admin: firebase-admin@npm:12.0.0

[REQUIRED] Test case

A simple cloud function that sets a value on the document that's changed. The issue appears on many cloud functions that perform Firestore update operations.


import { DocumentSnapshot, Timestamp } from 'firebase-admin/firestore';
import { Change, logger } from 'firebase-functions/v1';

export const processChangedDoc = (change: Change<DocumentSnapshot>) => {
  const afterSnapshot: DocumentSnapshot<AnyDocData> = change.after;
  const { updateTime } = afterSnapshot;
  if (!afterSnapshot?.exists || !updateTime) {
    // We have a delete event
    logger.debug('doc has no update time');
    return Promise.resolve();
  }

  const afterDocData = afterSnapshot.data();
  const beforeSnapshot: DocumentSnapshot<AnyDocData> = change.before;
  const beforeDocData = beforeSnapshot.data();

  const lastDateUpdated = afterDocData ? afterDocData.xDocDateUpdated : undefined;

  // We skip setting dateUpdated only if the last update was less than one hour ago
  const skipUpdate = !!lastDateUpdated && updateTime.seconds < lastDateUpdated.seconds + ONE_HOUR_SECONDS;

  if (skipUpdate) {
    logger.debug(`Skip update: lastUpdate: ${lastDateUpdated.toDate().toISOString()}, updateTime: ${updateTime.toDate().toISOString()}`);
    return Promise.resolve();
  }

  const docChange = {} as any;
  docChange[DocDateUpdatedKey] = updateTime;
  logger.debug(`doc update: before ${beforeDocData?.xDocDateUpdated?.toDate().toISOString()}, after ${updateTime.toDate().toISOString()},`);
  return afterSnapshot.ref.update(docChange).catch((err) => console.error(err));
};

export default region('europe-west1')
  .firestore.document('{collection}/{docId}')
  .onWrite((change) => {
    return processChangedDoc(change);
  });

[REQUIRED] Steps to reproduce

Deploy FCF. After a few hours, errors start appearing in Google Error Reporting. This only happens in the production environment where there's more activity (e.g. 600k invocations per 24hrs).

[REQUIRED] Expected behavior

FCF should not crash.

[REQUIRED] Actual behavior

Error: RangeError: Maximum call stack size exceeded
    at write (/workspace/node_modules/firebase-functions/lib/logger/index.js:66:74)
    at Object.debug (/workspace/node_modules/firebase-functions/lib/logger/index.js:76:5)
    at processChangedDoc (/workspace/lib/services/processChangedDocument.js:38:17)
    at /workspace/lib/api/db/all/levelOne.js:8:59
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
Error: RangeError: Maximum call stack size exceeded
    at get parent [as parent] (/workspace/node_modules/@google-cloud/firestore/build/src/reference.js:216:15)
    at processChangedDoc (/workspace/lib/services/processChangedDocument.js:17:73)
    at /workspace/lib/api/db/all/levelOne.js:8:59
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at /workspace/node_modules/firebase-functions/lib/common/onInit.js:33:16
    at entryFromArgs (/workspace/node_modules/firebase-functions/lib/logger/index.js:130:19)
    at Object.error (/workspace/node_modules/firebase-functions/lib/logger/index.js:116:11)
    at console.error (/workspace/lib/services/logging.js:15:28)
    at sendCrashResponse (/layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/logger.js:27:17)
    at sendResponse (/layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/invoker.js:37:40)
    at /layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/function_wrappers.js:34:40
    at bound (node:domain:432:15)
    at runBound (node:domain:443:12)
    at /layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/function_wrappers.js:142:60
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) 

Were you able to successfully deploy your functions?

Yes

No

google-oss-bot commented 3 months ago

I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.

MattSkala commented 3 months ago

We started experiencing the same errors today across our Firestore triggered functions, both with functions v4.7.0 and 4.8.0.

jttaynton-chattr commented 3 months ago

We started experiencing these issues intermittently as well on production after a deployment this past Friday afternoon. The changes that were deployed were fairly minor and we've had a tough time trying to isolate the cause. The errors are occurring in multiple trigger functions with no obvious reasoning. The errors don't seem to line up with actions that are actually occurring within the function thus the confusion in attempting to debug.

inlined commented 3 months ago

This seems related to a new feature released about a week ago. Downgrading the Firebase Functions SDK can give you a workaround. To help me diagnose:

  1. Are you actually using the feature (the new onInit callback)?
  2. Multiple users are talking about Firestore functions. Are they all Firestore 1st gen functions?
jttaynton-chattr commented 3 months ago

@inlined We are not using onInit callback. We're using firebase functions 4.6.0 right now. We thought we had a fix on our side but the errors are still occurring and the places they are being thrown from don't seem to make much sense. Our errors are occurring from event trigger v1 functions that had been running fine for 2+ years.

jttaynton-chattr commented 3 months ago

@inlined Is there a particular version of firebase functions that is safe from this error/issue?

I'm curious as to why we are seeing these errors yet we have version 4.6 in our package.json

bhr commented 3 months ago

This seems related to a new feature released about a week ago. Downgrading the Firebase Functions SDK can give you a workaround. To help me diagnose:

  1. Are you actually using the feature (the new onInit callback)?
  2. Multiple users are talking about Firestore functions. Are they all Firestore 1st gen functions?

@inlined thanks for looking into this!

  1. Not using onInit callback.
  2. All FCFs are 1st gen.
jttaynton-chattr commented 3 months ago

@bhr We installed the latest 4.8.1 just released and deployed one of our problem functions with that version. Short sample size but no new errors in about 35 minutes and the execution time of the trigger function has dropped back to where it was prior to issues and is staying there.

bhr commented 3 months ago

@jttaynton-chattr Thanks for the hint! Upgrading to 4.8.1 resolved the issue indeed. Execution time went down to level where it was supposed to be. No error's been thrown in the past 30 minutes

Screenshot 2024-03-19 at 9 56 57 PM
inlined commented 3 months ago

Thanks for your patience. We added a new onInit lifecycle callback and internally call withInit to wrap your handler with a decorator that ensures the init callback has been called. The 1st gen method for doing this accidentally called handler = withInit(handler) inside a lamda that is called as boilerplate; this lambda was capturing the wrapped handler and wrapping it again repeatedly. Moving the withInit call outside the lambda ensured that it only gets called once.

We're very sorry for this issue and how stressful it may have been. This error slipped through our testing since it only becomes a problem at notable scale. The good news is that unless you manually turned off retries for your functions, everything would have been successfully retried.