firebase / firebase-tools

The Firebase Command Line Tools
MIT License
4.01k stars 930 forks source link

PubSub emulator consumes more and more CPU when background triggers are toggled #6585

Open dominics opened 10 months ago

dominics commented 10 months ago

Synopsis: When the PubSub emulator is running, and the "background triggers" (for Functions) are toggled off and on, the PubSub emulator starts to consume more and more CPU, until it is eventually consuming all available CPU.

[REQUIRED] Environment info

firebase-tools: 13.0.0 Also tested with 12.4.7

Platform: MacOS

[REQUIRED] Test case

Initial project setup:

firebase init functions # select Typescript
firebase init emulators # select PubSub, Functions, Firestore

Set up some functions, functions/src/index.ts looks like this:

import {onMessagePublished} from "firebase-functions/v2/pubsub";
import {onDocumentWritten} from "firebase-functions/v2/firestore";
import * as logger from "firebase-functions/logger";

export const helloPubsub = onMessagePublished("some-topic", () => {
    logger.info("Hello pubsub!");
});

export const helloFirestore = onDocumentWritten("/some/doc", () => {
    logger.info("Hello firestore!");
});

This is a minimal setup, but I first noticed this problem on a real, larger project. (And I think it's easier to cause to happen the more functions there are defined).

[REQUIRED] Steps to reproduce

Start the emulators:

cd functions && npm run build && cd ..
firebase emulators:start

We're looking for at least the PubSub and Functions emulators to be started. The setup described above will also start the Firestore emulator.

Now, we want to turn background triggers on and off repeatedly:

while true; do 
  curl -X PUT http://127.0.0.1:4400/functions/disableBackgroundTriggers; 
  sleep 1; 
  curl -X PUT http://127.0.0.1:4400/functions/enableBackgroundTriggers; 
  sleep 1;
done

Modify if you're using a different Emulator Hub address or port. Let this run for a while. The functions will be reinitializing with each run:

i  emulators: Disabling Cloud Functions triggers, non-HTTP functions will not execute.
i  functions[helloPubsub]: function temporarily disabled.
i  functions[helloFirestore]: function temporarily disabled.
i  emulators: Enabling Cloud Functions triggers, non-HTTP functions will execute.
✔  functions: Using node@18 from host.
Serving at port 8XXX

✔  functions: Loaded functions definitions from source: helloPubsub, helloFirestore.
✔  functions[us-central1-helloPubsub]: pubsub function initialized.
✔  functions[us-central1-helloFirestore]: firestore function initialized.

With only these two functions defined, you may need to let this script run for a few minutes (at least a hundred or two hundred toggles) before the actual behavior described below becomes very visible. But with many different PubSub functions defined (e.g. in a real project) the behavior is much more visible, because it happens with far fewer toggles.

[REQUIRED] Expected behavior

Monitor the CPU usage of the PubSub emulator. We'd expect it to be using some small amount of CPU to reinitialize (because each reinitialization of the functions might be creating a topic?)

[REQUIRED] Actual behavior

Monitoring the CPU usage of the PubSub emulator, what we actually see is it using more and more CPU, even if we pause or stop the script that is toggling the background triggers.

If you continue to toggle the background triggers, eventually the PubSub emulator will consume all available CPU on the machine. This can happen quite rapidly with many PubSub triggered functions defined.

This additional CPU usage never goes away e.g. if you completely stop the script that is toggling the background triggers, the CPU usage of the PubSub emulator does not ever go back down to idle levels of CPU usage. (e.g. it will continue to use 200% or 400%)

Notes

dominics commented 10 months ago

I'd strace it for you, to see what it's doing with all that CPU usage, but I'm on Mac OS right now (SIP makes that sort of thing pretty hard these days). I'll try to repro on a Linux machine soon if you don't get to it first.

All the debug logs are silent about where the usage is going (particularly pubsub-debug.log offers no clues)

RoqueCampos commented 3 months ago

Are there any news on this issue? We are experiencing exactly the same.

In our case this is happening while running test suites. While running a test suite, many topics and subscriptions are created. When the test suite is finished, the topics and subscriptions are removed (after purging them), and the next test suite is started with a similar behaviour. We have many thousand of tests and many hundred of test suites running in parallel (in batches of parallel test suites 5-6 of course).

Then after letting the whole run for a while, the cpu leak happens We stop the run of the tests, and the only thing we can observe in the emulator logs are messages like the following:


 com.google.cloud.pubsub.testing.v1.PushLoop push
INFO: Failed to push to endpoint 'http://localhost:62099/projects/aaaaaaa/topics/bbbbbb/function/cccccccc: Connection refused

for each of the subscriptions that were created inside the different test scenarios, but now don't exist anymore because they were destroyed by the teardown of the test suite. It seems that although we purged the subscription before deleting it, there were still some events flying in the emulator, and now they can not be delivered.

After letting many hundred of test suites run, the number of events that can not be delivered because the corresponding test suites are already done and the subscriptions where removed is so high, that the log of the emulator is bloated with this kind of errors, and we wonder if this could be related with the cpu leak.

Regards.