breejs / bree

Bree is a Node.js and JavaScript job task scheduler with worker threads, cron, Date, and human syntax. Built for @ladjs, @forwardemail, @spamscanner, @cabinjs.
https://jobscheduler.net
MIT License

[fix] Unexpected high CPU use with 1s interval job #201

Closed · brenc closed this 1 year ago

brenc commented 1 year ago

Describe the bug

Node.js version: v16.17.0

OS version: Docker node:16-bullseye-slim

Description: Just started using Bree to schedule some jobs inside a Docker container. Today I was working on a job that runs every 1s. I noticed that the job runner was using unusually high CPU for a very simple job. If I run my job in a while loop with a 1s sleep, it uses barely any CPU. If I run it using Bree, the job runner will consistently use 20-40% CPU.

Actual behavior

High CPU use.

Expected behavior

Much lower CPU use.

Code to reproduce

Here's the entire job runner:

const Bree = require('bree');
const Graceful = require('@ladjs/graceful');
const logger = require('./logger'); // this is just winston going to console

const bree = new Bree({
  jobs: [
    {
      name: 'aggregate-icecast-status',
      interval: '1s',
    },
  ],
  logger,
});

const graceful = new Graceful({ brees: [bree] });
graceful.listen();

(async () => {
  logger.error('starting...');
  await bree.start();
})();

Here's the job:

const { forEach } = require('modern-async');
const dns = require('dns').promises;
const fetch = require('node-fetch');
const Redis = require('ioredis');

const logger = require('../logger').child({
  extraInfo: 'aggregate-icecast-status',
});

const redis = new Redis({
  host: 'redis',
  keyPrefix: 'ngradio:status:',
});
const resolver = new dns.Resolver();

async function main() {
  logger.debug('aggregating icecast status');

  let addresses;
  try {
    addresses = await resolver.resolve4('icecast');
  } catch (err) {
    logger.warn(`error resolving Icecast hosts: ${err.message}`);
    return;
  }

  logger.debug('Icecast addresses: %o', addresses);

  let totalListeners = 0;
  let listenersPerHost = new Map();
  let listenerPeaksPerHost = new Map();
  let title;
  await forEach(addresses, async (address) => {
    logger.debug(`fetching status from ${address}`);

    let data;
    let response;
    try {
      response = await fetch(`http://${address}:8000/status-json.xsl`);
      data = await response.json();
      // logger.debug('%o', data);
    } catch (err) {
      logger.warn(`error collecting status from ${address}: ${err.message}`);
      return;
    }

    const listeners = parseInt(data?.mounts?.['/radio.mp3']?.listeners, 10);
    const listenerPeak = parseInt(
      data?.mounts?.['/radio.mp3']?.listener_peak,
      10
    );
    if (!title) {
      title = data?.mounts?.['/radio.mp3']?.title;
    }

    if (isNaN(listeners)) {
      logger.warn(`error parsing response from ${address}: listeners was NaN`);
    } else {
      totalListeners += listeners;
      listenersPerHost.set(address, listeners);
    }

    if (isNaN(listenerPeak)) {
      logger.warn(
        `error parsing response from ${address}: listener peak was NaN`
      );
    } else {
      listenerPeaksPerHost.set(address, listenerPeak);
    }
  });

  await redis.del('listenerPeaksPerHost');
  await redis.del('listenersPerHost');
  await redis.hset('listenerPeaksPerHost', listenerPeaksPerHost);
  await redis.hset('listenersPerHost', listenersPerHost);
  await redis.set('totalListeners', totalListeners, 'EX', 5);
  await redis.set('title', title, 'EX', 5);

  logger.debug(
    'Total listeners: %d, Listener peaks: %o, Listeners per host: %o, ' +
      'title: "%s"',
    totalListeners,
    listenerPeaksPerHost,
    listenersPerHost,
    title
  );

  await redis.quit();
}

(async () => {
  await main();
})();

DnOberon commented 1 year ago

I'm seeing a similar issue running inside k8s, but regardless of the interval I get an extremely high CPU usage spike every time a job runs. I could just be doing something stupid in the jobs, but I figured you'd be able to spot that quickly if so.

Job(s) initialized here: https://github.com/idaholab/Deep-Lynx/blob/master/src/main.ts#L33
Job(s) in question: https://github.com/idaholab/Deep-Lynx/blob/master/src/jobs/data_staging_emitter.ts and https://github.com/idaholab/Deep-Lynx/blob/master/src/jobs/edge_queue_emitter.ts

brenc commented 1 year ago

I've noticed this too even with 5s interval jobs. Bree spikes to > 100% CPU on each run (running the job manually doesn't do this).

shadowgate15 commented 1 year ago

Would one of you be willing to give us a basic repo that reproduces this with the Dockerfile in it? We haven't seen this on our servers and it doesn't seem to happen on my local setup even when running in Docker.

brenc commented 1 year ago

Sure, I'll try to whip something up this week.

brenc commented 1 year ago

https://github.com/brenc/bree-issue-201-demo

I set up a simple job that only connects to Redis and prints to the console. This causes Bree to consistently use 10-20% CPU. Let me know if you need anything else.

brenc commented 1 year ago

Just added a DNS query and web requests to mimic the original job. The job runner is now using 18-38+% CPU.

You can use the run.sh script to run the project (or just copy the compose command out of it).

(screenshot: bree-cpu-usage)

shadowgate15 commented 1 year ago

Thanks for putting this together. Since the job runs so frequently, CPU usage will consistently be relatively high: every second you incur the cost of building a new worker. One thing to keep in mind, specifically with Docker, is that CPUs are limited by Docker itself, so depending on your settings that could be 18-38% of a single core.

For tasks like these, I would suggest defining a long-running job and using something like p-queue to run the task on an interval.
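
Roughly something like this inside the job file (untested sketch; the p-queue usage, with v6 being the last CommonJS release, and the skip-if-busy check are just illustrative):

const { default: PQueue } = require('p-queue'); // p-queue v6 (CommonJS)

// Serial queue: a slow run can never overlap the next one.
const queue = new PQueue({ concurrency: 1 });

async function aggregateIcecastStatus() {
  // ...the existing DNS/fetch/Redis aggregation logic from the job above...
}

// One long-lived worker enqueues the task every second instead of Bree
// spawning (and tearing down) a new worker thread on every tick.
setInterval(() => {
  // Skip a tick if the previous run is still queued or in flight.
  if (queue.size === 0 && queue.pending === 0) {
    queue.add(aggregateIcecastStatus);
  }
}, 1000);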

shadowgate15 commented 1 year ago

There might be a better way for us to handle this in Bree using a worker pool, but I will have to think on it some more.

brenc commented 1 year ago

That seems reasonable. A loop with a sleep would probably suffice as well. Thanks!
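
Something like this, reusing the existing main() from the job above (untested sketch; the delay helper is just a promise-wrapped setTimeout, and redis.quit() would need to move out of main() so the connection survives between iterations):

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

(async () => {
  // Single long-running worker: loop forever instead of exiting after one run.
  while (true) {
    await main();
    await delay(1000);
  }
})();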

I'll keep this open in case you want to do anything more with it.

titanism commented 1 year ago

Yes, this is a code structure problem. You should be using something like p-queue and definitely not spawning a new worker thread every 1s. You should spawn a long-running worker that then uses setInterval (or something similar) to run the work serially, starting the next run only once the CPU-intensive operations are complete. See https://github.com/sindresorhus/promise-fun for more packages like p-queue.
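
On the Bree side, that amounts to registering the job with no interval so it runs once at start() and the worker keeps itself alive; a minimal sketch:

const Bree = require('bree');

const bree = new Bree({
  jobs: [
    // No interval/timeout/cron/date: Bree runs the job once at start(), and
    // the worker stays alive by scheduling its own work (setInterval, p-queue, etc.).
    { name: 'aggregate-icecast-status' },
  ],
});

(async () => {
  await bree.start();
})();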