maxpain closed this issue 2 years ago.
What is the job configuration?
@shadowgate15
```js
import path from 'node:path'
import { fileURLToPath } from 'node:url'
import Bree from 'bree'

// Not shown in the issue: `breeLogger`, `logger`, and `sentry` are the
// application's own logger and Sentry client instances.

// 14 of these 20 jobs run every minute ('* * * * *'), so Bree starts
// up to 14 workers at the same moment every minute.
const jobs = [
  { name: 'sync-tournament-streams', cron: '* * * * *' },
  { name: 'start-scheduled-matches', cron: '* * * * *' },
  { name: 'start-scheduled-rounds', cron: '* * * * *' },
  { name: 'start-scheduled-stages', cron: '* * * * *' },
  { name: 'start-scheduled-tournaments', cron: '* * * * *' },
  { name: 'start-scheduled-stage-check-ins', cron: '* * * * *' },
  { name: 'auto-accept-tournament-matches', cron: '* * * * *' },
  { name: 'cleanup', cron: '* * * * *' },
  { name: 'update-streams', cron: '*/5 * * * *' },
  { name: 'update-top-users', cron: '*/10 * * * *' },
  { name: 'update-friends-online', cron: '* * * * *' },
  { name: 'update-statistic-counters', cron: '* * * * *' },
  { name: 'statistic-minutely', cron: '* * * * *' },
  { name: 'statistic-hourly', cron: '0 * * * *' },
  { name: 'statistic-daily', cron: '0 0 * * *' },
  { name: 'statistic-monthly', cron: '0 0 1 * *' },
  { name: 'statistic-yearly', cron: '0 0 1 1 *' },
  { name: 'gamemoney-checkouts', cron: '* * * * *' },
  { name: 'pay-ladder-prizes', cron: '* * * * *' },
  { name: 'sync-workshop-maps', cron: '* * * * *' },
]

const bree = new Bree({
  // ESM has no __dirname, so resolve ./jobs relative to this module's URL.
  root: path.join(path.dirname(fileURLToPath(import.meta.url)), 'jobs'),
  jobs,
  logger: breeLogger,
  // Close (terminate) any job worker that is still running after 10 minutes.
  closeWorkerAfterMs: 10 * 60 * 1000,
  // Report job errors to the app logger and to Sentry.
  errorHandler(error) {
    logger.error(error)
    sentry.captureException(error)
  },
})

bree.start()
```
I wonder whether the delay comes from the number of workers being created at the same time, each of which has to load its job file and all of its imports.
Does it continue to have the delay after the first time the job runs? Also, do all of the jobs have that delay?
> Does it continue to have the delay after the first time the job runs?

Yes, every run of every job is consistently delayed by 15 to 30 seconds.
Does your logger use `console`? If it does, there is a known Node.js issue with that: console output from worker threads is forwarded asynchronously through the parent thread, so it can appear later than the code that produced it. Try adding a timestamp to the logged string to rule that out. It could be that the awaited calls really do take that long, or that, because of that issue, the logs are getting backed up and giving inaccurate timing information.
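To follow the timestamp suggestion, the time can be captured when the log call is made and embedded in the message string, so a delayed console flush does not hide when the code actually ran. A minimal sketch; `stamped` is a hypothetical helper, not part of Bree or Node.js:

```js
// Hypothetical helper: the timestamp is taken when this function is called,
// not when the (possibly delayed) console output reaches the terminal.
function stamped(message, now = new Date()) {
  return `[${now.toISOString()}] ${message}`
}

// Example use at the very top of a job file:
// console.log(stamped('job code started'))
```

If the embedded time is close to when Bree reports the worker as online but the line shows up much later, the delay is in the log output; if the embedded time itself is ~30 seconds late, the job code really does start late.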
It seems the problem is that I have many jobs with many imports, and they use a lot of CPU while starting. The job code itself doesn't use much CPU, but loading the Node imports does.
Should I stop using Bree / worker_threads and run all my jobs in the same process?
Hmm, interesting. One solution could be to use longer-running jobs. Another might be to reduce the size of the imports so they load faster.
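One way to cut what a job loads on every run is to defer a heavy dependency with dynamic `import()`, so it is only loaded on the code path that needs it. A sketch under that assumption; `node:zlib` stands in for a hypothetical heavy npm package, and `maybeCompress` is a hypothetical job helper:

```js
// Cached promise so the module is requested at most once per thread.
let zlibPromise

function loadZlib() {
  zlibPromise ??= import('node:zlib')
  return zlibPromise
}

async function maybeCompress(text, shouldCompress) {
  // Fast path: nothing extra is imported when compression is not needed.
  if (!shouldCompress) return text
  const { gzipSync } = await loadZlib()
  return gzipSync(text)
}
```

Modules loaded with `import()` are cached per thread, so a fresh Bree worker still pays the full cost once when that branch runs; this only helps for dependencies that some runs can skip.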
I already use ES Modules, and I don't know how to make my modules load faster. Besides, we depend on npm packages, some of which are CommonJS modules and badly hurt import performance.
Is there a way to cache imports in worker_threads?
Jobs can start slowly because of CPU and/or memory limits. You may benefit from rewriting your jobs, or from having one long-running job plus a separate job that runs on a schedule (e.g. `* * * * *`). The scheduled job sends the parent a message saying it's time to run again; the parent listens for that message and then tells the long-running job to do XYZ again.
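The long-running job idea can be sketched with plain `node:worker_threads`; this is not Bree's API. The worker loads its imports once at startup and then does one unit of work per `run` message. `startLongRunningWorker`, the `{ type: 'run' }` message shape, and the `runs` counter are hypothetical, and the parent calling `run()` stands in for relaying the scheduled job's message:

```js
import { Worker } from 'node:worker_threads'

// Worker body (eval'd as CommonJS). Heavy imports at the top would be
// evaluated once, when the worker starts, not on every run.
const longRunningSource = `
  const { parentPort } = require('node:worker_threads')
  let runs = 0
  parentPort.on('message', (msg) => {
    if (msg.type !== 'run') return
    runs += 1
    // Placeholder for the job's real work (e.g. one sync pass).
    parentPort.postMessage({ type: 'done', task: msg.task, runs })
  })
`

function startLongRunningWorker() {
  const worker = new Worker(longRunningSource, { eval: true })

  // Ask the worker to run one task and resolve with its reply.
  function run(task) {
    return new Promise((resolve, reject) => {
      const onMessage = (msg) => {
        if (msg.type === 'done' && msg.task === task) {
          worker.off('message', onMessage)
          worker.off('error', reject)
          resolve(msg)
        }
      }
      worker.on('message', onMessage)
      worker.once('error', reject)
      worker.postMessage({ type: 'run', task })
    })
  }

  return { run, close: () => worker.terminate() }
}
```

In an app, the parent would call `job.run(...)` each time the scheduled trigger reports in. Because the worker stays alive, its imports are evaluated only once, and state such as `runs` survives between runs.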
You may benefit from increasing the swap on your server if it is a memory issue. See https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-20-04 for more information.
If you are loading large files, e.g. huge JSON objects that you `require` or `import`, you may benefit from having the parent send that payload to the child workers through `workerData`.
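A minimal sketch of the `workerData` suggestion, again using plain `node:worker_threads` rather than Bree's job options: the parent builds the payload once (`bigConfig` is a hypothetical stand-in for a large JSON file) and hands it to the worker, which reads it instead of importing the file itself:

```js
import { Worker } from 'node:worker_threads'

// Hypothetical stand-in for a large JSON file the jobs would otherwise import.
const bigConfig = { maps: Array.from({ length: 1000 }, (_, i) => ({ id: i })) }

// Worker body (eval'd as CommonJS): it uses workerData, not a file import.
const payloadWorkerSource = `
  const { parentPort, workerData } = require('node:worker_threads')
  parentPort.postMessage(workerData.maps.length)
`

// Start a worker with the payload and resolve with its first message.
function runWithPayload(payload) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(payloadWorkerSource, { eval: true, workerData: payload })
    worker.once('message', resolve)
    worker.once('error', reject)
  })
}
```

`workerData` is copied with the structured clone algorithm, so each worker still receives its own copy; what this avoids is every worker reading and parsing the file separately. Whether that is faster for a given payload is worth measuring.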
Without seeing the source code of your actual jobs, we're unable to help further.
You may also benefit from using console timers to find out what is causing such CPU- and memory-intensive operations, although from your notes it sounds like you suspect the Node `import`/`require` calls. See https://developer.mozilla.org/en-US/docs/Web/API/console/time.
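Following the console timer suggestion, each heavy import in a job could be timed individually by wrapping a dynamic `import()` in `console.time`/`console.timeEnd`. `timedImport` is a hypothetical helper, and the label format is arbitrary:

```js
// Load a module and print how long the import took.
async function timedImport(specifier) {
  const label = `import ${specifier}`
  console.time(label)
  try {
    return await import(specifier)
  } finally {
    // Elapsed time is computed here, when timeEnd is called.
    console.timeEnd(label)
  }
}
```

`console.timeEnd` measures the elapsed time at the moment it is called, so the printed duration stays accurate even if the worker's console output reaches the terminal late.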
Describe the bug
Node.js version: 18.4.0
OS version: Container-Optimized OS with containerd (cos_containerd)
Description: It takes ~30 seconds for a job to start.
Actual behavior
The worker thread comes online quickly, but the job's code only starts executing after ~30 seconds.
Expected behavior
Code execution should start within a second.
Code to reproduce