Open Vahor opened 2 weeks ago
My server crashed today. After some investigation, I found that the crash most likely happened while the worker was running a PullHelperImageJob.
The screenshot above shows the list of failed jobs: PullHelperImageJob runs every hour, and it fails every time. The job at the bottom of the list is the one I believe crashed my server; it appears to have taken 1414 seconds to run. The server was unresponsive that whole time, and I had to restart it from the cloud console to get it back up.
I tried to dig deeper, but I can't get any logs or stack traces.
Yes, I am experiencing a lag issue at least once a day, and the server becomes totally unresponsive. I'm very happy with Coolify otherwise, so it's a pity, because this is very annoying.
Here is a bit more info that I hope will help:
First, I added a few logging lines to the code to figure out what is happening:
Then, I tried to dispatch the job synchronously and see if there's anything in the logs:
This did not result in anything printed to the logs.
The next step was to call PullHelperImageJob#handle() directly:
This took a little longer to run, and left the following messages in the log, indicating a successful run:
This suggests that there is nothing wrong with the code itself. It might be a configuration issue, but I don't know.
I hope this info helps a bit, and sorry in case this is not related to the problem posted by OP.
Coolify is running an endless loop of cron/maintenance jobs, causing constant 100% CPU usage.
Description
I observed lag spikes every hour, and while investigating I found this:
The spike uses 30% of all cores, and even 100% on a small single-core machine. None of these servers is the main server, so it shouldn't be related to the auto update/check. And even if it were, the cron for these jobs is not configured to run every hour.
From what I can see, it starts after a pull of coolify-helper.
(Screenshots taken at 15:00.)
So I checked /horizon/jobs/completed, and I can see multiple jobs that ran at ~15:00:
I don't really know where to look next; in Horizon I don't see any way to filter for a single server. If you know of one, tell me and I can share more information.
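For anyone who wants to record the spikes without relying on screenshots, here is a minimal sketch of a CPU sampler. It is not part of Coolify; it only assumes a Linux host with /proc/stat, and the function names, the 30% threshold and the 5-second interval are placeholders chosen for illustration.

```python
import time


def parse_cpu_line(line):
    """Turn the aggregate 'cpu' line of /proc/stat into (busy, total) jiffies."""
    fields = [int(x) for x in line.split()[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
    total = sum(fields[:8])  # guest time is already counted inside user
    return total - idle, total


def read_cpu_times(path="/proc/stat"):
    """Read the current aggregate CPU counters (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return parse_cpu_line(line)
    raise RuntimeError("no aggregate cpu line found")


def busy_percent(before, after):
    """CPU utilisation in percent between two (busy, total) samples."""
    busy = after[0] - before[0]
    total = after[1] - before[1]
    return 100.0 * busy / total if total > 0 else 0.0


def watch(threshold=30.0, interval=5.0):
    """Print a timestamp whenever utilisation reaches the threshold. Runs until interrupted."""
    prev = read_cpu_times()
    while True:
        time.sleep(interval)
        cur = read_cpu_times()
        pct = busy_percent(prev, cur)
        if pct >= threshold:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), f"{pct:.1f}%", flush=True)
        prev = cur
```

Calling watch() on an affected server and redirecting the output to a file should give a list of spike timestamps that can be compared with the job times shown in Horizon.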
Minimal Reproduction (if possible, example repository)
It appears on all my servers at the same time, every hour. So: set up a Coolify instance on a server (maybe add a second server) and wait for the CPU spike 🏕️
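While waiting, a rough way to check that the spikes really line up with the top of the hour is to measure how far each recorded spike is from the nearest full hour. This is only a sketch: the helper names and the 5-minute tolerance are my own assumptions, and the timestamps have to come from your own monitoring.

```python
from datetime import datetime


def minutes_from_hour(ts):
    """Distance in minutes from ts to the nearest top of the hour (0 to 30)."""
    minute = ts.minute + ts.second / 60.0
    return min(minute, 60.0 - minute)


def hourly_fraction(timestamps, tolerance_min=5.0):
    """Fraction of timestamps that fall within tolerance_min of a full hour."""
    if not timestamps:
        return 0.0
    near = sum(1 for ts in timestamps if minutes_from_hour(ts) <= tolerance_min)
    return near / len(timestamps)
```

A fraction close to 1.0 over a day of samples would support the "every hour" pattern; a value much lower would suggest the spikes are triggered by something else.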
Exception or Error
No response
Version
v4.0.0-beta.323
Cloud?