galacoder opened 4 months ago
Hello, I don't have many details to share right now. We installed Coolify on some small Hetzner servers yesterday, and it was working like a breeze, but today every deployment takes over all resources for several minutes, with CPU usage just above 200% and very high I/O operations on the hard drive.
I'm trying to figure out what is causing this, for example whether a change in my Docker container is responsible, but I'm currently waiting for the server to become available again.
[Charts: last 24 hours, last hour]
The Hetzner console tells me it's out of memory (the server is one of the smallest available, with just 4 GB), but the same process worked fine yesterday.
UPDATE: Restarting the Hetzner server fixed the issue for me (for now). I hope it doesn't happen again.
Second this. As of today, I have the same issue. Fresh new Coolify installation, fresh new Contabo server (4vCPU, 6GB RAM). Takes up 100% of CPU. A small nodejs backend, which takes 5s to build on my machine, has been building for 10 minutes and counting.
root@vmi1916516:~# mpstat -u | awk '/all/ {printf "%.2f%%\n", 100-$12}'
100.00%
I don't know if it's normal, but PHP 8.2 and php-fpm often take up to 70% of my CPU when navigating Coolify, and not just for a split second, but steadily.
The usage jumps from one thing to the other, all while building a nodejs server with 200 lines of code...
I was planning to build my backend on Coolify. Do I just ditch it now or what?
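A note on the mpstat one-liner above: the awk field number depends on mpstat's column layout. With a locale that prints the time with an AM/PM suffix, the timestamp takes two fields, so $12 lands on %gnice (normally 0.00) instead of %idle, which would always print 100.00%. Also, without an interval argument mpstat reports averages since boot, so an exact 100.00% reading is worth double-checking. Whether either point applies to the output above is unknown. A minimal sketch that reads %idle as the last field instead, assuming sysstat's usual mpstat -u layout in which %idle is the final column (the function name busy_pct is hypothetical):

```shell
# busy_pct: read `mpstat -u` output on stdin and print 100 - %idle for each
# "all" row. %idle is the last column, so it is read as $NF; this works
# whether the timestamp is "23:24:02" (one field) or "11:24:02 PM" (two).
busy_pct() {
  awk '$2 == "all" || $3 == "all" { printf "%.2f%%\n", 100 - $NF }'
}

# Usage (assumes sysstat is installed):
#   mpstat -u 1 1 | busy_pct   # one 1-second sample, plus the "Average:" row
```

Passing an interval (here one sample of one second) measures current load rather than the since-boot average.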
Similar issue: Coolify is taking up large amounts of CPU, fluctuating on the /init command.
Same here.
BTW: it would be awesome if we could get a stats overview of the running containers directly in Coolify, showing their CPU and memory usage.
Seeing a similar issue on v4.0.0-beta.297.
Same issue. Worked fine for a few days and now started to hang unresponsive on NextJS deployment.
Coolify: v4.0.0-beta.297 hetzner: CPX11 | x86 | 40 GB | us-west
Sometimes it helps to restart Coolify or the host server. Then it's OK again for about 1-2 hours.
Check if you have enough RAM. In my case, kswapd (the kernel thread that handles swapping) was taking all the CPU, and adding more RAM fixed it.
I do have enough free RAM, and it's not the swap process that eats up all the CPU either. Thanks for the suggestion.
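To test the memory-pressure theory from the comments above, available memory and used swap can be read from /proc/meminfo, which reports its values in kB. A minimal sketch; the function name mem_summary is hypothetical:

```shell
# mem_summary: read /proc/meminfo-formatted text on stdin and print available
# memory and used swap in MiB (meminfo reports values in kB).
mem_summary() {
  awk '
    $1 == "MemAvailable:" { avail = $2 }
    $1 == "SwapTotal:"    { swap_total = $2 }
    $1 == "SwapFree:"     { swap_free = $2 }
    END { printf "available=%dMiB swap_used=%dMiB\n", avail / 1024, (swap_total - swap_free) / 1024 }
  '
}

# Usage:
#   mem_summary < /proc/meminfo
```

If swap usage keeps climbing while available memory stays low, swapping is a plausible cause of the CPU load; vmstat 1 shows swapping as it happens in its si/so columns.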
I am having the same issue. I have three servers running: a Coolify server, a build server, and a server just to run the containers (2 Next.js apps). It's always the server running the containers that goes down.
This is the chart from the most recent crash. It's particularly weird because no deployments were happening at the time and I can't see any traffic spikes either; it just seemed to be random.
@andrasbacsai is it safe to roll back from 297 to 4.0.0 296, for example? The high CPU is making my prod environment nearly unusable... Yes, I know, I learned my lesson the hard way: never enable auto updates on prod systems...
Same issue!!
My coolify Docker container also shows as "unhealthy".
Same here v4.0.0-beta.297
Last night I had a failed Next.js deployment, but the high CPU only started about 10 hours later.
Edit: I tried to open a bash shell in the coolify container. I can get in, but once I'm in, any command hangs forever, even pwd. The same happens with the coolify-db container.
I restarted the coolify container; let's see if the problem appears again.
Deploying or redeploying a service takes around 10-15 minutes. The same service used to redeploy within 15-50 seconds.
I reinstalled it on Ubuntu 20.04 and now it works fine. Ubuntu 22.04 and 24.04 are not working for me! On another server I use Coolify with Debian 11, and it's better!
Interesting. Have you tried 22.04 and seen high CPU, and then 24.04 as well?
Or in other words: is it reproducible?
Yes, I tried both 22.04 and 24.04, and both have the same 100% CPU usage issue! Only Ubuntu 20.04 and Debian 11 are fine!
I have Ubuntu 20.04 and still have this issue. Every week coolify will fail and I have to restart the server. It's just coolify.
What happens if we limit the cpu usage of the coolify container?
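One way to try this: Docker can cap a running container's CPU through its cgroup with docker update --cpus, which the kernel enforces, unlike per-process tools that pause and resume a process with signals. A minimal sketch; the container name coolify is the one used in this thread, and limit_cpus and the DOCKER variable are hypothetical (DOCKER is only a hook to dry-run the command). It is an assumption, not confirmed here, that the cap is lost whenever Coolify recreates its container, for example during an upgrade.

```shell
# limit_cpus: cap a running container at a number of CPUs via its cgroup.
# DOCKER defaults to the real docker CLI; set DOCKER=echo to only print the command.
limit_cpus() {
  local name=$1 cpus=$2
  "${DOCKER:-docker}" update --cpus "$cpus" "$name"
}

# Usage:
#   limit_cpus coolify 1.5                                    # at most 1.5 CPUs
#   docker inspect --format '{{.HostConfig.NanoCpus}}' coolify  # 1500000000 afterwards
```

Capping the Coolify container trades the spikes for slower Coolify jobs, so the UI and background work may become sluggish under load.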
Hey guys, v298 is out now :) https://github.com/coollabsio/coolify/releases/tag/v4.0.0-beta.298 At least on my side, it doesn't seem to change the CPU behaviour dramatically... How about you?
For all of you with the 100% CPU issue: is anybody using Supabase? I reinstalled everything except Supabase and have no issues! When I install Supabase, the 100% usage happens again.
At one point I had nothing installed or running other than Coolify, and I still had spikes.
Which VPS provider do you use?
I don't use Supabase, and I still saw it once.
Not using Supabase; hosting on Hetzner. @atilladeniz
Very strange! This problem is spooky! I haven't slept well for a few days, dreading that at any second the server can go to 100% and slow down my connections and increase latency on the VPS.
cpulimit doesn't work for me! It always goes up!
@marke-dev can you send the logs of the coolify Docker container? I want to look through them; maybe we can find the fix together!
@atilladeniz can you check if the cpu spike coincided with a database backup?
As for which VPS provider I use: I'm not on a VPS but on a dedicated server with IONOS.
Yes, definitely, I'll send the logs later today.
Hello, I am experiencing a similar issue on my side too, using v4.0.0-beta.306
It doesn't really seem to be related to any failed deployment on my side. However, I would love to help by providing any information about this issue; feel free to request anything from me.
I have deployed beta.260 on a new VPS with the same server configuration.
260 works perfectly fine. I think I will stay on 260 for the moment, with auto-update disabled.
Whats your plan?
The hang on NextJS deployments reported above (v4.0.0-beta.297, Hetzner CPX11) happens because next build uses all CPUs; I'm trying to figure out how to restrict it.
You can set resource limits on an app in the app's settings, under Resource Limits.
Also try this to limit the CPUs of just next build: https://github.com/vercel/next.js/discussions/65983
Just had another outage roughly a week after switching from EC2 to Digital Ocean. Once again it was the server running the apps that crashed, not the Coolify or build servers. No build was running either; it was just spontaneous. I did have to reboot both the Coolify server and the apps server, though; rebooting just the apps server didn't work.
One question... why do we all run production or at least "should be always up" stuff on a server with automatic updates on? 😅
I don't have automatic updates enabled. The issue happened on a version that was working fine for some time.
OK, I see. I started with Coolify 230 or so, and I think the problems started after it updated to 297... this is why I assumed we were all updating.
Adding swap space to my little 2 GB Hetzner server fixed this for me.
From this thread: https://github.com/coollabsio/coolify/issues/2088#issuecomment-2082327051
@ghsteff let me give it a try
Had the same problem on my 8 GB RAM Hetzner server. Adding 8 GB swap space solved it for me.
Edit: Didn't solve it, the spikes are just less common
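For reference, a common way to add a swap file like the ones described above is sketched below. The path /swapfile and the size are assumptions to adjust per server; add_swap, run and the DRY_RUN variable are hypothetical helpers that print the commands instead of executing them. The real run needs root.

```shell
# run: execute a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# add_swap: create, enable and persist a swap file (root needed unless DRY_RUN is set).
add_swap() {
  local file=$1 size=$2          # e.g. add_swap /swapfile 2G
  run fallocate -l "$size" "$file"
  run chmod 600 "$file"          # swapon warns if other users can read the file
  run mkswap "$file"
  run swapon "$file"
  # Persist across reboots; in dry-run mode only describe the fstab change.
  if [ -n "${DRY_RUN:-}" ]; then
    echo "append to /etc/fstab: $file none swap sw 0 0"
  else
    echo "$file none swap sw 0 0" >> /etc/fstab
  fi
}

# Usage:
#   DRY_RUN=1 add_swap /swapfile 2G   # preview the commands
#   add_swap /swapfile 2G             # apply, as root
```

On filesystems where swapon rejects a file created by fallocate, creating it with dd if=/dev/zero is the usual fallback. As the edit above notes, swap may only make the spikes less frequent rather than remove them.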
The CPU spikes kept taking my app down for several minutes every few days.
Today, there was a downtime of several hours. This time I got this log alert: PullTemplatesAndVersions failed with: cURL error 28: Operation timed out after 30001 milliseconds with 0 out of 372726 bytes received (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://raw.githubusercontent.com/coollabsio/coolify/main/templates/service-templates.json. I don't think the issue lies with Hetzner, since restarting the server fixed it.
Hope this will be fixed soon. Coolify is amazing to use, but uptime is critical for me. I went back to my previous, manual docker setup that has proved to be reliable. I will follow the progress of Coolify and give it another try once it's out of beta.
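To tell a general network problem on the host apart from a Coolify problem, the same file can be fetched from the host with curl, using the 30-second limit seen in the error above. A sketch; check_templates_url and the CURL override hook are hypothetical:

```shell
# check_templates_url: time a download of the service-templates file from the
# host, giving up after 30 seconds like the failing job did.
# CURL defaults to the real curl; set CURL=echo to print the command instead.
check_templates_url() {
  local url="https://raw.githubusercontent.com/coollabsio/coolify/main/templates/service-templates.json"
  "${CURL:-curl}" -sS -o /dev/null --max-time 30 \
    -w 'http=%{http_code} time=%{time_total}s\n' "$url"
}

# Usage:
#   check_templates_url; echo "curl exit status: $?"
```

An exit status of 28 from curl is the same timeout as in the log (CURLE_OPERATION_TIMEDOUT); a quick success from the host while the job still fails would point back at Coolify.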
For a week my app was working fine, but over the last couple of days I noticed degraded responsiveness. I checked htop and docker stats and saw tremendous spikes in the coolify Docker container's CPU usage, even though I don't have many apps: one Bun app with pm2, Umami stats, and Uptime Kuma.
I also have a second server controlled by Coolify, where it deploys a couple of dockerized apps. Nothing overwhelming from my perspective.
Hi @Aft1n, yes, I experienced the same... I reinstalled/downgraded to Coolify 260 and now it's working perfectly fine again.
I shut down the server and turned it back on after a minute, which fixed it for now. But the question is how long this will last.
I have revisited this:
Taking a deeper look into the coolify container, I noticed that php-fpm seems to be causing this issue, or at least part of it. All my containers are stopped through Coolify; literally only Coolify is running.
hh:mm:ss
00:35:03 php-fpm: pool www
00:09:37 php /var/www/html/artisan start:horizon
00:11:29 php /var/www/html/artisan start:scheduler
00:35:08 php-fpm: pool www
00:12:19 /usr/bin/php8.2 artisan horizon:supervisor ff5b1471ba88-xDox:s6 redis
P.S. Yes, that is 35 minutes.
Mon 22 Jul 2024 11:24:02 PM EDT: Container coolify is using 107% CPU
Mon 22 Jul 2024 11:25:03 PM EDT: Container coolify is using 146% CPU
Mon 22 Jul 2024 11:25:03 PM EDT: Container coolify is using 146% CPU
Mon 22 Jul 2024 11:26:04 PM EDT: Container coolify is using 109% CPU
Mon 22 Jul 2024 11:26:04 PM EDT: Container coolify is using 138% CPU
Mon 22 Jul 2024 11:32:03 PM EDT: Container coolify is using 147% CPU
Mon 22 Jul 2024 11:32:03 PM EDT: Container coolify is using 147% CPU
Mon 22 Jul 2024 11:33:04 PM EDT: Container coolify is using 154% CPU
Mon 22 Jul 2024 11:33:04 PM EDT: Container coolify is using 154% CPU
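The script that produced the log above is not shown. A minimal sketch that emits similar lines from docker stats, assuming a 100% threshold; the function name cpu_alerts and the threshold are illustrative. Note that docker stats counts one full core as 100%, so values above 100% are possible on multi-core hosts.

```shell
# cpu_alerts: read "NAME CPU%" lines on stdin (as printed by the docker stats
# format in the usage below) and report containers above a CPU threshold in percent.
cpu_alerts() {
  awk -v limit="$1" '{
    pct = $2
    sub(/%$/, "", pct)                 # "146.23%" -> "146.23"
    if (pct + 0 > limit + 0)
      printf "Container %s is using %d%% CPU\n", $1, pct
  }'
}

# Usage (poll once a minute, prefixing each alert with the output of date):
#   while true; do
#     docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}' | cpu_alerts 100 |
#       while read -r line; do echo "$(date): $line"; done
#     sleep 60
#   done
```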
Sorry for the super late answer.
The SSH connections (especially those from the background jobs) are causing the high CPU usage. SSH, even with multiplexing, needs a lot of CPU.
I started optimizing the jobs to fetch all data over one SSH connection.
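The idea of fetching everything over one SSH connection can be illustrated in shell: join several remote commands into one script so that a single ssh session (one handshake, one remote shell) replaces one session per command. This is not Coolify's implementation, which is written in PHP; remote_batch and the SSH override hook are hypothetical. Multiplexing, which the comment above mentions, is configured separately through the ssh options ControlMaster, ControlPath and ControlPersist.

```shell
# remote_batch: run several commands on a host over a single ssh session
# instead of one ssh invocation per command.
# SSH defaults to the real ssh client; set SSH=echo to print instead.
remote_batch() {
  local host=$1 script="" cmd
  shift
  for cmd in "$@"; do
    # Join with "; " so each command runs regardless of the previous exit status.
    script="${script:+$script; }$cmd"
  done
  "${SSH:-ssh}" "$host" "$script"
}

# Usage:
#   remote_batch root@server 'docker ps --format "{{.Names}}"' 'df -h /' 'uptime'
```

The trade-off is that the outputs arrive as one stream, so a real job would need markers between commands to split them again.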
Experiencing the same issue with beta.319 on a DigitalOcean VPS. Do I understand correctly that the solution for now is to roll back to beta.260 and wait for an official fix?
Considering @andrasbacsai has started working on it, I'd just wait patiently until the problem is resolved, so as not to lose the new features that were introduced (there are a few). It all depends on your needs; ultimately, you're the one in charge of your server. Your call.
Description
I have been running Coolify for 8 days with various services, encountering no prior issues. However, on the night of April 30th EST, I experienced a significant CPU usage spike starting around 11 PM, shortly after an unsuccessful attempt to deploy a React application. It is unclear whether this issue was directly related to the deployment failure, a potential attack, or another problem.
Expected Behavior
CPU usage should remain stable, without significant spikes, particularly when no active deployments or heavy tasks are underway.
Actual Behavior
CPU usage unexpectedly spiked to over 300% and remained high throughout the night, which was unusual and concerning, given the context.
Additional Context
The sudden surge in CPU usage occurred after the deployment failure, but it is uncertain whether the spike was a direct result of this event, a security issue, or another underlying problem. This incident warrants further investigation to prevent future occurrences.
I have already restarted my VPS twice, but the problem persisted.
I would appreciate any insights or troubleshooting steps you could recommend to help identify and resolve the root cause of this spike. Thank you for your assistance.
Version
4.0.0-beta.271