Yenthe666 / InstallScript

Odoo install script
MIT License

Adding a way to automatically restart the service after an Odoo out-of-memory crash [Feature Request] #238

Closed · 3mrdev closed this 4 years ago

3mrdev commented 4 years ago

The problem: what would cause Odoo to crash, and is there a way to automatically restart the service?

Tested Solution (Crontab):

Assuming the user home directory is

/home/firebits

the Odoo init script is

/etc/init.d/odoo-server

and the process path to grep for is

/odoo/odoo-server/odoo-bin

Step 1: Create the script

odoo_autorestart.sh

#!/bin/bash
# Path of the Odoo process to look for in the process list.
SERVICE=/odoo/odoo-server/odoo-bin
if ps ax | grep -v grep | grep "$SERVICE" > /dev/null
then
    echo "$SERVICE is running well at $(date)" >> /home/firebits/restart.log
else
    echo "$SERVICE is not running. Warning! Probably an out-of-memory crash: check your system logs, and if confirmed, upgrade your RAM. Restarting service... at $(date)" >> /var/log/odoo-server.log
    echo "$SERVICE is not running. Restarting... at $(date)" >> /home/firebits/restart.log
    /etc/init.d/odoo-server start > /dev/null
fi

Step 2: Create the restart log

touch /home/firebits/restart.log

Step 3: Set permissions

chmod 775 /home/firebits/odoo_autorestart.sh
chmod 664 /home/firebits/restart.log

Step 4: Add a cron job to run the check at a fixed interval

crontab -e

and add an entry pointing at the script (here: every 5 minutes)

*/5 * * * * /home/firebits/odoo_autorestart.sh
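
To sanity-check the setup (using the paths assumed above), run the script once by hand and confirm the cron entry is installed and the log is being written:

/home/firebits/odoo_autorestart.sh
crontab -l | grep odoo_autorestart
tail -n 3 /home/firebits/restart.log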

More solutions if you want:

How To Configure a Linux Service to Start Automatically After a Crash or Reboot – Part 1: Practical Examples

Yenthe666 commented 4 years ago

@3mrdev you're trying to find a solution for a symptom, not for the cause of the symptom. Your Odoo simply shouldn't stop its service/instance; this has to be caused by an issue in the code or the deployment. I've had Odoo instances running for years without stopping themselves. Are you sure there are no errors in the log?

codeagencybe commented 4 years ago

Maybe he is referring to a feature like the ones Docker offers, or like Node.js's "forever", which keeps the node server running and restarts it on crash.

Docker has several policies to auto-restart a container, e.g. on-failure, always, and unless-stopped: https://docs.docker.com/config/containers/start-containers-automatically/
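
As a rough sketch of what that looks like (the image tag and port mapping here are just assumptions, adjust to your setup), a restart policy is a single flag on docker run:

docker run -d --name odoo --restart unless-stopped -p 8069:8069 odoo:13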

But I agree with Yenthe: in any case, the logs should always be checked to find the root cause. In some cases auto-restarting won't help either; it just keeps looping between crashing and restarting indefinitely. This feature is only interesting to prevent long downtime due to e.g. a process that freezes, but it will never fix a code bug.

3mrdev commented 4 years ago

I completely agree with you @Yenthe666 @codeagencybe, but it could be helpful to keep the server up and running until the server resources (RAM) can be upgraded, since not everyone knows the "out of memory" error, which crashes the service with no trace in odoo.log. Maybe your servers have good resources, which is why you have never run into this; see odoo/odoo#7031. I myself hit this problem recently, and I have been using Odoo for 3 years and never ran into it before. So adding it is more of a plus than a minus; it could be optional in the script, since it is not important for everyone until they face it, and it could be a small tip in the README file.

PS: I updated the bash script above to warn the Odoo administrator in the Odoo log file.

Leave this issue open; if someone runs into this, they will leave a thumbs up, and then we will know the need for and impact of this, or it can just stand as acknowledgment. Then add the feature if it gets enough votes.
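
On a related note, for multi-worker deployments Odoo also ships its own per-worker memory limits, which recycle a worker before the kernel OOM killer has to step in. A minimal sketch, assuming the config lives at /etc/odoo-server.conf and using illustrative values (tune them to your available RAM):

# Append per-worker memory limits to the Odoo config; the limits only
# apply in multiprocessing mode (workers >= 1). Values are illustrative.
sudo tee -a /etc/odoo-server.conf > /dev/null <<'EOF'
workers = 4
limit_memory_soft = 671088640
limit_memory_hard = 805306368
EOF
sudo /etc/init.d/odoo-server restart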

Here are my logs from yesterday:

Command

sudo grep -i -r 'out of memory' /var/log/

Output

unknown@unknown:~# sudo grep -i -r 'out of memory' /var/log/
/var/log/kern.log:May 18 12:42:38 unknown kernel: [312789.639724] Out of memory: Kill process 17543 (odoo) score 456 or sacrifice child
/var/log/kern.log:May 19 04:10:01 unknown kernel: [368432.042257] Out of memory: Kill process 5998 (odoo) score 603 or sacrifice child
/var/log/kern.log:May 19 12:48:18 unknown kernel: [20107.539624] Out of memory: Kill process 2148 (odoo) score 657 or sacrifice child
/var/log/kern.log:May 19 12:49:57 unknown kernel: [20207.213508] Out of memory: Kill process 15574 (odoo) score 703 or sacrifice child
/var/log/kern.log:May 19 12:57:24 unknown kernel: [20653.423910] Out of memory: Kill process 15908 (odoo) score 739 or sacrifice child
/var/log/kern.log:May 19 13:01:45 unknown kernel: [20914.508911] Out of memory: Kill process 16301 (odoo) score 726 or sacrifice child
/var/log/kern.log:May 19 13:07:46 unknown kernel: [21275.266628] Out of memory: Kill process 16695 (odoo) score 738 or sacrifice child
/var/log/kern.log:May 19 13:14:49 unknown kernel: [21698.555191] Out of memory: Kill process 17003 (odoo) score 753 or sacrifice child
/var/log/kern.log:May 19 13:29:19 unknown kernel: [22568.931093] Out of memory: Kill process 17327 (odoo) score 738 or sacrifice child
/var/log/kern.log:May 19 13:53:16 unknown kernel: [24005.476687] Out of memory: Kill process 17719 (odoo) score 760 or sacrifice child
/var/log/kern.log:May 19 14:33:37 unknown kernel: [26426.260400] Out of memory: Kill process 18373 (odoo) score 740 or sacrifice child
/var/log/kern.log:May 19 18:18:55 unknown kernel: [39944.994867] Out of memory: Kill process 19532 (odoo) score 672 or sacrifice child
/var/log/kern.log:May 20 01:18:03 unknown kernel: [65092.062464] Out of memory: Kill process 27271 (odoo) score 703 or sacrifice child
Binary file 
/var/log/kern.log.1:May 11 01:10:51 unknown kernel: [4787511.555388] Out of memory: Kill process 17229 (odoo) score 221 or sacrifice child
/var/log/kern.log.1:May 12 11:21:58 unknown kernel: [4910578.180981] Out of memory: Kill process 9787 (odoo) score 247 or sacrifice child
/var/log/kern.log.1:May 13 00:45:04 unknown kernel: [4958762.996420] Out of memory: Kill process 24123 (odoo) score 254 or sacrifice child
/var/log/kern.log.1:May 13 04:04:14 unknown kernel: [4970713.824550] Out of memory: Kill process 6217 (odoo) score 160 or sacrifice child
/var/log/kern.log.1:May 13 11:57:04 unknown kernel: [4999083.314591] Out of memory: Kill process 13660 (odoo) score 497 or sacrifice child
/var/log/kern.log.1:May 13 12:26:46 unknown kernel: [5000865.584675] Out of memory: Kill process 1458 (odoo) score 502 or sacrifice child
/var/log/kern.log.1:May 13 12:48:13 unknown kernel: [5002152.618643] Out of memory: Kill process 2659 (odoo) score 510 or sacrifice child
/var/log/kern.log.1:May 13 18:35:18 unknown kernel: [5022976.923029] Out of memory: Kill process 3733 (odoo) score 233 or sacrifice child
/var/log/kern.log.1:May 13 20:05:05 unknown kernel: [5028363.723437] Out of memory: Kill process 26978 (mysqld) score 194 or sacrifice child
/var/log/kern.log.1:May 13 20:13:35 unknown kernel: [5028873.504609] Out of memory: Kill process 29366 (odoo) score 372 or sacrifice child
/var/log/kern.log.1:May 13 20:32:07 unknown kernel: [5029986.145013] Out of memory: Kill process 1087 (odoo) score 254 or sacrifice child
/var/log/kern.log.1:May 13 20:39:50 unknown kernel: [5030448.352013] Out of memory: Kill process 1394 (odoo) score 272 or sacrifice child
/var/log/kern.log.1:May 13 20:56:30 unknown kernel: [5031448.292987] Out of memory: Kill process 32711 (mysqld) score 187 or sacrifice child
/var/log/kern.log.1:May 13 20:56:33 unknown kernel: [5031451.904901] Out of memory: Kill process 2415 (odoo) score 129 or sacrifice child
/var/log/kern.log.1:May 13 21:03:55 unknown kernel: [5031894.174088] Out of memory: Kill process 3331 (odoo) score 133 or sacrifice child
/var/log/kern.log.1:May 13 21:52:43 unknown kernel: [5034821.005284] Out of memory: Kill process 7004 (odoo) score 140 or sacrifice child
/var/log/kern.log.1:May 13 22:03:32 unknown kernel: [5035470.386693] Out of memory: Kill process 7815 (odoo) score 443 or sacrifice child
/var/log/kern.log.1:May 13 22:19:10 unknown kernel: [5036408.911528] Out of memory: Kill process 8098 (odoo) score 430 or sacrifice child
/var/log/kern.log.1:May 13 23:13:15 unknown kernel: [5039653.501591] Out of memory: Kill process 9194 (odoo) score 313 or sacrifice child
/var/log/kern.log.1:May 14 02:17:48 unknown kernel: [5050726.299255] Out of memory: Kill process 22608 (odoo) score 212 or sacrifice child
/var/log/kern.log.1:May 14 02:21:05 unknown kernel: [5050920.247896] Out of memory: Kill process 20305 (postgres) score 124 or sacrifice child
/var/log/kern.log.1:May 14 02:21:09 unknown kernel: [5050927.672691] Out of memory: Kill process 23735 (odoo) score 118 or sacrifice child
/var/log/kern.log.1:May 14 02:22:47 unknown kernel: [5051025.953158] Out of memory: Kill process 25129 (odoo) score 157 or sacrifice child
/var/log/kern.log.1:May 14 02:37:45 unknown kernel: [5051923.909946] Out of memory: Kill process 25700 (odoo) score 462 or sacrifice child
/var/log/kern.log.1:May 14 03:57:33 unknown kernel: [5056711.779617] Out of memory: Kill process 26963 (odoo) score 309 or sacrifice child
/var/log/kern.log.1:May 14 07:29:17 unknown kernel: [5069415.299298] Out of memory: Kill process 30271 (odoo) score 513 or sacrifice child
/var/log/kern.log.1:May 14 19:04:00 unknown kernel: [5111097.491618] Out of memory: Kill process 18538 (odoo) score 496 or sacrifice child
/var/log/kern.log.1:May 14 20:16:31 unknown kernel: [5115448.521346] Out of memory: Kill process 21693 (odoo) score 389 or sacrifice child
/var/log/kern.log.1:May 14 20:33:28 unknown kernel: [5116466.141790] Out of memory: Kill process 22890 (odoo) score 461 or sacrifice child
/var/log/kern.log.1:May 14 20:42:51 unknown kernel: [5117029.027434] Out of memory: Kill process 24029 (odoo) score 242 or sacrifice child
/var/log/kern.log.1:May 14 20:50:14 unknown kernel: [5117472.317029] Out of memory: Kill process 25253 (odoo) score 529 or sacrifice child
/var/log/kern.log.1:May 15 20:25:18 unknown kernel: [81351.746946] Out of memory: Kill process 857 (odoo) score 506 or sacrifice child
/var/log/auth.log.1:May 14 04:26:00 unknown sudo:     root : TTY=pts/7 ; PWD=/root ; USER=root ; COMMAND=/bin/grep -i -r out of memory /var/log/
/var/log/auth.log.1:May 15 20:40:02 unknown sudo:     root : TTY=pts/0 ; PWD=/root ; USER=root ; COMMAND=/bin/grep -i -r out of memory /var/log/
/var/log/auth.log.1:May 15 23:11:53 unknown sudo:     root : TTY=pts/0 ; PWD=/root ; USER=root ; COMMAND=/bin/grep -i -r out of memory /var/log/
/var/log/syslog:May 19 12:48:18 unknown kernel: [20107.539624] Out of memory: Kill process 2148 (odoo) score 657 or sacrifice child
/var/log/syslog:May 19 12:49:57 unknown kernel: [20207.213508] Out of memory: Kill process 15574 (odoo) score 703 or sacrifice child
/var/log/syslog:May 19 12:57:24 unknown kernel: [20653.423910] Out of memory: Kill process 15908 (odoo) score 739 or sacrifice child
/var/log/syslog:May 19 13:01:45 unknown kernel: [20914.508911] Out of memory: Kill process 16301 (odoo) score 726 or sacrifice child
/var/log/syslog:May 19 13:07:46 unknown kernel: [21275.266628] Out of memory: Kill process 16695 (odoo) score 738 or sacrifice child
/var/log/syslog:May 19 13:14:49 unknown kernel: [21698.555191] Out of memory: Kill process 17003 (odoo) score 753 or sacrifice child
/var/log/syslog:May 19 13:29:19 unknown kernel: [22568.931093] Out of memory: Kill process 17327 (odoo) score 738 or sacrifice child
/var/log/syslog:May 19 13:53:16 unknown kernel: [24005.476687] Out of memory: Kill process 17719 (odoo) score 760 or sacrifice child
/var/log/syslog:May 19 14:33:37 unknown kernel: [26426.260400] Out of memory: Kill process 18373 (odoo) score 740 or sacrifice child
/var/log/syslog:May 19 18:18:55 unknown kernel: [39944.994867] Out of memory: Kill process 19532 (odoo) score 672 or sacrifice child
/var/log/syslog:May 20 01:18:03 unknown kernel: [65092.062464] Out of memory: Kill process 27271 (odoo) score 703 or sacrifice child
/var/log/syslog.1:May 18 12:42:38 unknown kernel: [312789.639724] Out of memory: Kill process 17543 (odoo) score 456 or sacrifice child
/var/log/syslog.1:May 19 04:10:01 unknown kernel: [368432.042257] Out of memory: Kill process 5998 (odoo) score 603 or sacrifice child
/var/log/odoo/odoo-server.log:psycopg2.OperationalError: FATAL:  out of memory
/var/log/postgresql/postgresql-10-main.log.1:2020-05-13 22:03:32.054 UTC [20308] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log.1:2020-05-14 07:29:17.447 UTC [4567] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log.1:2020-05-14 07:29:17.447 UTC [4572] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log.1:2020-05-14 07:29:17.464 UTC [4558] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log.1:    2020-05-14 20:49:47.803 UTC [25747] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log.1:2020-05-14 20:49:54.753 UTC [25743] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-19 13:00:30.886 UTC [16620] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-19 13:05:58.066 UTC [16955] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-19 13:06:02.721 UTC [16950] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-19 13:13:01.808 UTC [17274] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-19 13:13:15.538 UTC [17272] odoo@postgres FATAL:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-19 13:27:15.075 UTC [17669] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-19 13:27:15.075 UTC [17665] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-19 13:27:16.959 UTC [17661] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-19 13:51:32.759 UTC [18295] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-19 14:31:42.372 UTC [19477] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-19 14:31:43.550 UTC [19476] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-19 14:32:24.960 UTC [19485] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-19 14:32:24.960 UTC [19472] odoo@postgres ERROR:  out of memory at character 21
/var/log/postgresql/postgresql-10-main.log:2020-05-19 14:32:31.235 UTC [19481] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-19 18:16:58.877 UTC [27221] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-19 18:17:03.060 UTC [27214] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-20 01:15:45.063 UTC [8048] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-20 01:15:46.835 UTC [8038] odoo@postgres FATAL:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-20 01:15:46.835 UTC [8040] ERROR:  out of memory
/var/log/postgresql/postgresql-10-main.log:2020-05-20 01:16:59.585 UTC [8067] odoo@postgres FATAL:  out of memory
Yenthe666 commented 4 years ago

Sure, I can leave it open for a while, but Odoo should not be checked and restarted every x minutes if it went down. For this you use tools such as New Relic, which monitor uptime and issues and alert you to possible outages. I see no added value in this, sorry.

3mrdev commented 4 years ago

@Yenthe666 Thanks for your response. Then there is no need to leave this issue open if there is no added value to the script. Let's leave it as a tip. Thanks for the great script!

Yenthe666 commented 4 years ago

You're welcome, thanks for sharing your thoughts :)

chris001 commented 4 years ago

What would be nice is if, upon detecting a crash of the odoo-server service, the cron task could upload the crash logs to a bug server for root-cause analysis. Does Odoo have a bug server?

codeagencybe commented 4 years ago

@chris001

Nope, Odoo has nothing like that. But you can integrate something like Sentry if you want; it's also open source and can be self-hosted: https://sentry.io
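
For the self-hosted route, the rough shape of the install (assuming the getsentry/self-hosted repository layout, which may change over time) is:

git clone https://github.com/getsentry/self-hosted.git
cd self-hosted
./install.sh
docker compose up -d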

Yenthe666 commented 4 years ago

Yeah there are plenty of tools to do this kind of stuff honestly.

chris001 commented 4 years ago

Shouldn't @odoo make a public Sentry server and use it to automatically collect crash data when any instance of Odoo crashes, then analyze the root cause of each crash, to improve the user experience of Odoo?

Yenthe666 commented 4 years ago

I don't expect that to come any time soon honestly.

ferdymercury commented 1 year ago

With Ubuntu and systemctl, this script could also be used: https://tecadmin.net/monitor-systemd-service-using-cron-with-automatic-restart/
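
If the service is already systemd-managed, systemd can also do the restarting itself, without any cron polling. A minimal sketch, assuming the unit is named odoo.service (adjust to your actual unit name):

# Drop-in override telling systemd to restart the unit whenever it fails,
# which includes being SIGKILLed by the kernel OOM killer.
sudo mkdir -p /etc/systemd/system/odoo.service.d
sudo tee /etc/systemd/system/odoo.service.d/restart.conf > /dev/null <<'EOF'
[Service]
Restart=on-failure
RestartSec=10
EOF
sudo systemctl daemon-reload
sudo systemctl restart odoo.service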

codeagencybe commented 1 year ago

> With Ubuntu and systemctl, this script could also be used: https://tecadmin.net/monitor-systemd-service-using-cron-with-automatic-restart/

While I understand many people think it's a great idea to have Odoo restart automatically, in reality this is bad practice with regard to backups. I have seen in several projects that auto-restarts cause more harm than good. One big problem is that they can fill up your server's disk/storage like crazy.

Imagine you have a big instance, with backups of ~2GB+ as an example. The backup process starts and crashes somewhere around 1GB -> your script auto-restarts the service and the backup process starts again, runs until halfway, crashes again, over and over... Result: your drive fills up with corrupt backup files until it is 100% full, and then everything crashes. While only the backup crashes, the rest of Odoo is still working fine; with your "solution" you cause the entire system to go down due to lack of storage.

Another big problem is that your backups could all end up corrupt. Because you rely completely on a service restart fixing the problem, the only thing it may actually do is generate corrupt backups for days, weeks or months without you ever knowing about it. Because the auto-restart takes away the "seriousness" of the root cause, you may decide not to diagnose it. Until the day comes when you need to restore from a backup and then... oops, all backups are broken. Nothing to restore from...

For your own hobby projects and non-critical, non-production stuff, sure, run whatever experiments you want or see fit. But these kinds of solutions are absolutely not production-reliable and should never be done like this. The only proper solution is monitoring: trace your errors with proper tooling, and when an error occurs, fix the root cause so it can't happen again, instead of just restarting a service and ignoring it. If you still want an automated check, a less harmful variant only alerts and leaves the restart decision to a human, as in the sketch below.
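
A minimal sketch of that alert-only variant (the mail command assumes mailutils or similar is installed, and the address is a placeholder):

#!/bin/bash
# Alert-only liveness check: notify a human instead of blindly restarting.
SERVICE=/odoo/odoo-server/odoo-bin
if ! pgrep -f "$SERVICE" > /dev/null
then
    echo "Odoo appears down at $(date); check kern.log for OOM kills before restarting." \
        | mail -s "Odoo down on $(hostname)" admin@example.com
fi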