DocCyblade / tkl-odoo

Turnkey Linux - Odoo v8 (Published v14.2)
https://www.turnkeylinux.org/odoo
GNU General Public License v3.0

bug - Errors when doing restart or quick stop/start #43

Closed: DocCyblade closed this 8 years ago

DocCyblade commented 8 years ago

I think this is due to the change of the database password and the super admin password.

We may need to use restart in the inithook scripts (if it isn't used already) and put a 5-10 second sleep in the init.d script.

root@tkl-odoo ~# tail -n 100 /var/log/odoo/openerp-server-startup.log

Thu Oct  8 02:12:41 UTC 2015 - Starting openerp-server: 
Thu Oct  8 02:12:47 UTC 2015 - Stopping openerp-server: 
Thu Oct  8 02:12:47 UTC 2015 - Starting openerp-server: 
Traceback (most recent call last):
  File "/opt/openerp/odoo/openerp-server", line 5, in <module>
    openerp.cli.main()
  File "/opt/openerp/odoo/openerp/cli/__init__.py", line 68, in main
    o.run(args)
  File "/opt/openerp/odoo/openerp/cli/server.py", line 180, in run
    main(args)
  File "/opt/openerp/odoo/openerp/cli/server.py", line 174, in main
    rc = openerp.service.server.start(preload=preload, stop=stop)
  File "/opt/openerp/odoo/openerp/service/server.py", line 962, in start
    rc = server.run(preload, stop)
  File "/opt/openerp/odoo/openerp/service/server.py", line 637, in run
    self.start()
  File "/opt/openerp/odoo/openerp/service/server.py", line 609, in start
    self.socket.bind(self.address)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 98] Address already in use
Thu Oct  8 02:13:15 UTC 2015 - Stopping openerp-server: 
Thu Oct  8 02:13:15 UTC 2015 - Starting openerp-server: 
JedMeister commented 8 years ago

So this happens even with the initscript handling the PID? Or has this emerged since you changed that?

Perhaps you could make the initscript check the socket? Or, maybe easier: make the 'restart' function do a 'stop', then a loop that waits until the process is no longer running (i.e. the PID no longer exists) before it does a 'start'. Obviously you'd then need to use 'restart' in your scripts rather than 'start' and 'stop'...
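A minimal sketch of that "stop, wait until the PID is gone, then start" idea in POSIX sh. The helper wait_for_exit and the do_stop/do_start hooks are hypothetical placeholders, not part of the actual tkl-odoo initscript, and PIDFILE is assumed to be set by the initscript. It polls with kill -0 (which sends no signal, only reports whether the PID is alive) and gives up after a timeout instead of sleeping for a fixed period.

```shell
# Hypothetical helper: poll until process $1 has exited, giving up after
# $2 seconds (default 30). Returns 0 once the PID is gone, 1 on timeout.
# Run as root (as an initscript is), kill -0 only fails once the PID no longer exists.
wait_for_exit() {
    pid=$1
    timeout=${2:-30}
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Hypothetical restart: do_stop/do_start stand in for the initscript's own
# stop and start functions.
restart() {
    pid=""
    [ -f "$PIDFILE" ] && pid=$(cat "$PIDFILE")
    do_stop
    # only start again once the old process is really gone
    if [ -n "$pid" ] && ! wait_for_exit "$pid" 30; then
        echo "odoo (pid $pid) did not stop within 30s" >&2
        return 1
    fi
    do_start
}
```

On a fast machine this returns as soon as the old process exits, while a slow one still gets up to the full timeout.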

DocCyblade commented 8 years ago

This was before the PID handling.

Looking at the timestamps, I think the stop/start did not wait long enough. In this case the wait was sleep 5; it might need 10-15 seconds.

JedMeister commented 8 years ago

It's probably better to check whether the process is running or not. Obviously sleeping works, but IMO it's a bit hacky. I'm totally OK with using it in a conf.d build script (which only runs at build time), but not super enthused about using it in an initscript or even an inithook (which will only run once for most users).

It is potentially either unreliable or wasteful. E.g. it may be unreliable on a really low-spec machine, or on a VM that uses the host's realtime clock: it may see the 10 seconds pass but not get 10 seconds of CPU time to finish shutting down. Or it may be wasteful on a fast, high-spec machine that can actually stop the process much faster, but the user still has to wait... And 15 seconds feels like a long time when you're waiting...

FWIW this is another issue resulting from an asynchronous init system (i.e. systemd)...

Having said all that, it is a preference rather than a requirement. Also, for future reference, it may be worth investigating a proper systemd initscript, as I imagine that systemd must have some mechanism for dealing with this (I vaguely recall reading somewhere that by default on start/restart it checks whether the daemon is running or not before starting, but maybe I'm misremembering...). However, that won't resolve this for v14.x, as our LXC builds require removal of systemd (and re-installation of sysvinit).

DocCyblade commented 8 years ago

@JedMeister Good point. On closer testing, I found the issue is with the restart command in the init script. It seems that, since this is async, restart is being called many times, which somehow causes the PID file to be removed; on top of that, Odoo has not finished shutting down when it gets the command to start up, so the start fails as well.

I'm thinking of a "lock" file that gets created when the script starts, so that if the script is called again while it is running it will just exit 0; the file is removed once the script is done.
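One way to sketch that lock in POSIX sh is with mkdir, which either creates the directory or fails atomically, so two concurrent callers cannot both acquire it. The function name run_locked and any lock path are hypothetical; util-linux's flock(1) would be an alternative approach.

```shell
# Hypothetical helper: run "$@" only if the lock directory $1 can be created.
# If another invocation already holds the lock, return 0 quietly, as
# described above; otherwise run the command and return its exit status.
run_locked() {
    lockdir=$1
    shift
    # mkdir is atomic: only one concurrent caller can create the directory
    if ! mkdir "$lockdir" 2>/dev/null; then
        return 0
    fi
    "$@"
    rc=$?
    # release the lock once the work is done
    rmdir "$lockdir"
    return "$rc"
}
```

A call might look like run_locked /var/run/odoo/restart.lock restart (hypothetical path). If the script is killed while holding the lock, the directory is left behind, so a real initscript would probably also want a trap that removes it on exit.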

DocCyblade commented 8 years ago

@JedMeister I fixed the PID file issue by doing what we talked about:

mkdir /var/run/odoo
chmod 755 /var/run/odoo
chown openerp:openerp /var/run/odoo
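The three commands above could also be collapsed into a single GNU install -d call. Since /var/run is typically a tmpfs that is emptied on every boot on Debian-based systems, something like this would most likely need to run from the initscript's start function rather than only once at build time. The function name ensure_rundir is hypothetical.

```shell
# Hypothetical helper: create directory $1 owned by $2:$3 with mode 755.
# Same effect as the mkdir/chmod/chown sequence above, and safe to re-run.
ensure_rundir() {
    install -d -m 0755 -o "$2" -g "$3" "$1"
}

# e.g. in the initscript's start function:
# ensure_rundir /var/run/odoo openerp openerp
```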

Odoo does remove the PID file when it finishes, so could I use the PID file as a lock file, as long as I check whether the PID in it still exists and, if it does not, remove the file?

I did some googling and found this:
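A minimal sketch of using the PID file as a lock in that way, assuming the file contains nothing but Odoo's PID and the script runs as root: if the recorded process is alive, the service counts as running; otherwise the file is stale and gets removed. The function name check_pidfile is hypothetical.

```shell
# Hypothetical helper: return 0 if the PID stored in file $1 belongs to a
# live process. If the file is stale (dead, empty or non-numeric PID),
# remove it and return 1. Return 1 if the file does not exist.
check_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*) ;;                               # not a usable PID
        *) kill -0 "$pid" 2>/dev/null && return 0 ;;  # process still alive
    esac
    # the process is gone but the file was left behind: clean it up
    rm -f "$pidfile"
    return 1
}
```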

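# Exit quietly if the daemon binary or its config file is missing
[ -x "$DAEMON" ] || exit 0
[ -f "$CONFIGFILE" ] || exit 0

# Return 0 if $PIDFILE exists and holds the PID of a running process, else 1
checkpid() {
    [ -f "$PIDFILE" ] || return 1
    pid=$(cat "$PIDFILE")
    [ -n "$pid" ] && [ -d "/proc/$pid" ] && return 0
    return 1
}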

I am not sure what is being done in the [ ]; I think it's like running test. I'm also not sure what the || exit 0 is doing or when it's executed. I think I can use this in some way to detect the PID: if the PID file exists and the process exists, move on, else exit. I'm not sure what the returns do.

DocCyblade commented 8 years ago

OK, I think I know what's going on.

The brackets are definitely test.

The double pipe means "on failure, do this"; a double & means "on success, do this".

Do I understand this right? If so, the code above makes sense to me now: the function will return 0 if the PID file exists and the process exists as well; otherwise it returns 1.
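To see those return codes in action, here is a sketch that exercises a copy of the checkpid function quoted above (with its variables quoted and a guard against an empty PID file added) against a temporary PID file. Using $$ (the current shell's own PID) as a known-live process and 999999999 (above Linux's maximum PID) as a never-live one is just for the demonstration; it assumes a Linux system with /proc mounted.

```shell
# copy of checkpid from above, variables quoted, empty-file guard added
checkpid() {
    [ -f "$PIDFILE" ] || return 1
    pid=$(cat "$PIDFILE")
    [ -n "$pid" ] && [ -d "/proc/$pid" ] && return 0
    return 1
}

PIDFILE=$(mktemp)

echo "$$" > "$PIDFILE"        # this shell's own PID, so it is alive
if checkpid; then echo "live pid: running"; else echo "live pid: not running"; fi

echo 999999999 > "$PIDFILE"   # larger than any Linux PID, so never alive
if checkpid; then echo "dead pid: running"; else echo "dead pid: not running"; fi

rm -f "$PIDFILE"              # no PID file at all
if checkpid; then echo "no file: running"; else echo "no file: not running"; fi
```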

JedMeister commented 8 years ago

Yep, [ is essentially another name for the test command, and the ] is required syntax to make it look pretty! :stuck_out_tongue:

-x tests whether the file is executable
-d tests whether it's a directory (or a symlink to a directory)
-f (also in the script you posted above) tests whether it's a file (or a symlink to a file)

&& is AND, || is OR (so essentially your description is right).
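A small demonstration of those test operators and of && / || short-circuiting, using only standard POSIX sh; the temporary directory and file are created just for the example.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/file"

# [ ... ] is the test command: it exits 0 (true) or 1 (false)
[ -d "$tmpdir" ] && echo "-d: $tmpdir is a directory"
[ -f "$tmpdir/file" ] && echo "-f: file is a regular file"
[ -x "$tmpdir/file" ] || echo "-x: file is not executable yet"

chmod +x "$tmpdir/file"
[ -x "$tmpdir/file" ] && echo "-x: file is executable now"

# && runs the right-hand side only if the left side succeeded (exit 0);
# || runs it only if the left side failed (non-zero exit)
true  && echo "true && ... runs"
false || echo "false || ... runs"
false && echo "never printed"
true  || echo "never printed either"

rm -rf "$tmpdir"
```

This is why [ -x "$DAEMON" ] || exit 0 only exits when the test fails, i.e. when the daemon is missing or not executable.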

DocCyblade commented 8 years ago

Well said

With this info I should be able to use this to do a check.

DocCyblade commented 8 years ago

#46 should solve this once implemented.