SeattleTestbed / softwareupdater

Software updater daemon
MIT License

software updater restart doesn't wait long enough for new software updater to start #29

Closed choksi81 closed 10 years ago

choksi81 commented 10 years ago

While investigating #756, it appears that when the software updater is restarted, the old software updater doesn't always wait long enough for the new one to start. The old instance waits one minute, but the new software updater doesn't signal that it is up and running until its initialization process has finished (and that initialization is largely pointless at the moment: #554). If initialization takes more than a minute, the old software updater writes a stop file for the newly started instance, keeps running, and later tries to restart the software updater again.

Here's an example of a new software updater doing the downloads in its initialization process, which end up taking more than 90 seconds in this case:

1258628893.11:PID-2935:[Downloading file vessel.restrictions because it doesn't already exist at download.test/vessel.restrictions
...
1258628986.96:PID-2935:[software_updater_start](do_rsync]) There's a stop file. Exiting.

During that same period, the old software updater logged the following (unfortunately, in this case it went to a separate log file, see #766):

1258628889.22:PID-31276:[Attempting to restart software updater.
1258628949.88:PID-31276:[restart_software_updater](restart_software_updater]) Failed to restart software updater. This instance will continue.
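The timing can be checked directly from the timestamps in the two log excerpts above. The short Python calculation below is only an illustration: the variable names are made up for this example, and the values are copied from the log lines (seconds since the epoch).

```python
# Timestamps copied from the log excerpts; the variable names are
# illustrative and do not come from the software updater's code.
new_first_download = 1258628893.11   # PID-2935: first download logged
new_saw_stop_file = 1258628986.96    # PID-2935: "There's a stop file. Exiting."
old_restart_attempt = 1258628889.22  # PID-31276: "Attempting to restart software updater."
old_gave_up = 1258628949.88          # PID-31276: "Failed to restart software updater."


def elapsed(start, end):
    """Return the elapsed time in seconds, rounded to two decimals."""
    return round(end - start, 2)


# How long the new instance spent in initialization before it noticed
# the stop file, and how long the old instance waited before giving up.
new_init_seconds = elapsed(new_first_download, new_saw_stop_file)
old_wait_seconds = elapsed(old_restart_attempt, old_gave_up)

print(new_init_seconds)  # about 93.85
print(old_wait_seconds)  # about 60.66
```

The old instance gave up after roughly one minute, while the new instance was still initializing more than 90 seconds after its first logged download.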

This series of events repeats indefinitely and, if the initialization is always slow, a successful restart will never happen.

The simple solution would seem to be to increase the restart wait time from 1 minute to something much higher such as 20 minutes (that may sound like a lot, but what if this is a really, really slow system?).
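As a rough sketch of what a configurable wait could look like, the Python snippet below polls for a startup signal until a deadline passes. It is not the software updater's actual code: `wait_for_startup`, `has_started`, and the other parameters are hypothetical names, and how the new instance signals that it has started is left to the caller. It is written for Python 3 (`time.monotonic` requires 3.3 or later).

```python
import time


def wait_for_startup(has_started, timeout_seconds, poll_interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll has_started() until it returns True or the timeout expires.

    has_started is a hypothetical callable that returns True once the new
    instance has signaled that it is running. clock and sleep can be
    replaced in tests so that no real time passes.

    Returns True if startup was signaled in time, otherwise False.
    """
    deadline = clock() + timeout_seconds
    while True:
        if has_started():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

With a slow initialization like the one logged above (about 94 seconds), a call with `timeout_seconds=60` would return False, while a call with `timeout_seconds=20 * 60` would return True as soon as the new instance signals startup.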

Additionally, addressing #554 would be a good idea, especially if the plan ends up being to just eliminate the pointless check, which would really speed things up.

choksi81 commented 10 years ago

Author: jsamuel

This is no longer an issue with a 1-minute wait now that #554 has been addressed by removing the code from init() that does the download and signature checking. However, I added a comment in r3219 to remind developers to keep this issue in mind.