archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: Storage Service service fails on reboot #1480

Closed fitnycdigitalinitiatives closed 3 years ago

fitnycdigitalinitiatives commented 3 years ago

Expected behaviour When the system reboots, the storage service should start up.

Current behaviour The Storage Service fails at reboot in v1.13. Because the Storage Service uses MariaDB rather than SQLite in 1.13, it fails if it starts before MariaDB. This can be fixed by adding mariadb.service to the After directive in the service's unit file. That file doesn't seem to be in the Storage Service repository; otherwise I would create a pull request.

Steps to reproduce Reboot a system running v1.13. If the Storage Service starts before MariaDB, it fails.

Your environment (version of Archivematica, operating system, other relevant details) Redhat 7



sevein commented 3 years ago

Hi @fitnycdigitalinitiatives. How does the service fail for you? For me, the archivematica-storage-service service (systemd) remains active when MySQL is not available, and the underlying Gunicorn workers log connection errors. When I try to access it from the browser, I see the "Whoops!" message, but as soon as the database is up again the problem disappears.

This has been working well for us because in some setups the Storage Service is deployed on a different host. For the packages, where often all services run within the same host, I guess we could try After but then we'd still have to deal with that window where the database service is up but the application isn't quite ready yet to handle database requests from clients. There are ways to work around that but I'm not sure we really need it. I'd be curious to hear your thoughts.

fitnycdigitalinitiatives commented 3 years ago

Hello @sevein,

My memory is a little foggy on this, but the failure happened specifically when my system rebooted. The Storage Service was trying to start before mariadb.service had started, and so it failed. To recreate this, you could try disabling mariadb entirely and then restarting the Storage Service. (I'm in a CentOS/RHEL environment.)

Here's what I get when I do that:

Oct 07 09:56:22 archivematica gunicorn[2973]: return Connection(*args, **kwargs)
Oct 07 09:56:22 archivematica gunicorn[2973]: File "/usr/share/archivematica/virtualenvs/archivematica-storage-service/lib/python3.6/site-packages/MySQLdb...n __init__
Oct 07 09:56:22 archivematica gunicorn[2973]: super(Connection, self).__init__(*args, **kwargs2)
Oct 07 09:56:22 archivematica gunicorn[2973]: django.db.utils.OperationalError: (2002, "Can't connect to local MySQL server through socket '/var/lib/mysql...ock' (2)")
Oct 07 09:56:22 archivematica gunicorn[2973]: [2021-10-07 06:56:22 -0700] [2976] [INFO] Worker exiting (pid: 2976)
Oct 07 09:56:22 archivematica gunicorn[2973]: [2021-10-07 09:56:22 -0400] [2973] [INFO] Shutting down: Master
Oct 07 09:56:22 archivematica gunicorn[2973]: [2021-10-07 09:56:22 -0400] [2973] [INFO] Reason: Worker failed to boot.

It doesn't seem that the Storage Service will start if it can't connect to mariadb. If the Storage Service is already running and the mariadb connection is lost, it keeps running and you get the "Whoops!" message. But if mariadb isn't available when the service starts, it fails entirely.

The reason that I specifically used the After configuration to fix this is because I saw that it was used in the archivematica-mcp-server.service unit file:

[Unit]
Description=Archivematica MCP Server Service
After=syslog.target network.target mariadb.service

[Service]
Type=simple
User=archivematica
EnvironmentFile=/etc/sysconfig/archivematica-mcp-server
ExecStart=/usr/share/archivematica/virtualenvs/archivematica/bin/python /usr/lib/archivematica/MCPServer/archivematicaMCP.py

[Install]
WantedBy=multi-user.target

So it seemed logical for the Storage Service unit file to include this as well, because it now depends on mariadb running, just as the MCP server does.
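
For illustration, the same ordering could be added to the Storage Service without editing the packaged unit file by using a systemd drop-in. This is only a sketch: the drop-in path and file name are hypothetical, the unit name archivematica-storage-service.service is assumed from the service name mentioned earlier in this thread, and MariaDB is assumed to run on the same host.

```ini
# Hypothetical drop-in file:
#   /etc/systemd/system/archivematica-storage-service.service.d/mariadb.conf
# Assumes the unit is named archivematica-storage-service.service.
[Unit]
# Ordering only: when both units are started in the same boot transaction,
# systemd starts mariadb.service first. Entries here are added to any
# After= list already present in the packaged unit file.
After=mariadb.service
```

Running systemctl daemon-reload afterwards makes systemd pick up the drop-in.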

As for how this would work if the Storage Service were on a separate server from mariadb, I'm not really sure. I guess you would have the same issue if the MCP server were running on a separate server from mariadb as well.

sevein commented 3 years ago

Thanks, Joseph! Thank you for the level of detail you provided. What do you think of this approach? https://github.com/artefactual-labs/am-packbuild/pull/320/files. Restart is what we are using in other services (see example) and it's working well. Not as sophisticated as service dependencies, but it would work well on distributed setups.
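
The linked change is not reproduced here. As a rough sketch of what a restart-based approach can look like in systemd, assuming a hypothetical drop-in file for a unit named archivematica-storage-service.service and illustrative values:

```ini
# Hypothetical drop-in file:
#   /etc/systemd/system/archivematica-storage-service.service.d/restart.conf
# The values below are illustrative, not taken from the linked pull request.
[Service]
# Start the service again whenever it exits with a non-zero status,
# for example when Gunicorn shuts down because a worker failed to boot.
Restart=on-failure
# Pause between attempts so the database has time to come up.
RestartSec=5
```

Because this relies only on retries, it does not need to know where the database runs, which is why it also suits setups where the database is on another host.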

fitnycdigitalinitiatives commented 3 years ago

Awesome, looks great.

The only thing I might add is that I think you could still include the After for mariadb in addition to what you have done, and it would be fine for environments with separate servers. This is based entirely on this stack exchange comment, but the service will still start if mariadb isn't being started at all. That is, if I'm understanding it correctly, After only controls ordering and doesn't create a dependency. If mariadb is being started at the same time, systemd will start it first, but where mariadb isn't being started on that host, as with separate servers, the directive is simply ignored.

...but that's entirely conjecture on my part :neutral_face:

sevein commented 3 years ago

That's very smart, will use it then!