Closed fitnycdigitalinitiatives closed 3 years ago
Hi @fitnycdigitalinitiatives. How does the service fail for you? For me, the archivematica-storage-service service (systemd) remains active when MySQL is not available, and the underlying Gunicorn workers log connection errors. When I try to access it from the browser I see the "Whoops!" message, but as soon as the database is up again the problem disappears.
This has been working well for us because in some setups the Storage Service is deployed on a different host. For the packages, where often all services run within the same host, I guess we could try After, but then we'd still have to deal with that window where the database service is up but the application isn't quite ready yet to handle database requests from clients. There are ways to work around that but I'm not sure we really need it. I'd be curious to hear your thoughts.
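One way to picture the kind of workaround mentioned above is a pre-start check in a systemd drop-in. This is only a sketch, not something taken from the Archivematica packages. The drop-in path, the use of `mysqladmin ping` as the readiness probe, and the timeout value are all assumptions, and the probe only works when the database is reachable with the client's default connection settings.

```ini
# Hypothetical drop-in: /etc/systemd/system/archivematica-storage-service.service.d/wait-for-db.conf
[Service]
# Block startup until the database server answers a ping.
# mysqladmin exits 0 once the server responds, even if it refuses the login.
ExecStartPre=/bin/sh -c 'until mysqladmin ping --silent; do sleep 1; done'
# Give up if the database never becomes reachable (value is illustrative).
TimeoutStartSec=120
```

A check like this closes the gap between "the database unit has started" and "the database accepts connections", at the cost of tying the unit to a local client binary.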
Hello @sevein,
My memory is a little foggy on this, but the failure occurred specifically when my system rebooted. The storage service was trying to start before mariadb.service had started, and so it failed. To recreate this, you could try disabling mariadb entirely and then restarting the storage service. (I'm in a CentOS/RHEL environment.)
Here's what I get when I do that:
Oct 07 09:56:22 archivematica gunicorn[2973]: return Connection(*args, **kwargs)
Oct 07 09:56:22 archivematica gunicorn[2973]: File "/usr/share/archivematica/virtualenvs/archivematica-storage-service/lib/python3.6/site-packages/MySQLdb...n __init__
Oct 07 09:56:22 archivematica gunicorn[2973]: super(Connection, self).__init__(*args, **kwargs2)
**Oct 07 09:56:22 archivematica gunicorn[2973]: django.db.utils.OperationalError: (2002, "Can't connect to local MySQL server through socket '/var/lib/mysql...ock' (2)")**
Oct 07 09:56:22 archivematica gunicorn[2973]: [2021-10-07 06:56:22 -0700] [2976] [INFO] Worker exiting (pid: 2976)
Oct 07 09:56:22 archivematica gunicorn[2973]: [2021-10-07 09:56:22 -0400] [2973] [INFO] Shutting down: Master
Oct 07 09:56:22 archivematica gunicorn[2973]: [2021-10-07 09:56:22 -0400] [2973] [INFO] Reason: Worker failed to boot.
It doesn't seem the storage service will start if it can't connect to mariadb. If the storage server is already running, it will keep running if the mariadb connection is lost and then you get the 'Whoops' but if it's not there to begin with it fails entirely.
I specifically used the After configuration to fix this because I saw that it was used in the archivematica-mcp-server.service unit file:
[Unit]
Description=Archivematica MCP Server Service
After=syslog.target network.target mariadb.service
[Service]
Type=simple
User=archivematica
EnvironmentFile=/etc/sysconfig/archivematica-mcp-server
ExecStart=/usr/share/archivematica/virtualenvs/archivematica/bin/python /usr/lib/archivematica/MCPServer/archivematicaMCP.py
[Install]
WantedBy=multi-user.target
So it seemed logical for the storage service unit file to include this as well, because it now depends on mariadb running, just as the MCP server does.
As to how this would work if the storage service was on a separate server from mariadb, I'm not really sure. I guess you would have the same issue if the MCP server was running on a separate server from mariadb as well.
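The change described above could be sketched as a drop-in rather than an edit to the packaged unit file. The unit name is the one mentioned earlier in the thread. The drop-in path and file name are assumptions.

```ini
# Hypothetical drop-in: /etc/systemd/system/archivematica-storage-service.service.d/after-mariadb.conf
[Unit]
# Ordering only: if mariadb.service is part of the same boot transaction,
# start it before the storage service. Entries here are added to any
# After= values the packaged unit file already declares.
After=mariadb.service
```

After adding a file like this, `systemctl daemon-reload` makes systemd pick it up.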
Thanks, Joseph! Thank you for the level of detail you provided. What do you think of this approach? https://github.com/artefactual-labs/am-packbuild/pull/320/files. Restart is what we are using in other services (see example) and it's working well. It's not as sophisticated as service dependencies, but it would work well on distributed setups.
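For comparison, a restart policy of the kind mentioned above might look like the fragment below. The specific values are illustrative assumptions and are not copied from the linked pull request.

```ini
# Hypothetical [Service] settings for the storage service unit
[Service]
# Start the service again whenever it exits with a failure,
# e.g. when Gunicorn shuts down because the database was unreachable.
Restart=on-failure
# Wait between attempts so the database has time to come up (illustrative value).
RestartSec=5
```

Because this does not name the database unit, it behaves the same whether the database is local or on another host.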
Awesome, looks great.
The only thing I might add is that I think you could still include the After for mariadb in addition to what you have done, and it would be OK for environments with separate servers because (and this is entirely based on this stack exchange comment) the service will still start if mariadb isn't being started at all. That is, if I'm understanding it correctly, After isn't a dependency. If mariadb is being started at the same time, it will be ordered first, but in situations where mariadb isn't also being started, like on separate servers, it will just be ignored.
...but that's entirely conjecture on my part :neutral_face:
That's very smart, will use it then!
Expected behaviour When the system reboots, the storage service should start up.
Current behaviour The storage service fails at reboot in v1.13. Because the storage service uses mariadb rather than SQLite in 1.13, it will fail if it starts before mariadb. This can be fixed by adding mariadb.service to the After directive in the unit file for the service. That file doesn't seem to be in the SS repository, otherwise I would create a pull request.
Steps to reproduce Reboot the system in v1.13; if the storage service starts before mariadb, it will fail.
Your environment (version of Archivematica, operating system, other relevant details) Redhat 7