Open SamuelDudley opened 4 years ago
Note that maverick start ..
etc are just wrappers for systemctl. It prefixes 'maverick-' to each systemd manifest name, so:
maverick status nginx
actually refers to the system service 'maverick-nginx'. the actual system command that is then run is:
sudo systemctl status maverick-nginx
All the maverick manifests are in /etc/systemd/system, and are all prefixed with 'maverick-'
The same kind of thing happens with maverick start ..
and maverick stop ..
. The maverick script itself (~/software/maverick/bin/maverick) is the reference.
Given that, it might be better for python/-api to use systemd directly using pythonic modules: https://github.com/mosquito/cysystemd https://github.com/systemd/python-systemd
Also a lot of the maverick services are controlled through puppet, ie through localconf.json. So as well as starting/stopping services, they should also be set in localconf.json to persist the change. Or maybe that's not always wanted. Also note there are two different levels of service activation:
re interacting with systemd directly, that was my plan where possible. Last time I had a look at options this one was at the top of my list: https://github.com/facebookincubator/pystemd
Ooh interesting. Can we guarantee all the platforms will have dbus?
Also note that maverick status
will show all running status. It used to be that all the services were manually maintained in the bin/maverick script, but now there is a parsed file hierarchy in ~/software/maverick/bin/status.d/. Each directory in their represents a 'section', and each file within each directory represents a service within that section.
mav@dev:~/software/python/bin$ sudo python3.7
<-- must be sudo :(
from pystemd.systemd1 import Unit
import glob
glob.glob('/etc/systemd/system/maverick*')
unit = Unit(b'maverick-webdev.service')
unit.load()
unit.Service.GetProcesses()
unit.Unit.Names [b'maverick-webdev.service']
unit.Unit.ActiveState b'active'
unit.Unit.SubState b'running'
unit.Service.GetProcesses() [(b'/system.slice/maverick-webdev.service', 4917, b'node /usr/bin/yarn run serve'), (b'/system.slice/maverick-webdev.service', 4949, b'/bin/sh -c vue-cli-service serve'), (b'/system.slice/maverick-webdev.service', 4950, b'/usr/bin/node /srv/maverick/code/maverick-web/node_modules/.bin/vue-cli-service serve')]
unit.Unit.Stop('replace')
unit.Unit.ActiveState b'failed'
unit.Unit.SubState b'failed'
unit.Unit.Start('replace')
not sure if this is worth it... we have to run -api as sudo and we cant enable / disable services... just stick to subprocess...
Interesting OK, we'll hit this problem with whatever we use to control systemd. There are ways we can provide sufficient granular root access to a process.
Alternatively we can subprocess out to the maverick
command, but it should be noted that the maverick command (well OK let's be honest the entire mav user) has unrestricted root access. Which really should be changed at some point. There is a parameter that controls this: base::users::mav_sudopass
. And it's of course less efficient and pythonic.
Perhaps we could create a microservice that runs as root that just deals with systemd - that would make @cglusky incredibly happy.
I like the idea of a microservice here. I'll put something together. Preferences on the communication? Rest, graphql, something else? I'm leaning towards a simple rest interface that we can await on using async calls for -api.
If we used dbus, we could use the permissions manager in that instead, which is more flexible: https://github.com/facebookincubator/pystemd
But then that means depending on dbus, which isn't always a given in non-desktop environments. Probably not a good idea.
Looks like systemd requires dbus..
One option is to add a PolicyKit pkla:
[dev] [root@maverick-raspberrylite ~]# cat /etc/polkit-1/localauthority/50-local.d/org.freedesktop.systemd1.pkla
[Allow user mav to run systemctl commands]
Identity=unix-user:mav
Action=org.freedesktop.systemd1.manage-units
ResultInactive=yes
ResultActive=yes
ResultAny=yes
This allows us to start/stop/restart (any) services as the mav user:
[dev] [mav@maverick-raspberrylite /usr/share/polkit-1/actions]$ systemctl restart maverick-visiond
Unfortunately it doesn't allow us to specify which services can be controlled (allows all system services), and also doesn't allow us to enable/disable services, which requires root filesystem access (to change the symlinks in /etc/systemd/system/multi-user.target.wants/):
[dev] [mav@maverick-raspberrylite /usr/share/polkit-1/actions]$ systemctl disable maverick-visiond
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-unit-files ===
Authentication is required to manage system service or unit files.
Authenticating as: root
Password:
But it's an option - the enable/disable could be done separately through maverick configure
which always runs as root.
Another option is to create user systemd manifests, by putting them in ~/.config/systemd/user. This is possibly a cleaner way of doing it, although might be a bit more confusing for people used to looking for them in /etc/systemd/system.
This allows control through the mav user without any escalated privileges, however not sure if it allows processes to be run as root, which some of the mav processes do.
Using subprocess and just calling maverick [command] [service]
or sudo systemctl [command] service]
and creating specific sudo rules for maverick-* services may well just be the easier thing to do in the short term. It's a bit messier, but it's an awful lot simpler as it's already all working, doesn't require separate python modules/code, doesn't require a microservice, doesn't require redoing the privilege escalation.
refactored the async process runner here: https://github.com/goodrobots/maverick-api/blob/7ef29f4283e618320710c825c3b21513ab62bb96/maverick_api/modules/base/util/process_runner.py
For services we only care about the return value which is returned by the complete_callback()
lifecycle hook.
Still need to add parsing of the service structure but queries, mutations and subscriptions in place. https://github.com/goodrobots/maverick-api/blob/master/maverick_api/modules/api/maverick/maverick_service.py
I have added a query that will get the current status of all services and return it in a big list http://dev.maverick.one:7000/graphiql?query=%0A%0Aquery%7BMaverickServiceList%20%7B%0A%20%20services%20%7B%0A%20%20%20%20name%0A%20%20%20%20displayName%0A%20%20%20%20category%0A%20%20%20%20displayCategory%0A%20%20%20%20enabled%0A%20%20%20%20running%0A%20%20%20%20updateTime%0A%20%20%7D%0A%7D%7D
the order of the list is as per the directory structure. You will see there are a few more fields populated with info that I lifted from the directory structure
e.g. category
, displayCategory
, displayName
and updateTime
note that it is a very expensive query for the server as we are getting the status and if its enabled for all services
You can subscribe to updates using: http://dev.maverick.one:7000/graphiql?query=%0A%0Asubscription%7B%0A%20%20MaverickService%20%7B%0A%20%20%20%20name%0A%20%20%20%20displayName%0A%20%20%20%20category%0A%20%20%20%20displayCategory%0A%20%20%20%20enabled%0A%20%20%20%20running%0A%20%20%20%20updateTime%0A%20%20%7D%0A%7D
You can query individual services, or service categories individual service: http://dev.maverick.one:7000/graphiql?query=%0A%0Aquery%7BMaverickService(name%3A%22api%40dev%22)%20%7B%0A%20%20name%0A%20%20displayName%0A%20%20category%0A%20%20displayCategory%0A%20%20enabled%0A%20%20running%0A%20%20updateTime%0A%7D%7D%0A
whole category: http://dev.maverick.one:7000/graphiql?query=%0A%0Aquery%7BMaverickService(category%3A%22dev%22)%20%7B%0A%20%20name%0A%20%20displayName%0A%20%20category%0A%20%20displayCategory%0A%20%20enabled%0A%20%20running%0A%20%20updateTime%0A%7D%7D%0A note that the queries don't return anything, but you catch the output in the subscription
modification of a whole category: http://dev.maverick.one:7000/graphiql?query=mutation%20%7B%0A%20%20MaverickService(category%3A%20%22fc%22%2C%20enabled%3A%20false)%20%7B%0A%20%20%20%20name%0A%20%20%7D%0A%7D%0A
My recommendation is to do a one off MaverickServiceList
query and store the results in vuex.
On page load, create a subscription to MaverickService
and update the vuex state using the data that comes in from the subscription.
For control you can submit MaverickService
mutations with the appropriate name
or category
argument populated, along with the desired modification.
@fnoop re using pystemd
Setup:
sudo apt install libsystemd-dev
pip install pystemd
I have not been able to find a way to attach callbacks to systemd. The following finds and then polls each of the maverick-*
services approx once every few seconds (polling faster causes systemd to chew CPU cycles). If the status is changed a callback is ran.
I can start this in a background process from -api and use it to internally mutate the service state.
import time
from pystemd.systemd1 import Manager
class ServiceStatus:
__slots__ = ["_name", "_running", "_enabled", "update_time", "callback"]
def __init__(self, name, callback = None):
self._name = name
self._running = b"unknown"
self._enabled = b"unknown"
self.callback = callback
self.update_time = time.time()
@property
def name(self):
return self._name[:self._name.find(b".service")].decode()
@property
def running(self):
if self._running == b'running':
return True
elif self._running == b'unknown':
return None
else:
return False
@property
def enabled(self):
if self._enabled == b'enabled':
return True
elif self._enabled == b'unknown':
return None
else:
return False
def update_running(self, running = b'unknown'):
if self._running == running:
pass
else:
self._running = running
self.update_time = time.time()
if self.callback:
self.callback(self)
def update_enabled(self, enabled = b'unknown'):
if self._enabled == enabled:
pass
else:
self._enabled = enabled
self.update_time = time.time()
if self.callback:
self.callback(self)
def __eq__(self, other):
if not isinstance(other, ServiceStatus):
return False
return self._name == other._name and self._running == other._running and self._enabled == other._enabled
def __hash__(self):
return hash((self._name, self._running, self._enabled))
def list_units():
with Manager() as manager:
print("Version", manager.Manager.Version)
print("Architecture", manager.Manager.Architecture)
services = {}
while 1:
time.sleep(1) # backoff the systemd calls to spare CPU cycles
for unit_path, enabled_status in manager.Manager.ListUnitFilesByPatterns(["enabled", "disabled"], ["maverick-*"]):
time.sleep(0.2)
idx = unit_path.find(b"maverick-")
if not idx:
continue
sevice_name = unit_path[idx:]
if sevice_name not in services:
ss = ServiceStatus(sevice_name, callback=some_callback)
ss.update_enabled(enabled=enabled_status)
services[sevice_name] = ss
continue
services[sevice_name].update_enabled(enabled=enabled_status)
for unit in manager.Manager.ListUnitsByNames(services.keys()):
time.sleep(0.2)
services[unit[0]].update_running(running=unit[4])
def some_callback(service):
print(service.name)
print(f"running:{service.running}")
print(f"enabled:{service.enabled}")
print()
list_units()
Any thoughts?
What the above does not do is get the status of indirect
services such as api@dev
... Once again, not sure how much better / more robust this would be than having a bunch of process calls going in the background that update the service status periodically.
Do the callbacks in https://github.com/facebookincubator/pystemd/blob/master/examples/monitor_all_units_from_signal.py do us for start/stop service? Does it work for indirect services (apparently called 'template units')?
I would be less bothered about enabled/disabled, as we can set that in config and depend on a maverick run to commit. It would be great to bypass puppet if we can do it directly of course, but doesn't have to be a top priority.
Yes they do pick up the changes in start / stop of template units...
This is what I have at the moment:
import pystemd
import asyncio
import tornado.ioloop
import functools
from pprint import pprint
from pystemd.dbuslib import DBus
from pystemd.systemd1 import Unit
unit_data = [
("maverick-visiond.service", "visiond"),
("maverick-api@dev.service", "api@dev"),
]
def process(msg, error=None, userdata=None):
msg.process_reply(True)
print("*" * 80)
# print(" * New message:\n")
# pprint({k: v for k, v in msg.headers.items() if v is not None})
member = msg.headers.get("Member", None)
if member == b"PropertiesChanged":
# pprint(msg.body[1])
unit_path = msg.headers.get("Path", None)
sub_state = msg.body[1].get(b"SubState", None)
timestamp_monotonic = msg.body[1].get(b"StateChangeTimestampMonotonic", None)
if unit_path and sub_state:
if unit_path in userdata:
print(userdata[unit_path]["name"])
print(sub_state)
print(timestamp_monotonic)
elif member == b"UnitFilesChanged":
print("An enable / disable event occured")
else:
pass
print("#" * 80)
print("\n")
def read_callback(fd, event, **kwargs):
kwargs["bus"].process()
async def monitor(event, units):
loop = tornado.ioloop.IOLoop.current()
with DBus() as bus:
for unit_data in units.values():
bus.match_signal(
unit_data["unit"].destination,
unit_data["unit"].path,
b"org.freedesktop.DBus.Properties",
b"PropertiesChanged",
process, # callback for this message
units,
)
bus.match_signal(
b"org.freedesktop.systemd1",
None,
None,
b"UnitFilesChanged",
process, # callback for this message
units,
)
loop.add_handler(
bus.get_fd(), functools.partial(read_callback, bus=bus), loop.READ
)
await event.wait()
loop = tornado.ioloop.IOLoop.current()
units = {}
for unit_service_name, name in unit_data:
unit = Unit(unit_service_name)
units[unit.path] = {"unit": unit, "name": name}
event = asyncio.Event()
loop.add_callback(functools.partial(monitor, event, units))
loop.start()
we can use the existing code to populate the following with all the services we care about
unit_data = [
("maverick-visiond.service", "visiond"),
("maverick-api@dev.service", "api@dev"),
]
This will allow us to update the running state of services without making subprocess calls.
The code above also detects that something has been enabled / disabled in the system, but as I don't have a nice way of knowing what service was changed, we need to check all of the services that we are interested in using the existing subprocess call.
At least then the subprocess calls are limited to a change event (rather than a dumb loop)
Great, that's really cool! So we just need to work out how to automatically populate unit_data somehow. systemd/psystemd must have a way to query a list of all services? Otherwise we'll have to keep updating -api everytime we add/remove a service, and it won't know about any template unit instances that the user adds (for example the user can add any number of sitl/ros/mavros/-api instances)
Currently I'm parsing the directory structure that the maverick status command uses to determine what services to show ~/software/maverick/bin/status.d/ .
If we ignore that I can get all the maverick- services using the Manager interface, but it doesn't list instances of template units e.g. it gives maverick-api@.service but doesn't give us dev or fc.
So we could have maverick update the directory structure when a user adds a sitl/ros/mavros/-api instance... Or keep trying to get them via pystemd
okay, we can get all the maverick- services with the following code
def read_services_dbus(self):
services = []
with Manager() as manager:
for _unit in manager.Manager.ListUnits():
idx = _unit[0].find(b"maverick-")
if idx == -1:
continue
services.append({
"unit": Unit(_unit[0]),
"category_name": None,
"category_display_name": None,
"command": _unit[0][0:-8].decode(),
"service_name": _unit[0][9:-8].decode(),
"service_display_name": _unit[1].decode(),
"category_index": None,
"service_index": None,
"last_update": None,
"enabled": None,
"running": True if _unit[4] == b'running' else False,
})
for service in services:
application_log.debug(f"Adding service: {service}")
I'm re-writing the services module to use the dbus code. I wont have time to work on it much this week.
dbus code is in... https://github.com/goodrobots/maverick-api/blob/master/maverick_api/modules/api/maverick/maverick_service.py
note that the hot reload of the api server causes the dbus code to stop watching for events... need to add custom logic to unload this module
need to support systems without dbus
[I 2020-03-19,11:04:31 /home/nova/code/maverick-api/maverick_api/modules/__init__.py:138]
Appending schema from maverick_api.modules.api.maverick.MaverickServiceSchema
Traceback (most recent call last):
File "maverick-api.py", line 41, in <module>
api.initialize()
File "/home/nova/code/maverick-api/maverick_api/modules/base/apiserver/__init__.py", line 47, in initialize
generate_schema()
File "/home/nova/code/maverick-api/maverick_api/modules/__init__.py", line 139, in generate_schema
_module_schema[ref_name] = attribute()
File "/home/nova/code/maverick-api/maverick_api/modules/api/maverick/maverick_service.py", line 87, in __init__
self.services = self.read_services_dbus()
File "/home/nova/code/maverick-api/maverick_api/modules/api/maverick/maverick_service.py", line 170, in read_services_dbus
with Manager() as manager:
File "/home/nova/code/maverick-api/.venv/lib/python3.8/site-packages/pystemd/base.py", line 31, in __enter__
self.load()
File "/home/nova/code/maverick-api/.venv/lib/python3.8/site-packages/pystemd/base.py", line 89, in load
unit_xml = self.get_introspect_xml()
File "/home/nova/code/maverick-api/.venv/lib/python3.8/site-packages/pystemd/base.py", line 73, in get_introspect_xml
with self.bus_context() as bus:
File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/home/nova/code/maverick-api/.venv/lib/python3.8/site-packages/pystemd/base.py", line 64, in bus_context
bus.open()
File "pystemd/dbuslib.pyx", line 303, in pystemd.dbuslib.DBus.open
pystemd.dbusexc.DBusFileNotFoundError: [err -2]: Could not open a bus to DBus
What system is that? When I did a bit of research it appeared that dbus is basically a dependency of systemd, except in unusual circumstances.
windows subsystem for linux (I use it for testing)
Ah never ever considered running it on windows :) This indicates dbus is present, or at least possible: https://www.novaspirit.com/2016/08/19/ubuntu-bash-windows-10-dbus-error-fix/
@SamuelDudley - access to -api seems to have changed, eg: https://dev.maverick.one:6700/graphiql?query=mutation%20%7B%0A%20%20MaverickService(name%3A%20%22api%40fc%22%2C%20enabled%3A%20false)%20%7B%0A%20%20%20%20name%0A%20%20%20%20displayName%0A%20%20%20%20category%0A%20%20%20%20displayCategory%0A%20%20%20%20enabled%0A%20%20%20%20running%0A%20%20%20%20updateTime%0A%20%20%7D%0A%7D%0A
There is a development configuration option that needs to be set to True, otherwise graphiql returns a 403
Stopping (or starting) a single service a single time, we get 2 subscription updates:
That's because I'm treating all dbus statuses that are not 'running' as 'dead'. Transitional states between dead an running will cause a subscription update but the service will be reported as dead. If the additional subscription updates are an issue we can:
What are the transitional states? If we can't see them in the message and they're not useful to the front end then ideally we shouldn't see them, just the final change to live, or a final change to dead.
I can't recall the exact names but they are effectively reporting starting and stopping states. My concern was that unless the service is in that final state of running, for all purposes it's still dead and hence I was reporting it as such.
control services
maverick start ..
maverick stop ..
maverick status ..