goodrobots / maverick-api

API backend for maverick-web
MIT License
8 stars 5 forks source link

support maverick commands #202

Open SamuelDudley opened 4 years ago

SamuelDudley commented 4 years ago

control services

maverick start .. maverick stop .. maverick status ..

fnoop commented 4 years ago

Note that maverick start .. etc are just wrappers for systemctl. It prefixes 'maverick-' to each systemd manifest name, so: maverick status nginx actually refers to the system service 'maverick-nginx'. the actual system command that is then run is: sudo systemctl status maverick-nginx All the maverick manifests are in /etc/systemd/system, and are all prefixed with 'maverick-'

The same kind of thing happens with maverick start .. and maverick stop ... The maverick script itself (~/software/maverick/bin/maverick) is the reference.

Given that, it might be better for python/-api to use systemd directly using pythonic modules: https://github.com/mosquito/cysystemd https://github.com/systemd/python-systemd

fnoop commented 4 years ago

Also a lot of the maverick services are controlled through puppet, ie through localconf.json. So as well as starting/stopping services, they should also be set in localconf.json to persist the change. Or maybe that's not always wanted. Also note there are two different levels of service activation:

SamuelDudley commented 4 years ago

re interacting with systemd directly, that was my plan where possible. Last time I had a look at options this one was at the top of my list: https://github.com/facebookincubator/pystemd

fnoop commented 4 years ago

Ooh interesting. Can we guarantee all the platforms will have dbus?

fnoop commented 4 years ago

Also note that maverick status will show all running status. It used to be that all the services were manually maintained in the bin/maverick script, but now there is a parsed file hierarchy in ~/software/maverick/bin/status.d/. Each directory in their represents a 'section', and each file within each directory represents a service within that section.

SamuelDudley commented 4 years ago

mav@dev:~/software/python/bin$ sudo python3.7 <-- must be sudo :(

from pystemd.systemd1 import Unit
import glob
glob.glob('/etc/systemd/system/maverick*')

unit = Unit(b'maverick-webdev.service')
unit.load()

unit.Service.GetProcesses()

unit.Unit.Names [b'maverick-webdev.service']

unit.Unit.ActiveState b'active'

unit.Unit.SubState b'running'

unit.Service.GetProcesses() [(b'/system.slice/maverick-webdev.service', 4917, b'node /usr/bin/yarn run serve'), (b'/system.slice/maverick-webdev.service', 4949, b'/bin/sh -c vue-cli-service serve'), (b'/system.slice/maverick-webdev.service', 4950, b'/usr/bin/node /srv/maverick/code/maverick-web/node_modules/.bin/vue-cli-service serve')]

unit.Unit.Stop('replace')

unit.Unit.ActiveState b'failed'

unit.Unit.SubState b'failed'

unit.Unit.Start('replace')

hmmmm

not sure if this is worth it... we have to run -api as sudo and we cant enable / disable services... just stick to subprocess...

fnoop commented 4 years ago

Interesting OK, we'll hit this problem with whatever we use to control systemd. There are ways we can provide sufficient granular root access to a process. Alternatively we can subprocess out to the maverick command, but it should be noted that the maverick command (well OK let's be honest the entire mav user) has unrestricted root access. Which really should be changed at some point. There is a parameter that controls this: base::users::mav_sudopass. And it's of course less efficient and pythonic. Perhaps we could create a microservice that runs as root that just deals with systemd - that would make @cglusky incredibly happy.

SamuelDudley commented 4 years ago

I like the idea of a microservice here. I'll put something together. Preferences on the communication? Rest, graphql, something else? I'm leaning towards a simple rest interface that we can await on using async calls for -api.

fnoop commented 4 years ago

If we used dbus, we could use the permissions manager in that instead, which is more flexible: https://github.com/facebookincubator/pystemd

But then that means depending on dbus, which isn't always a given in non-desktop environments. Probably not a good idea.

fnoop commented 4 years ago

Looks like systemd requires dbus..

One option is to add a PolicyKit pkla:

[dev] [root@maverick-raspberrylite ~]# cat /etc/polkit-1/localauthority/50-local.d/org.freedesktop.systemd1.pkla
[Allow user mav to run systemctl commands]
Identity=unix-user:mav
Action=org.freedesktop.systemd1.manage-units
ResultInactive=yes
ResultActive=yes
ResultAny=yes

This allows us to start/stop/restart (any) services as the mav user: [dev] [mav@maverick-raspberrylite /usr/share/polkit-1/actions]$ systemctl restart maverick-visiond

Unfortunately it doesn't allow us to specify which services can be controlled (allows all system services), and also doesn't allow us to enable/disable services, which requires root filesystem access (to change the symlinks in /etc/systemd/system/multi-user.target.wants/):

[dev] [mav@maverick-raspberrylite /usr/share/polkit-1/actions]$ systemctl disable maverick-visiond
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-unit-files ===
Authentication is required to manage system service or unit files.
Authenticating as: root
Password:

But it's an option - the enable/disable could be done separately through maverick configure which always runs as root.

fnoop commented 4 years ago

Another option is to create user systemd manifests, by putting them in ~/.config/systemd/user. This is possibly a cleaner way of doing it, although might be a bit more confusing for people used to looking for them in /etc/systemd/system.

This allows control through the mav user without any escalated privileges, however not sure if it allows processes to be run as root, which some of the mav processes do.

fnoop commented 4 years ago

Using subprocess and just calling maverick [command] [service] or sudo systemctl [command] service] and creating specific sudo rules for maverick-* services may well just be the easier thing to do in the short term. It's a bit messier, but it's an awful lot simpler as it's already all working, doesn't require separate python modules/code, doesn't require a microservice, doesn't require redoing the privilege escalation.

SamuelDudley commented 4 years ago

refactored the async process runner here: https://github.com/goodrobots/maverick-api/blob/7ef29f4283e618320710c825c3b21513ab62bb96/maverick_api/modules/base/util/process_runner.py

For services we only care about the return value which is returned by the complete_callback() lifecycle hook.

SamuelDudley commented 4 years ago

Still need to add parsing of the service structure but queries, mutations and subscriptions in place. https://github.com/goodrobots/maverick-api/blob/master/maverick_api/modules/api/maverick/maverick_service.py

SamuelDudley commented 4 years ago

I have added a query that will get the current status of all services and return it in a big list http://dev.maverick.one:7000/graphiql?query=%0A%0Aquery%7BMaverickServiceList%20%7B%0A%20%20services%20%7B%0A%20%20%20%20name%0A%20%20%20%20displayName%0A%20%20%20%20category%0A%20%20%20%20displayCategory%0A%20%20%20%20enabled%0A%20%20%20%20running%0A%20%20%20%20updateTime%0A%20%20%7D%0A%7D%7D

the order of the list is as per the directory structure. You will see there are a few more fields populated with info that I lifted from the directory structure e.g. category, displayCategory, displayName and updateTime note that it is a very expensive query for the server as we are getting the status and if its enabled for all services

You can subscribe to updates using: http://dev.maverick.one:7000/graphiql?query=%0A%0Asubscription%7B%0A%20%20MaverickService%20%7B%0A%20%20%20%20name%0A%20%20%20%20displayName%0A%20%20%20%20category%0A%20%20%20%20displayCategory%0A%20%20%20%20enabled%0A%20%20%20%20running%0A%20%20%20%20updateTime%0A%20%20%7D%0A%7D

You can query individual services, or service categories individual service: http://dev.maverick.one:7000/graphiql?query=%0A%0Aquery%7BMaverickService(name%3A%22api%40dev%22)%20%7B%0A%20%20name%0A%20%20displayName%0A%20%20category%0A%20%20displayCategory%0A%20%20enabled%0A%20%20running%0A%20%20updateTime%0A%7D%7D%0A

whole category: http://dev.maverick.one:7000/graphiql?query=%0A%0Aquery%7BMaverickService(category%3A%22dev%22)%20%7B%0A%20%20name%0A%20%20displayName%0A%20%20category%0A%20%20displayCategory%0A%20%20enabled%0A%20%20running%0A%20%20updateTime%0A%7D%7D%0A note that the queries don't return anything, but you catch the output in the subscription

modification of a service: http://dev.maverick.one:7000/graphiql?query=mutation%20%7B%0A%20%20MaverickService(name%3A%20%22api%40fc%22%2C%20enabled%3A%20false)%20%7B%0A%20%20%20%20name%0A%20%20%20%20displayName%0A%20%20%20%20category%0A%20%20%20%20displayCategory%0A%20%20%20%20enabled%0A%20%20%20%20running%0A%20%20%20%20updateTime%0A%20%20%7D%0A%7D%0A

modification of a whole category: http://dev.maverick.one:7000/graphiql?query=mutation%20%7B%0A%20%20MaverickService(category%3A%20%22fc%22%2C%20enabled%3A%20false)%20%7B%0A%20%20%20%20name%0A%20%20%7D%0A%7D%0A

My recommendation is to do a one off MaverickServiceList query and store the results in vuex. On page load, create a subscription to MaverickService and update the vuex state using the data that comes in from the subscription. For control you can submit MaverickService mutations with the appropriate name or category argument populated, along with the desired modification.

SamuelDudley commented 4 years ago

@fnoop re using pystemd

Setup: sudo apt install libsystemd-dev pip install pystemd

I have not been able to find a way to attach callbacks to systemd. The following finds and then polls each of the maverick-* services approx once every few seconds (polling faster causes systemd to chew CPU cycles). If the status is changed a callback is ran. I can start this in a background process from -api and use it to internally mutate the service state.

import time
from pystemd.systemd1 import Manager

class ServiceStatus:
    __slots__ = ["_name", "_running", "_enabled", "update_time", "callback"]
    def __init__(self, name, callback = None):
        self._name = name
        self._running = b"unknown"
        self._enabled = b"unknown"
        self.callback = callback
        self.update_time = time.time()

    @property
    def name(self):
        return self._name[:self._name.find(b".service")].decode()

    @property
    def running(self):
        if self._running == b'running':
            return True
        elif self._running == b'unknown':
            return None
        else:
            return False

    @property
    def enabled(self):
        if self._enabled == b'enabled':
            return True
        elif self._enabled == b'unknown':
            return None
        else:
            return False

    def update_running(self, running = b'unknown'):
        if self._running == running:
            pass
        else:
            self._running = running
            self.update_time = time.time()
            if self.callback:
                self.callback(self)

    def update_enabled(self, enabled = b'unknown'):
        if self._enabled == enabled:
            pass
        else:
            self._enabled = enabled
            self.update_time = time.time()
            if self.callback:
                self.callback(self)

    def __eq__(self, other):
        if not isinstance(other, ServiceStatus):
            return False
        return self._name == other._name and self._running == other._running and self._enabled == other._enabled

    def __hash__(self):
        return hash((self._name, self._running, self._enabled))

def list_units():
    with Manager() as manager:
        print("Version", manager.Manager.Version)
        print("Architecture", manager.Manager.Architecture)

        services = {}
        while 1:
            time.sleep(1) # backoff the systemd calls to spare CPU cycles
            for unit_path, enabled_status in manager.Manager.ListUnitFilesByPatterns(["enabled", "disabled"], ["maverick-*"]):
                time.sleep(0.2)
                idx = unit_path.find(b"maverick-")
                if not idx:
                    continue

                sevice_name = unit_path[idx:]
                if sevice_name not in services:
                    ss = ServiceStatus(sevice_name, callback=some_callback)
                    ss.update_enabled(enabled=enabled_status)
                    services[sevice_name] = ss
                    continue

                services[sevice_name].update_enabled(enabled=enabled_status)

            for unit in manager.Manager.ListUnitsByNames(services.keys()):
                time.sleep(0.2)
                services[unit[0]].update_running(running=unit[4])

def some_callback(service):
    print(service.name)
    print(f"running:{service.running}")
    print(f"enabled:{service.enabled}")
    print()

list_units()

Any thoughts?

SamuelDudley commented 4 years ago

What the above does not do is get the status of indirect services such as api@dev... Once again, not sure how much better / more robust this would be than having a bunch of process calls going in the background that update the service status periodically.

fnoop commented 4 years ago

Do the callbacks in https://github.com/facebookincubator/pystemd/blob/master/examples/monitor_all_units_from_signal.py do us for start/stop service? Does it work for indirect services (apparently called 'template units')?

I would be less bothered about enabled/disabled, as we can set that in config and depend on a maverick run to commit. It would be great to bypass puppet if we can do it directly of course, but doesn't have to be a top priority.

SamuelDudley commented 4 years ago

Yes they do pick up the changes in start / stop of template units...

This is what I have at the moment:

import pystemd
import asyncio
import tornado.ioloop
import functools
from pprint import pprint

from pystemd.dbuslib import DBus
from pystemd.systemd1 import Unit

unit_data = [
    ("maverick-visiond.service", "visiond"),
    ("maverick-api@dev.service", "api@dev"),
]

def process(msg, error=None, userdata=None):
    msg.process_reply(True)
    print("*" * 80)
    # print(" * New message:\n")
    # pprint({k: v for k, v in msg.headers.items() if v is not None})
    member = msg.headers.get("Member", None)

    if member == b"PropertiesChanged":
        # pprint(msg.body[1])
        unit_path = msg.headers.get("Path", None)
        sub_state = msg.body[1].get(b"SubState", None)
        timestamp_monotonic = msg.body[1].get(b"StateChangeTimestampMonotonic", None)
        if unit_path and sub_state:
            if unit_path in userdata:
                print(userdata[unit_path]["name"])
                print(sub_state)
                print(timestamp_monotonic)
    elif member == b"UnitFilesChanged":
        print("An enable / disable event occured")
    else:
        pass
    print("#" * 80)
    print("\n")

def read_callback(fd, event, **kwargs):
    kwargs["bus"].process()

async def monitor(event, units):
    loop = tornado.ioloop.IOLoop.current()

    with DBus() as bus:
        for unit_data in units.values():
            bus.match_signal(
                unit_data["unit"].destination,
                unit_data["unit"].path,
                b"org.freedesktop.DBus.Properties",
                b"PropertiesChanged",
                process,  # callback for this message
                units,
            )
        bus.match_signal(
            b"org.freedesktop.systemd1",
            None,
            None,
            b"UnitFilesChanged",
            process,  # callback for this message
            units,
        )
        loop.add_handler(
            bus.get_fd(), functools.partial(read_callback, bus=bus), loop.READ
        )
        await event.wait()

loop = tornado.ioloop.IOLoop.current()

units = {}
for unit_service_name, name in unit_data:
    unit = Unit(unit_service_name)
    units[unit.path] = {"unit": unit, "name": name}

event = asyncio.Event()
loop.add_callback(functools.partial(monitor, event, units))
loop.start()

we can use the existing code to populate the following with all the services we care about

unit_data = [
    ("maverick-visiond.service", "visiond"),
    ("maverick-api@dev.service", "api@dev"),
]

This will allow us to update the running state of services without making subprocess calls.

The code above also detects that something has been enabled / disabled in the system, but as I don't have a nice way of knowing what service was changed, we need to check all of the services that we are interested in using the existing subprocess call.

At least then the subprocess calls are limited to a change event (rather than a dumb loop)

fnoop commented 4 years ago

Great, that's really cool! So we just need to work out how to automatically populate unit_data somehow. systemd/psystemd must have a way to query a list of all services? Otherwise we'll have to keep updating -api everytime we add/remove a service, and it won't know about any template unit instances that the user adds (for example the user can add any number of sitl/ros/mavros/-api instances)

SamuelDudley commented 4 years ago

Currently I'm parsing the directory structure that the maverick status command uses to determine what services to show ~/software/maverick/bin/status.d/ .

If we ignore that I can get all the maverick- services using the Manager interface, but it doesn't list instances of template units e.g. it gives maverick-api@.service but doesn't give us dev or fc.

So we could have maverick update the directory structure when a user adds a sitl/ros/mavros/-api instance... Or keep trying to get them via pystemd

SamuelDudley commented 4 years ago

okay, we can get all the maverick- services with the following code

    def read_services_dbus(self):
        services = []
        with Manager() as manager:
            for _unit in manager.Manager.ListUnits():
                idx = _unit[0].find(b"maverick-")
                if idx == -1: 
                    continue
                services.append({
                    "unit": Unit(_unit[0]), 
                    "category_name": None,
                    "category_display_name": None,
                    "command": _unit[0][0:-8].decode(),
                    "service_name": _unit[0][9:-8].decode(),
                    "service_display_name": _unit[1].decode(),
                    "category_index": None,
                    "service_index": None,
                    "last_update": None,
                    "enabled": None,
                    "running": True if _unit[4] == b'running' else False,
                })
        for service in services:
            application_log.debug(f"Adding service: {service}")

I'm re-writing the services module to use the dbus code. I wont have time to work on it much this week.

SamuelDudley commented 4 years ago

dbus code is in... https://github.com/goodrobots/maverick-api/blob/master/maverick_api/modules/api/maverick/maverick_service.py

note that the hot reload of the api server causes the dbus code to stop watching for events... need to add custom logic to unload this module

SamuelDudley commented 4 years ago

need to support systems without dbus

[I 2020-03-19,11:04:31 /home/nova/code/maverick-api/maverick_api/modules/__init__.py:138]
    Appending schema from maverick_api.modules.api.maverick.MaverickServiceSchema
Traceback (most recent call last):
  File "maverick-api.py", line 41, in <module>
    api.initialize()
  File "/home/nova/code/maverick-api/maverick_api/modules/base/apiserver/__init__.py", line 47, in initialize
    generate_schema()
  File "/home/nova/code/maverick-api/maverick_api/modules/__init__.py", line 139, in generate_schema
    _module_schema[ref_name] = attribute()
  File "/home/nova/code/maverick-api/maverick_api/modules/api/maverick/maverick_service.py", line 87, in __init__
    self.services = self.read_services_dbus()
  File "/home/nova/code/maverick-api/maverick_api/modules/api/maverick/maverick_service.py", line 170, in read_services_dbus
    with Manager() as manager:
  File "/home/nova/code/maverick-api/.venv/lib/python3.8/site-packages/pystemd/base.py", line 31, in __enter__
    self.load()
  File "/home/nova/code/maverick-api/.venv/lib/python3.8/site-packages/pystemd/base.py", line 89, in load
    unit_xml = self.get_introspect_xml()
  File "/home/nova/code/maverick-api/.venv/lib/python3.8/site-packages/pystemd/base.py", line 73, in get_introspect_xml
    with self.bus_context() as bus:
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/nova/code/maverick-api/.venv/lib/python3.8/site-packages/pystemd/base.py", line 64, in bus_context
    bus.open()
  File "pystemd/dbuslib.pyx", line 303, in pystemd.dbuslib.DBus.open
pystemd.dbusexc.DBusFileNotFoundError: [err -2]: Could not open a bus to DBus
fnoop commented 4 years ago

What system is that? When I did a bit of research it appeared that dbus is basically a dependency of systemd, except in unusual circumstances.

SamuelDudley commented 4 years ago

windows subsystem for linux (I use it for testing)

fnoop commented 4 years ago

Ah never ever considered running it on windows :) This indicates dbus is present, or at least possible: https://www.novaspirit.com/2016/08/19/ubuntu-bash-windows-10-dbus-error-fix/

fnoop commented 4 years ago

@SamuelDudley - access to -api seems to have changed, eg: https://dev.maverick.one:6700/graphiql?query=mutation%20%7B%0A%20%20MaverickService(name%3A%20%22api%40fc%22%2C%20enabled%3A%20false)%20%7B%0A%20%20%20%20name%0A%20%20%20%20displayName%0A%20%20%20%20category%0A%20%20%20%20displayCategory%0A%20%20%20%20enabled%0A%20%20%20%20running%0A%20%20%20%20updateTime%0A%20%20%7D%0A%7D%0A

SamuelDudley commented 4 years ago

There is a development configuration option that needs to be set to True, otherwise graphiql returns a 403

https://github.com/goodrobots/maverick-api/blob/master/maverick_api/config/maverick-api.conf.sample#L26

fnoop commented 4 years ago

Stopping (or starting) a single service a single time, we get 2 subscription updates:

Screenshot 2020-03-28 at 16 47 10

SamuelDudley commented 4 years ago

That's because I'm treating all dbus statuses that are not 'running' as 'dead'. Transitional states between dead an running will cause a subscription update but the service will be reported as dead. If the additional subscription updates are an issue we can:

fnoop commented 4 years ago

What are the transitional states? If we can't see them in the message and they're not useful to the front end then ideally we shouldn't see them, just the final change to live, or a final change to dead.

SamuelDudley commented 4 years ago

I can't recall the exact names but they are effectively reporting starting and stopping states. My concern was that unless the service is in that final state of running, for all purposes it's still dead and hence I was reporting it as such.