Make a service abstraction layer

copumpkin commented 7 years ago

We've talked about this for years, usually in the context of people hating or wanting to replace systemd, but I think it's time to start talking about it in other less negative contexts as well.

Motivation

Nixpkgs now works pretty well on Darwin, and the nix-darwin effort by @LnL7 can manage services using launchd. Unfortunately, even though 99% of the logic could be shared with NixOS services, it can't currently be.

Furthermore, people are taking more and more infrastructure to distributed container scheduling systems like Kubernetes, ECS, or Nomad. Although in some cases people might run the host on NixOS, I think Nix can provide an equally compelling story for the container guests. @lethalman's dockerTools is a promising start to that, but just like nix-darwin, anyone using it has to reinvent the configuration for their services because NixOS service modules can't be reused.

Finally, assorted work like declarative user environments, replacing the init system (please don't turn this thread into another systemd flamewar; we have plenty of other venues for that), and service testing on virtualized platforms (I can't run any of the NixOS VM tests on EC2, Azure, Google Cloud, and so on) all could benefit heavily from such a layer.

Prior work

The major effort so far has been https://github.com/NixOS/nixpkgs/pull/5246 and it's an impressive feat but has long bitrotted, and @offlinehacker does not have the time to pick it up again.

Key goals

Should be possible to evaluate/build on non-NixOS systems, and ideally non-Linux ones (with a suitable back-end)
Shouldn't lose the nice functionality we enjoy from systemd. More generally, it shouldn't prevent us from using system-specific idiosyncrasies that don't exist in other systems. If we see some of them being used repeatedly all over the place, we eventually factor them out into the SAL.
Shouldn't get overly complicated or generalized. Let's not try to plan for every eventuality and just tackle low-hanging fruit for now.
Don't get mired in getting something perfect all at once. 1000-line+ PRs rarely get merged or even reviewed, and bitrot easily. I'm convinced we can do this incrementally and see if we like it on one or two simple services with small PRs, and then start working outwards from there.

Proposed approach

I'm not actually going to propose a technical approach here. Rather, I'd like to propose how we approach figuring out how to implement it.

Go through all of our service modules and categorize how many of them use each key in systemd.services.<name>.<key> and how.
Write the thinnest layer that covers the two or three most commonly used service keys and translates its config to the current systemd.services machinery
Write a simple "dump to text file" backend for the SAL that lists enabled services in an arbitrary piece of config, and builds on a non-Linux platform
In separate PRs, start switching over small numbers of our simplest services to use the new SAL machinery, possibly leaving some systemd.services config behind to merge with the config that SAL sets
Start work on more interesting backends like a launchd one and a docker one, etc.

I expect most services don't do much beyond setting ExecStart/script and some environment, possibly with a default wantedBy = [ "multi-user.target" ] which basically means "please run my unit" in most cases. Many services probably want a way to express that they need networking, and some might want to run as a separate user. Eventually we'll start running into services that depend on other services, and I propose not trying to tackle that at first.
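As a rough illustration of steps 2 and 3 above, here is a minimal sketch in plain Nix. The field names (`command`, `environment`, `autoStart`) and the helpers `toSystemd` and `toText` are hypothetical; nothing like them exists in nixpkgs. The sketch only shows how small such a layer could be.

```nix
# Hypothetical sketch: none of these names are existing NixOS options.
let
  # Backend-neutral descriptions using only the most common keys.
  services = {
    foo = {
      command = "/run/current-system/sw/bin/foo --port 8080";
      environment = { FOO_MODE = "production"; };
      autoStart = true;
    };
  };

  # Backend 1: translate to the shape of systemd.services.<name>.
  toSystemd = name: svc: {
    description = name;
    serviceConfig.ExecStart = svc.command;
    environment = svc.environment;
    wantedBy = if svc.autoStart then [ "multi-user.target" ] else [ ];
  };

  # Backend 2: "dump to text file", one line per service.
  # It uses no Linux-specific features, so it evaluates anywhere.
  toText = name: svc: "${name}: ${svc.command}";
in {
  systemd.services = builtins.mapAttrs toSystemd services;
  text = builtins.concatStringsSep "\n"
    (builtins.attrValues (builtins.mapAttrs toText services));
}
```

The file is self-contained and can be evaluated with `nix-instantiate --eval --strict` on any platform, which is the property the text backend is meant to demonstrate.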

cc @shlevy @offlinehacker @edolstra @globin @LnL7

P.S: this isn't an RFC because I'm not actually saying what to do, but just what I want and how I think we should go about making it happen

edolstra commented 7 years ago

I'm skeptical about having a service abstraction layer.

I suppose the goal is to move from systemd.services.* back to a generic jobs.* option. However, this only works for services defined in NixOS modules, not for upstream units. This would cause us to lose one of the biggest advantages of systemd, namely the ability to use upstream units directly. (In the Upstart days, IIRC, we had to define all Upstart services in NixOS modules, so there was no reuse of upstream job files. Not that there were many upstream job files...)
An abstraction layer would condemn us to the least common denominator of init systems, which is very small. For example, systemd and Upstart have completely different dependency systems, so having dependencies between services would be out. Startup notification, cgroups, etc. - i.e. all the features that make systemd so much better in keeping track of services - would be out. Likewise for socket activation, resource control, sensible logging, etc. etc. Some of these could be implemented on other systems, but cgroups in particular would probably be impossible to emulate on OS X. And without cgroups, you can't reliably keep track of processes so service scripts quickly become littered with killall hacks.
Even if most services are currently simple (i.e. use ExecStart etc.), moving to an abstraction layer imposes a cost in that people will be discouraged from improving a service by using systemd-specific features (since that would make the service no longer portable to other init systems). So for instance, if somebody creates a PR to add socket activation to some service S, that PR might be rejected because it would cause S to no longer work on OS X.

copumpkin commented 7 years ago

An abstraction layer would condemn us to the least common denominator of init systems, which is very small. For example, systemd and Upstart have completely different dependency systems,

@edolstra I explicitly addressed what I think we should do about idiosyncratic features in my proposal. Did you not see that? I'm very explicitly saying we should not do what you just said would be a problem.

I'm also curious what you think we should do about the problem. I know you use NixOS a lot but it feels like a pity to lose out on all the goodness if we're using other platforms.

edolstra commented 7 years ago

I did see that, but I don't think it will work in practice. For example:

Alice adds a service that uses the abstraction layer.
A while later, Bob improves the service by adding socket activation, but this breaks the service on systems that don't provide socket activation.
Bob's improvement gets reverted because people were relying on the service working on those systems.

Now, maybe, somebody at this point steps up to add socket activation to the abstraction layer (and to all supported backends). But I don't think we should count on that. (Also, it won't work for all the non-portable stuff like cgroups.)

copumpkin commented 7 years ago

Your objections make sense; but how about turning the problem on its head a bit? The main thing I want to be able to share is the module config schema and how it plugs into service execution. I don't care as much if someone needs to write two or three different config sections per service to make it work on all backends. How ridiculous would this be?

{ config, lib, pkgs, ... }:

let
  cfg = config.myservice;

  # builtins.toFile takes a file name and the file contents.
  configFile = builtins.toFile "foo.json" (builtins.toJSON { port = cfg.port; });

  startScript = ''
    ${pkgs.foo}/bin/foo -c ${configFile}
  '';
in {
  # Declared under `myservice` so that `cfg` above resolves.
  options.myservice = {
    enable = lib.mkOption { ... };
    port = lib.mkOption { ... };
    # You get the idea
  };

  config = lib.mkIf cfg.enable {
    # One section per backend; only startScript and configFile are shared.
    systemd.services.foo = {
      serviceConfig.PrivateTmp = true;
      script = startScript;
    };
    docker.services.foo = {
      volumes = { "/data" = {}; };
      script = startScript;
    };
    launchd.services.foo = {
      launchdOptions.StartOnMount = true;
      script = startScript;
    };
  };
}

Obviously details would vary, but this would allow individual service modules to factor out common stuff (config generation, possibly start commands, environment variables, etc.) across the different launch systems, but we wouldn't make any attempt to abstract over common options between them. Laziness means we don't evaluate config we don't use, so e.g., asking for a docker container out of a particular service won't force the systemd config and vice versa. Eventually we might realize that many of those systemd/docker/launchd triplets look the same and might factor them into some common functions, but we wouldn't be forced to decide that ahead of time.
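The laziness point can be illustrated in plain Nix, outside the module system. In this minimal sketch the attribute names and the `throw` message are made up; the `launchd` attribute stands in for config that cannot be evaluated on the current platform.

```nix
let
  service = {
    systemd = { script = "exec /bin/foo"; };
    # Stand-in for backend config that would fail if it were evaluated here.
    launchd = throw "launchd config is not available on this platform";
  };
in
  # Only the systemd attribute is forced; the throw above is never reached.
  service.systemd.script
```

Evaluating the expression yields the systemd script, because Nix never forces the `launchd` attribute. Whether real modules stay this well separated depends on how the options are wired together.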

I can sketch something like this out quickly in a PR if you'd like something more concrete to munch on.

7c6f434c commented 7 years ago

I will join as a user of NixOS services on a system with NixPkgs kernel but without systemd.

In many situations nixosInstance.config.systemd.services.[name].runner is a script that can be used to start the service, and the target service management solution can just run it.
In case where service can be convinced to put configs into /etc/, nixosInstance.config.environment.etc.[filename] gives you the config; the launcher script is usually simple anyway.

Given that we often need to patch the build system to make sure it uses $PATH correctly, and that we try to move a lot of configs out of /etc/, I expect that upstream systemd configs will require an interesting amount of patching. And I think that adding alternative service definitions in this case (maintained by whoever needs them) seems no less reasonable than having multiple boost versions in NixPkgs.

I do hope that we could eventually have service.configs or something where the configs are kept, because a lot of service definitions do not leave an easy way to access their configs in a simple programmatic way, and I know of no guidelines on how to provide such access in NixOS. I really hope that not being able to list all the configs a service definition generates is not desirable for most services, so such export shouldn't limit desirable improvements, and experience shows that config generation is most of the work anyway unless the service definition is completely trivial.

edolstra commented 7 years ago

@copumpkin Yeah that sounds pretty good to me!

We'll probably want to have variants of module-list.nix for various environments, in order to filter out services that don't support particular environments.

vcunat commented 7 years ago

We added meta.maintainers to services, so why not meta.platforms? EDIT: I see the platforms here will be of a different kind; anyway that's all an unimportant nitpick from me ATM.

edolstra commented 7 years ago

@7c6f434c I don't think upstream systemd units typically need a lot of patching. Also, they're extensible without patching. E.g. you can set systemd.services.foo.path while still using the upstream foo unit. (Also Exec* directives require absolute paths, so systemd units tend to rely less on $PATH.)
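A sketch of that pattern as a NixOS configuration fragment follows. `pkgs.foo` and the unit name `foo` are placeholders for a package that ships its own `foo.service`; the `path` and `wantedBy` values are illustrative assumptions.

```nix
{ pkgs, ... }:
{
  # Pick up the unit files shipped by the package itself.
  # pkgs.foo is a placeholder, not a real package.
  systemd.packages = [ pkgs.foo ];

  # Extend the upstream unit without patching it; NixOS layers these
  # settings on top of the packaged foo.service.
  systemd.services.foo = {
    path = [ pkgs.coreutils ];
    wantedBy = [ "multi-user.target" ];
  };
}
```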

Regarding "maintained by whoever needs them": it tends not to work that way in practice. Somebody will do a PR to add (say) OS X support to the PostgreSQL module, and then it becomes everybody else's responsibility to keep it working. (I.e. if somebody changes something in the module that breaks it on OS X, then that person will get blamed, even though that person might have no way to test on OS X.)

@vcunat Maybe, though that requires parsing/evaluating potentially a lot of files that are not usable on a particular platform.

copumpkin commented 7 years ago

Yeah, perhaps a module-level meta would make sense (also for module-level documentation, maintainership, etc.). Then module-list could just enumerate and filter. That's complicated slightly by modules usually being functions and needing to be knotted before being able to query the meta, but it's not awful.
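One way to picture "enumerate and filter" is the sketch below. It is hypothetical throughout: `meta.backends`, `modulesFor`, and the file paths are invented for illustration. It also sidesteps the knotting problem by keeping the metadata next to each path instead of inside the module function, which is a different design from the one described above.

```nix
# Hypothetical sketch: neither meta.backends nor modulesFor exists in nixpkgs.
let
  # Stand-in for module-list.nix, with metadata readable without
  # evaluating the module functions themselves.
  moduleList = [
    { path = ./services/foo.nix; meta.backends = [ "systemd" "launchd" "docker" ]; }
    { path = ./services/bar.nix; meta.backends = [ "systemd" ]; }
  ];

  # Keep only the modules that declare support for the given backend.
  modulesFor = backend:
    map (m: m.path)
      (builtins.filter (m: builtins.elem backend m.meta.backends) moduleList);
in {
  nixos = modulesFor "systemd";
  darwin = modulesFor "launchd";
}
```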

7c6f434c commented 7 years ago

@edolstra Re: «maintained by whoever needs them»: it doesn't work that way in general, but specifically in the case where the split is caused by using an upstream unit, there is a better chance, because from the mainline NixOS side there is no reason to touch the forked version anyway… Well, if the config is shared, the systemd-specific part will be inside ifs.

But the only thing I actively want is config access.

LnL7 commented 7 years ago

@copumpkin I think that approach makes the most sense: it makes the differences very explicit, and Nix has primitives to define common options in a nice way. For example, my modules for launchd also support some options like script and path, so for simple cases the config could be reused.

{ config, lib, pkgs, ... }:
with lib;
let
  cfg = config.services.foo;
  service =
    { script = "${pkgs.foo}/bin/food";
      path = [ pkgs.hello ];
    };
in {
  options = {
    services.foo.enable = mkOption {};
  };

  config = mkIf cfg.enable {
    systemd.services.foo = service;
    launchd.services.foo = service;
    docker.services.foo.container = "foo";
  };
}

Something else that I've been wondering about is how this would work for Darwin and regular Linux systems without accidentally bringing in activation scripts from the NixOS core modules, etc. What I've done for nix-darwin is to copy or reimplement common options instead of importing them.

copumpkin commented 7 years ago

@LnL7 I think laziness mostly avoids bringing in any config we don't ask for. I'll sketch out a very simple proof of concept a bit later in a PR and see if I run into any issues.

In a sentence, ultimately all config reduces to one or two "entry points", like system.build; we'd probably just add separate entry points for Darwin/docker/etc.

LnL7 commented 7 years ago

Yeah, that last comment is probably a little bit out of the scope of this issue.

offlinehacker commented 7 years ago

I think service abstraction was done very well by Kubernetes, and as such Kubernetes could be used as a reference. Currently I'm very much satisfied with Kubernetes, but I'm still missing a cluster- and container-aware Linux distribution. Anything in between will just be abstracting with too little information to be usable on anything but single-node systems, and then we are back to systemd/launchd/docker-compose.

Here is just a short list of what abstraction should handle:

service requirements (packages, mounts)
hardware requirements (CPU, CPU resources, physical disks, USB, PCI, ...)
security requirements (capabilities, AppArmor, seccomp, exposed paths, ...)
network requirements (ports, sockets, network policy, ...)
storage requirements (storage size, IOPS, read-only/read-write)
secrets
scheduling requirements
cross-service dependencies (which services a service depends on, which exposed resources) ...

Kubernetes supports most of these, and it's very easy to ignore unsupported requirements on simpler service managers like systemd. I'm not saying all of this should be added at the beginning, but the reference design should be good enough to support all of it. I'm not good at writing RFCs, and I haven't figured out how to solve everything just yet; I just want to share what a good implementation would look like, one that is usable on the highly profiled systems we are deploying today. With simpler abstractions it's hard to support more complex systems. I'd rather keep the current (systemd) implementation than adopt a half-done design that we will have to live with.

As an alternative approach I like @LnL7's idea the most, as it makes it easy to support multiple abstractions at the same time. I can see several issues related to different options being used by different abstractions, but I think that's something we can live with until a general abstraction is created.

shlevy commented 7 years ago

So, since I was tagged, my 2 cents are that the NixOS module system is annoying and uncomposable, and the solution here is probably just plain functions, choosing different ones with the same high-level interface for different backends. In the case of "just run this program with these args", there of course can be a common helper function interface. In more complex cases, we only have as many backends as can support the interface, or possibly have graceful degradation for cases where we can't.
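A minimal sketch of the plain-functions idea, for the "just run this program with these args" case, could look like this. The `backends` set, `runProgram`, and the output shapes are assumptions made for illustration and are not an existing nixpkgs interface.

```nix
# Hypothetical sketch: one high-level interface, one plain function per backend.
let
  backends = {
    # systemd wants a single command line.
    systemd = { name, program, args ? [ ] }: {
      ${name} = {
        serviceConfig.ExecStart =
          builtins.concatStringsSep " " ([ program ] ++ args);
      };
    };

    # launchd wants the argument vector as a list.
    launchd = { name, program, args ? [ ] }: {
      ${name} = {
        serviceConfig.ProgramArguments = [ program ] ++ args;
      };
    };
  };

  # Choosing a backend is just choosing a function.
  runProgram = backend: backends.${backend};

  foo = { name = "foo"; program = "/bin/foo"; args = [ "--verbose" ]; };
in {
  systemd = runProgram "systemd" foo;
  launchd = runProgram "launchd" foo;
}
```

A backend that cannot support a richer interface would simply not provide the corresponding function, which matches the "only as many backends as can support the interface" remark.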

copumpkin commented 7 years ago

Here's a sketch of the idea I proposed: https://github.com/NixOS/nixpkgs/pull/26075

@shlevy I partially agree but the merging behavior is nice (feels a bit like AOP in some cases) and I don't want to reinvent everything just to get some basic reuse 😄

7c6f434c commented 7 years ago

@shlevy if there is config export, we can just use separate NixOS instances for every service, then it behaves as a pure function…

mmahut commented 5 years ago

Are there any updates on this issue, please?

wmertens commented 5 years ago

This issue predates the RFC process. I propose that a champion (not me, although I think it would be great to have for Darwin, Docker, WSL, etc.) pour the key points of this issue into an RFC so we can get traction on it that way.

stale[bot] commented 4 years ago

Thank you for your contributions. This has been automatically marked as stale because it has had no activity for 180 days. If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity. Here are suggestions that might help resolve this more quickly:

1. Search for maintainers and people that previously touched the related code and @ mention them in a comment.
2. Ask on the NixOS Discourse.
3. Ask on the #nixos channel on irc.freenode.net.

rien commented 4 years ago

This is still important to me. I would love to use a simpler init system with NixOS.

wmertens commented 4 years ago

I like the idea of taking the most capable system (kubernetes apparently) and adding wrappers or ignoring features in other systems.

So keep the systemd config layer, but add a more general one that mostly translates directly to systemd configurations.

Then, if you want to have an init.d type system, you would basically run all the scripts once, and output a bunch of warnings about unsupported features. The system would not restart crashed services or handle log rotation until someone comes up with a wrapper for those. On the plus side, the closure would be tiny and it would work in containers.

In fact, those general settings could get their defaults from the current systemd settings, so most modules would work as-is.
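A rough sketch of such a run-once backend, reading existing systemd-shaped settings and warning about what it cannot honor, might look like the following. The `unsupported` list, the `toInitScript` helper, and the output shape are all hypothetical.

```nix
# Hypothetical sketch: derive a run-once service from systemd-shaped config.
let
  # A fragment shaped like existing systemd.services config.
  systemdServices = {
    foo = {
      script = "exec /bin/foo";
      serviceConfig = { Restart = "always"; PrivateTmp = true; };
    };
  };

  # Settings this simple backend cannot honor (illustrative, not exhaustive).
  unsupported = [ "Restart" "PrivateTmp" ];

  toInitScript = name: svc: {
    start = svc.script;
    # One warning per unsupported setting that the service actually uses.
    warnings = map (k: "${name}: ${k} is ignored by this backend")
      (builtins.filter (k: svc.serviceConfig ? ${k}) unsupported);
  };
in
  builtins.mapAttrs toInitScript systemdServices
```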

lpolzer-enercity commented 4 years ago

+1 here, this is one thing that still keeps me from considering adopting NixOS.

wmertens commented 4 years ago

I think the best way forward would be to create an RFC.

Is there anyone that wants to champion this?

abathur commented 4 years ago

@wmertens I don't know enough about different init systems to evaluate how well it grapples with some of the challenges Eelco mentioned, but IIRC a Nix newsletter early this year featured a blog post (https://sandervanderburg.blogspot.com/2020/02/a-declarative-process-manager-agnostic.html) by @svanderburg about the project https://github.com/svanderburg/nix-processmgmt, which seems to have made substantive progress on the core concepts, here?

wmertens commented 4 years ago

@abathur I hadn't seen that, it looks impressive! Seems like @svanderburg found a nice middle ground between service-specific and service-agnostic.

I wonder how far along this is to be able to boot e.g. nixos-minimal with a different service manager.

wmertens commented 4 years ago

Don't forget to watch Sander present nix-processmgmt https://cfp.nixcon.org/nixcon2020/talk/TW79FU/ :)

martin-braun commented 1 year ago

I'd also enjoy using NixOS without systemd; I'm rather interested in building a minimal system that uses OpenRC instead.

ThisNekoGuy commented 1 year ago

I've recently been curious about NixOS but, similarly, I'd rather have s6 and/or OpenRC as well :/

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/is-nixbsd-a-posibility/29612/16

abathur commented 1 year ago

:)

https://github.com/NixOS/rfcs/pull/163
