KillingSpark / rustysd

A service manager that is able to run "traditional" systemd services, written in rust
MIT License
506 stars 15 forks source link

Redesign/Rewrite to move from the explorative code to a well designed system #35

Closed KillingSpark closed 4 years ago

KillingSpark commented 4 years ago

Currently rustysd is still pretty spaghetti, since I just started writing code to see what components are needed, and wanted to play around with this project. Now that pretty much all concepts work(tm) the existing bugs are likely caused by the bad (read: missing) design.

I will open a new branch that will contain a rewrite, and update this issue accordingly. Before I write any code in that branch, I will need to write a design concept.

Things that need to be particularly looked out for while designing:

  1. Separate 'static' info about units like names/config from the 'runtime' info like status, pid, open fds,...
  2. Locking. Mutexes are something I first started using when starting rustysd so I used them all over the place and by that enabled deadlocking to happen. This should be possible to work around by making rules in which order stuff has to be locked.
  3. Updating the set of units needs to happen 'atomically' so we need to be able to lock the whole runtimeinfo
  4. Make finding units and their inter-dependencies easier. It is currently very annoying and verbose to find all units that need each other either by name or implicit dependency. This probably means rustysd should keep track of the name-dependencies when adding/removing units.
  5. From the beginning, keep in mind that the set of units is neither static nor that units will keep existing if they existed once. This is one of the biggest issues in the current codebase.

Things that I will probably keep conceptually:

  1. The RuntimeInfo struct worked pretty well
  2. The FdStore is nice
  3. The whole config parsing stuff is ok. It needs to output different types though, since the info organization in the different sections is less than ideal
KillingSpark commented 4 years ago

This is happening in the redesign1 branch.

The document tracking the new design is here

zboldyga commented 4 years ago

@KillingSpark I am definitely itching to get into this project! That said, I work in healthcare and due to the pandemic I've been extremely busy the past month. I will jump back in as soon as I can :)

FYI I thought your design was really interesting when I was poking thru it before. Did you model the internals off of another project's approach?

KillingSpark commented 4 years ago

Oh! Yeah definitely focus your energy on that!

I was not really modelling anything, it just kinda fell into place while writing the thing. I modeled the base structures around how they are represented in the configs (which I am going to somewhat change in this rewrite though). But I am glad my code at least seems like I put some thought into the architecture beforehand :)

In the end the rewrite is not going to be as extensive as I thought, the overall structure is going to stay similar. Some of the *_table fields in the RuntimeInfo will be merged, and there will be less locks and especially less Arcs overall. This sounds like it's going to make stuff more complex, but I am convinced it will make locking easier and more manageable. We'll see how it goes.

KillingSpark commented 4 years ago

I started the rewrite. Notably still missing at commit 98a0a3cc114fc32e14f87c5807583d9474313e32:

KillingSpark commented 4 years ago

With 031eda1b7126ebc3952e1432cdf8b0850583282c many parts have been implemented.

Whats still missing is a nice API to change the unit sets. This is pretty hard to do right so I might postpone this and move it out of the scope of this redesign. The important goal of this was to reduce the amount of locking in code that does not need exclusive access to the static parts of units. Which was pretty successfull!

Also the redesign of the fork+exec has not yet happened. This is an issue I would loke to address in the master branch, since it only affects a relatively small part of the codebase.

Although the UnitStatus itself has been refactored, it still needs some more care. I would like to record the causes for erroneous states (e.g. the service is stopped because one of the prestart commands failed, and btw it was this command: "/bin/false") more detailed in the status. This needs some changes to the activation/deactivation calls to return appropriate and detailed errors. I think this should still be done in this branch before calling the redesign done.

KillingSpark commented 4 years ago

Also tests need to be rewritten.

Additionally with the rewrite some simple tests for correct Service state transition should be possible that only depend on simple utilities like /bin/false /bin/true /bin/sleep.

pwFoo commented 4 years ago

Hi @KillingSpark At the moment I paused my project where I use rustysd, but still following your progress with rustysd and will start playing with it if I have more time again! :+1:

KillingSpark commented 4 years ago

@pwFoo Its pretty cool to know that my project is actually interesting. That makes me happy (and motivates me to continue working on it) :)

KillingSpark commented 4 years ago

Also tests need to be rewritten.

Done.

KillingSpark commented 4 years ago

This redesign has finished. The tracking document has been moved to /doc/redesigns/lock_structure_redesign.md

I will open new issues with stuff I moved out of scope of this redesign as I come around to tackling them.