e-valuation / EvaP

a university course evaluation system written in Python using Django

Package EvaP (with nix?) #2191

Open niklasmohrin opened 1 month ago

niklasmohrin commented 1 month ago

Over the last year, we have had in-person and online (#2095, #1726, #2086) discussions about the development setup. I think the problems usually boil down to "VirtualBox / Docker doesn't work on my system / behaves differently on my system / makes this feature annoying to implement because of the sandboxing". In particular, we found that Docker does not mount directories consistently across platforms (and Vagrant makes no extra effort to fix this), so we found ourselves fixing this inside the container. Attempts to use alternative setups have had mixed success (I think we currently have a rootless Docker and a libvirtd setup around?), and people sometimes want to run EvaP directly on their host (this came up, for example, when talking about #2077).

In #2095 we concluded that a simple docker-compose file for starting the main services is not enough for us, since we want to keep the experience of having a somewhat reproducible, isolated, production-like environment set up for every contributor.

I feel that we could improve the experience for everyone if we sat down and tried to "package" EvaP, rather than having it be defined implicitly by our setup scripts. This includes specifying

Then, there are things of slightly lesser importance which we implement in some way in our setup like

I am not saying we necessarily need to switch away from VirtualBox or Vagrant (although I do suggest that later in this text). For starters, I think we should recognize that provision_vagrant_vm.sh consists of multiple steps that are sometimes a bit intertwined: first, do any system tweaks we need; then reset anything left over from previous runs (.bashrc, maybe unmount a folder); then set up backup processes; then set up Python; then set up Node and other development tools; then set up apache; then do the manage.py calls to initialize the databases etc.

Each of these steps has a different importance and happens on a different level: we once said that we don't want vagrant up to fail just because nvm does not work. On the other hand, I think we want to fail loudly if something goes wrong with postgres or during the Python setup. My suggestion is that each of the steps above should be separate, possibly as a script of its own, and the top-level script runs some of them with || exit 1 and others normally. Additionally, some of these scripts could be run as the evap user as a whole, so we don't have sudo -u evap ... more than once in a row. This way, users could in theory swap out particular steps of the setup while keeping others.
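To make the proposed split more concrete, here is a minimal sketch of such a top-level runner, written in Python rather than shell purely for illustration. The `Step` type, the `required` flag, and the idea of one script per step are assumptions for this sketch, not existing EvaP code: a failing required step aborts the run (like `|| exit 1`), while a failing optional step only prints a warning.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """One provisioning step, e.g. a separate shell script (hypothetical)."""

    name: str
    command: list[str]
    required: bool = True  # a failing required step aborts the run, like `|| exit 1`
    user: Optional[str] = None  # if set, run the whole step via `sudo -u <user>`


def run_steps(steps: list[Step]) -> list[str]:
    """Run steps in order and return the names of optional steps that failed.

    Raises RuntimeError on the first failing required step; later steps are skipped.
    """
    failed_optional: list[str] = []
    for step in steps:
        cmd = step.command
        if step.user is not None:
            cmd = ["sudo", "-u", step.user, *cmd]
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            continue
        if step.required:
            raise RuntimeError(f"required step {step.name!r} failed (exit code {returncode})")
        # Optional steps (e.g. Node tooling via nvm) only produce a warning.
        print(f"warning: optional step {step.name!r} failed, continuing", file=sys.stderr)
        failed_optional.append(step.name)
    return failed_optional
```

A top-level invocation would then list, for example, a required `Step("python", ["bash", "setup_python.sh"], user="evap")` next to an optional `Step("node", ["bash", "setup_node.sh"], required=False, user="evap")`; the script names are hypothetical. Grouping each step under a single `user` is what avoids repeated `sudo -u evap` calls.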

Now, none of this solves the "docker did not mount the folder with the correct permissions" problems. However, I feel that if we had a more self-contained package, we could more easily recommend that people just roll their own alternative development environment.


One approach I have been looking into for a while to tackle my discontent with the current setup is packaging EvaP with nix. I have a proof of concept at https://github.com/niklasmohrin/EvaP/tree/poetry2nix (see https://github.com/niklasmohrin/EvaP/blob/4a76ded8e6f038d9cee812f470d80c93fbeaf442/nix-setup.md for a possible version of the README instructions). Through nix, we would declaratively supply a) a reproducible environment with Python and dependencies for running EvaP in production (without apache), b) a reproducible development environment based on that, with Node, Chromium, shell completions etc., c) a reproducible setup of the databases behind EvaP, and possibly d) a configured apache for actually running EvaP in production. None of this setup cares about your file permissions or directory mounting; that is left to be done beforehand.

As a consequence, the setup generally becomes more complicated, because there are more tools and options for what to do outside of what we offer as "the EvaP package". One particularly unfortunate downside is that the top-level tool, nix, does not support Windows directly, so more steps are needed before getting to EvaP. However, I feel that WSL2 (which I recommend in the instructions) has become somewhat standard and is probably the most popular way to set up a Linux environment on Windows anyway. I think most new contributors on Windows in the last couple of semesters already had WSL in some form. On Linux and macOS, contributors can choose between installing nix on their system or running it in a VM / container (possibly even with vagrant).

For each platform, we could add instructions for a VM / container setup we know works reasonably well. I would suggest WSL2 for Windows and podman for Linux and macOS. Mounting files should work more reliably than what we currently have, because podman, as far as I understand, runs as the current user and therefore cannot leave any files with a different UID lying around on the host. The separation of VM and EvaP is automatic, because we only specify EvaP stuff and leave VM stuff up to the user.
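To check whether a container setup has actually left files with a foreign UID in a checkout, a small helper like the following could be run on the host. It is a sketch that assumes a POSIX host (`os.getuid` does not exist on Windows) and is not part of EvaP or of any of the tools mentioned in this thread.

```python
import os
from pathlib import Path
from typing import Optional


def files_with_foreign_owner(root: Path, uid: Optional[int] = None) -> list[Path]:
    """Return root and every path below it whose owner is not `uid`, sorted.

    `uid` defaults to the current user. Entries are checked with lstat, so
    symlinks are inspected themselves and never followed.
    """
    if uid is None:
        uid = os.getuid()  # POSIX only
    root = Path(root)
    foreign = [root] if root.lstat().st_uid != uid else []
    for dirpath, dirnames, filenames in os.walk(root):  # does not follow symlinked dirs
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.lstat().st_uid != uid:
                foreign.append(path)
    return sorted(foreign)
```

Calling `files_with_foreign_owner(Path("."))` from the repository root should return an empty list when every file is owned by the current user; any entries it returns are candidates for the permission problems described at the start of this issue.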

As an escape hatch from nix, the implementation on my branch uses poetry anyway to resolve Python dependencies, so in a worst-case scenario, users could use just that to develop EvaP.

If we use nix, we of course get some more benefits, like automatically keeping contributors' environments up to date (if we update flake.lock, their environments are updated the next time they enter them) and sharing the environment with CI and production. For me, though, the most apparent advantage is still that we can bundle all configuration and setup into a single source of truth.


So, what do you all think? Should we try to move in this direction? Should we split up provision_vagrant_vm and give our runnable documentation some more structure? Or should we go even further and explore nix as our central packaging and development tool? I would love to hear what you think, whether you would be interested in the nix setup, or whether there are other options. (I have also looked at https://devenv.sh/, but it didn't really fit what I had in mind. One more thought: we could create a container image with nix and some niceties so that the following works: podman create --volume ... ghcr.io/e-valuation/evap-development-box. But both of these are follow-up questions once we have some consensus on the ground rules.)

richardebeling commented 1 month ago

I'm a bit pessimistic.

Yes, there are multiple steps, and they might have different hackiness levels, but I think they're all real complexity, so we won't be able to reduce it. We can distribute it across files, but I'm not sure if the indirection would actually help. I don't usually feel cognitively overwhelmed by the provision_vagrant script, so indirection might just make it more complicated, idk.

"Undo" operations in the current script: ideally, our setup steps would be idempotent, and most of them already are (since most Linux programs behave idempotently, e.g., apt install, service start, ...).

> if we would have a more contained package, we could more easily recommend just rolling [an?] alternative development environment.

You mean switching between VM providers? For anything more "alternative", we would need to have the alternative already prepared and defined, right?


I'm a bit afraid of using exotic tools. I'd say we're most likely to make it easy for new contributors if we use the most widespread tools (with all error messages having the corresponding solutions written down on StackOverflow). To me, it seems podman and nix are a bit niche. I'd be fine with WSL2 for the reasons you gave.

For similar reasons, I favor simple setups: The fewer installations required the better. "git+vagrant+provider" is pretty minimalistic, the setup instructions on your branch seem more complicated to me at first glance.

> sharing the environment with CI and production

> the most apparent advantage still is that we can bundle all configuration and setup into a single source of truth though.

I'm not so sure about this. A full production-grade apache2 config is huge: it deals with SSL certificates, cipher choices, rate limiting, setting up mod_wsgi, caching, logging, etc. This is sysadmin stuff. EvaP imho can/should not provide this; an administrator setting up the software will have to jump through those hoops to do it correctly. I'd argue that a CI setup should be minimal for fast startup. I wouldn't want every CI job to waste time and energy reloading redis when we're not using redis while running tests.

niklasmohrin commented 3 weeks ago

Okay, I am going to try to tackle this from another angle: for me, Docker with vagrant and the current provision script has failed. The idea with vagrant is that we can easily spin up a satisfactory environment that behaves as we expect. Now, with docker behaving differently on different platforms (and with different installation methods), vagrant is not able to do this.

However, I am a big fan of using docker for evap development, and on my computer, the setup usually works. On my machine, it is a lot faster than the virtualbox provider, both at startup and while running (I would assume mainly because there are no hardware limits configured). Still, the idea that we could use vagrant to share this setup with everyone did not work out.

For me (and, I believe, for others), using VirtualBox is inappropriate when I just want to develop userspace software, not an entire stack. One might say that some of us are in fact sometimes developing an entire stack, but I think this does not outweigh the desire to just work on the EvaP process without caring about the rest.

(Arguably, the vagrant + virtualbox setup has thus failed in the same way: it is too slow, so it does not provide a satisfactory development environment for everyone. However, for the people who haven't moved to docker and are happily using virtualbox, there is no reason to force them to abandon it either. In the end, we should still be able to use the packaged evap in a VM as before, just with a different stack.)


> Yes, there are multiple steps, and they might have different hackiness levels, but I think they're all real complexity, so we won't be able to reduce it.

I think there is some incidental complexity in there. It's small things like DEBIAN_FRONTEND=noninteractive, but also setting up bindfs, copying ssh keys, and updating pip, none of which has anything to do with evap. In between, there is crucial stuff like generating a fresh secret key!
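Crucial steps like this one are where idempotency matters most. As a minimal sketch, assuming the key is stored as a `SECRET_KEY = "..."` line in a hypothetical `localsettings.py` (not necessarily how EvaP's provisioning stores it), the following only generates a fresh key when none is set yet, so re-running provisioning never silently replaces an existing key.

```python
import re
import secrets
from pathlib import Path

# Matches a SECRET_KEY assignment at the very start of a line (commented-out lines don't match).
SECRET_KEY_PATTERN = re.compile(r"^SECRET_KEY\s*=", re.MULTILINE)


def ensure_secret_key(settings_file: Path) -> bool:
    """Append a random SECRET_KEY to settings_file unless one is already set.

    Returns True if a key was written and False if the file was left unchanged,
    so running this step twice never replaces an existing key.
    """
    existing = settings_file.read_text() if settings_file.exists() else ""
    if SECRET_KEY_PATTERN.search(existing):
        return False
    key = secrets.token_urlsafe(50)  # URL-safe characters only, so no quoting issues
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with settings_file.open("a") as f:
        f.write(f'{prefix}SECRET_KEY = "{key}"\n')
    return True
```

The return value lets a provisioning runner report whether anything changed, which also makes it easier to see which step did what when something breaks later on.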

> We can distribute it across files, but I'm not sure if the indirection would actually help. I don't usually feel cognitively overwhelmed by the provision_vagrant script, so indirection might just make it more complicated, idk.

I wouldn't say that I am overwhelmed, but I do not enjoy reading or editing this script. And because it doesn't use set -e, I always hate hunting down what broke after noticing at the end that ./manage.py run does not work. In my opinion, the script is too long and does too many things.

> "Undo"-Operations in the current script: Optimally, most of our setup steps would be idempotent. Most of them are (since most linux programs behave idempotently, e.g., apt-install, service start, ...)

Yes, agreed. I'd add that it would be even better if things like apt were also idempotent over time (that is, installed the same versions no matter when you run them), which a locked nix config would solve.
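Outside of nix, "idempotent over time" roughly means pinning exact versions and detecting when an environment drifts from them. The sketch below is a rough Python-only illustration of that idea, not how nix or poetry work internally: it compares installed distributions (via `importlib.metadata`) against a pin mapping, which is a plain dict here standing in for whatever a lock file would provide.

```python
from importlib import metadata
from typing import Callable, Optional


def version_drift(
    pins: dict[str, str],
    lookup: Callable[[str], str] = metadata.version,
) -> dict[str, Optional[str]]:
    """Compare installed distributions against pinned versions.

    Returns {name: installed_version} for every pin that does not match;
    None means the distribution is not installed at all. An empty result
    means the environment matches the pins exactly.
    """
    drift: dict[str, Optional[str]] = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            drift[name] = None
            continue
        if installed != pinned:
            drift[name] = installed
    return drift
```

The `lookup` parameter only exists so the check can be exercised without a real environment; with the default, it inspects whatever Python environment runs the function.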

> You mean switching between VM providers? For anything more "alternative", we would need to have the alternative already prepared and defined, right?

I meant that with a more contained package, developers could more easily use setups that we have not thought about yet, exactly because our package is agnostic of the environment. The goal would be that when someone comes thinking "I want to use qemu", then they can make sure that they get postgres and redis running and then just follow our normal instructions.
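For such a "bring your own environment" workflow, a preflight check along the following lines could tell a contributor whether the services are reachable before they follow the normal instructions. This is a hypothetical sketch: it only tests whether a TCP connection can be opened, and it assumes PostgreSQL and Redis listen on their default ports (5432 and 6379) on localhost.

```python
import socket

# Assumed default ports; a real check should read them from the project's settings.
SERVICES = {
    "postgres": ("localhost", 5432),
    "redis": ("localhost", 6379),
}


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, unknown host, ...
        return False


def missing_services(services: dict[str, tuple[str, int]] = SERVICES) -> list[str]:
    """Return the names of services that did not accept a TCP connection."""
    return [name for name, (host, port) in services.items() if not is_reachable(host, port)]
```

Running `missing_services()` before `./manage.py run` would then list which of the two services still needs to be started, regardless of whether they run in qemu, a container, or directly on the host.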

> I'm a bit afraid of using exotic tools. I'd say we're most likely to make it easy for new contributors if we use the most widespread tools (with all error messages having the corresponding solutions written down on StackOverflow). To me, it seems podman and nix are a bit niche. I'd be fine with WSL2 for the reasons you gave.

> For similar reasons, I favor simple setups: The fewer installations required the better. "git+vagrant+provider" is pretty minimalistic, the setup instructions on your branch seem more complicated to me at first glance.

To be honest, vagrant seems more exotic to me than podman; nowadays, containers seem to have replaced VMs for most purposes. In fact, I can count on one hand the projects I know of that have VM configurations for development. This does not mean that having such a config is bad; quite the opposite, it is accommodating to new contributors. But if the goal is to help people who are new to this stuff, we might as well start them off with tools that match today's standard practices. We don't need to settle on podman, by the way; it is just what I found fitting and would suggest as our "official" recommendation.

So what about nix? For me, clearly nix > apt+pip+npm¹: it has worked out really well over the last months, and I honestly believe that there is much to gain from this setup. I have admitted that the suggested setup is more complicated, because it is. I think that exposing the setup of the VM more directly is a good move though, because it empowers contributors to adapt the setup to their situation, and if they can run nix develop at the end, they are still good to go.

On the host machine side, I would also argue that "git + (nix on host OR nix in whatever virtualization layer you like)" is more minimalistic and less "intrusive" than forcing vagrant. While nix is admittedly more "intrusive" than vagrant on its own, the new setup allows you to choose where in your virtualization stack you place nix - and to omit virtualization altogether (unless you are on Windows).

About being afraid: at the end of the day, I am not 100% sure that this is going to work either. After all, there have been numerous times when we thought the vagrant configs were finally ready for everyone. This is why I would suggest an "experimental" phase during which we have both setups in the repository and users can try both. During this time, we can also iron out common pain points and work on simplifying the first-time setup.

¹ "npm" not in "nix" should sound very enticing to everyone who has worked on the vagrant setup :D

> I'm not so sure about this. A full production grade apache2 config is huge, it hassles with SSL certificates, cipher choices, rate limiting, setting up mod_wsgi, caching, logging, etc. This is sysadmin stuff. EvaP imho can/should not provide this stuff, an administrator setting up the software will have to jump through the hoops to do it correctly.

Okay, admittedly I know very little about apache, so that was maybe too ambitious. I was hoping that we could use the occasion to share more of the apache config in the repo (also to close my knowledge gap), but that seems out of scope then. My point that apache should not be embedded in the evap package still stands, though. I think we should separate it from the rest of the code as much as we can; ideally, after packaging, evap would be just as easy to set up with nginx as with apache. Maybe this is impractical, though; like I said, I have no idea about this admin stuff.
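The reason a Django project can, in principle, be served either by apache (via mod_wsgi) or by nginx in front of a WSGI server such as gunicorn is that the server side only needs a WSGI callable. The sketch below uses a toy callable instead of EvaP's actual Django application (which Django produces via `django.core.wsgi.get_wsgi_application()`) to show that this interface does not depend on any particular web server; all names here are illustrative.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI callable with the same signature a Django project exposes."""
    body = f"Hello from {environ.get('PATH_INFO', '/')}\n".encode()
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path: str = "/") -> tuple[str, bytes]:
    """Invoke a WSGI app directly, the way any WSGI server would, and return (status, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fill in the remaining required CGI/WSGI keys
    captured: dict[str, str] = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        return lambda data: None  # legacy write() callable, unused here

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

For local experiments, `wsgiref.simple_server.make_server("127.0.0.1", 8000, application).serve_forever()` would serve the same callable; in production, mod_wsgi or gunicorn behind nginx would load the project's WSGI callable instead, which is where the apache-specific parts could stay outside the package.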

> I'd argue that a CI setup should be minimal for fast startup time. I wouldn't want every CI job to waste time and energy reloading redis when we're not using redis while running tests.

That is easy to do, though: we can have multiple configurations (with different features turned on or off), all revolving around the central piece, which is evap.
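The idea of several configurations built around one central piece can be sketched as a base profile plus derived variants. This is written in Python purely for illustration; the same shape could be expressed in whichever tool ends up defining the environments. The feature names and the contents of each profile are hypothetical, not EvaP's actual requirements.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Profile:
    """Hypothetical feature switches for one environment built around EvaP."""

    postgres: bool = True
    redis: bool = True
    node_tooling: bool = False  # e.g. Node for frontend development
    chromium: bool = False  # e.g. for browser-based tests


BASE = Profile()  # the central piece: what running EvaP itself needs

PROFILES = {
    "production": BASE,
    "development": replace(BASE, node_tooling=True, chromium=True),
    "ci-tests": replace(BASE, redis=False),  # skip redis when tests don't use it
}


def enabled_features(profile: Profile) -> list[str]:
    """Names of the features switched on in `profile`, in declaration order."""
    return [f.name for f in fields(profile) if getattr(profile, f.name)]
```

Because every variant is derived from `BASE` with `dataclasses.replace`, a change to the central piece propagates to all configurations, while each variant only states its differences.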


@Kakadus, @janno42 I would also love to hear what you think about this :)

richardebeling commented 3 weeks ago

Hmm. I'd definitely be open to trying a nix setup. To judge whether it's a simpler setup declaration/definition, I'd have to see a comparable set of files for nix.

(Minor detail: I don't like "running two things in parallel", because I think that's what caused most of our issues with docker vs. virtualbox, but there's probably no way around it.)