fedora-infra / bodhi

Bodhi is a web-system that facilitates the process of publishing updates for a Fedora-based software distribution.
https://bodhi.fedoraproject.org
GNU General Public License v2.0

Add an alternative standalone, container-based dev environment (and fix a few things exposed) #5565

Closed AdamWill closed 5 months ago

AdamWill commented 7 months ago

The current development environment requires four separate VMs (three from tinystage and one for Bodhi itself), plus some containers running inside the Bodhi VM. It's pretty heavy!

This provides an alternative development environment that is standalone and entirely based on containers: a postgres container, a waiverdb container, a greenwave container, a rabbitmq container, an ipsilon container, and a bodhi container that functions similarly to the bodhi VM from the existing environment. There is no FreeIPA or FAS backing the ipsilon instance; we just use ipsilon's testauth mode, with some configuration to allow for testing different group memberships.

The containers are Podman containers orchestrated by an ansible playbook, around which the bcd command is a light wrapper. They share a network namespace via a pod, and everything is accessed from the host as localhost, without SSL or Apache (these could be added if we really wanted them, but it seemed unnecessary).
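
For the curious, the pod mechanics boil down to something like this (a hand-rolled sketch, not the playbook's exact tasks; the container names, images, and the postgres password are illustrative, while 6543 is the real bodhi dev port):

# ports must be published at the pod level, not per-container
podman pod create --name bodhi-dev -p 6543:6543
# containers joined to the pod share one network namespace, so they
# all reach each other on localhost
podman run -d --pod bodhi-dev --name db -e POSTGRES_PASSWORD=bodhi docker.io/library/postgres:latest
podman run -d --pod bodhi-dev --name queue docker.io/library/rabbitmq:latest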

I suspect there are actually some issues in the current dev environment which I fixed in passing for this one: I don't think the way the docker-compose definition tries to launch the waiverdb and greenwave containers is valid any more, so I suspect they don't work.

I have tested this environment from a clean Fedora 39 host; it comes up in four minutes and works. Usage details are in the doc file.

AdamWill commented 7 months ago

This removes the docs for a virtualenv-based dev env, because they are wildly outdated and I'm pretty sure you just can't get a usable dev env using only a virtualenv any more; I tried for a bit and failed.

Outdated stuff about the initial implementation

Some limitations and caveats about this environment are documented in the vagrant.rst file. To go into more detail:

In order for the containers to network with each other we have to expose their ports to the host; at least, that's the only way I could get it to work. Theoretically we could put them all in a pod, but vagrant's docker provider plugin does not support pods (not surprisingly, as it's meant for docker, not podman). I tried kinda hacking it up using a pre-up trigger to create the pod and `--pod` extra args when running the containers, but it doesn't work: the docker provider seems to be hardwired to try and map port 22 on each container when running it, and when using a pod you can't map ports at the container level (it has to be done at the pod level), so it just fails. Possibly I could make this work using custom networking configuration, but I have no experience doing that with vagrant and just didn't want to spend another day on it.

I did try using an ipsilon container to provide a more sophisticated auth experience (based on the one the CI environment uses), but couldn't get that to fly either: problems with URLs redirecting in unexpected ways. Again, it's probably fixable, but I decided I'd probably spent too much time on this already. Because of the simple auth, anything authenticated doesn't work through the CLI client (which is hardwired to expect OIDC auth, it seems).

Because this is, itself, a container-based environment, you can't run *more* containers in it, so you can't run the CI environment in it (so I disabled `bci`), and `bstartdeps` etc. are irrelevant (so I disabled those too).

I tried to name the control variables for the ansible play 'functionally', but this does mean they imply possibilities I just haven't tested (like using the basic auth stuff in a VM-based environment) or I know don't currently work (like using FreeIPA auth in the container environment; we probably could make that work, but it doesn't seem super useful, as you still need the three tiny-stage VMs that way). The only configs that are really 'supported' are the ones expressed in the two Vagrantfiles.

I didn't bother setting up an httpd layer on top of the Bodhi instance and making it accessible as 'https://bodhi-dev.example.com', because it just doesn't seem very useful. It could only have a self-signed certificate, since there's no FreeIPA issuer, so you'd just have to always trust that certificate, and I couldn't really see what value that would add. Again, we could probably add that if someone wanted it; I just don't want to spend the time.

This PR includes a fix for a subtle bug that this dev env exposed, because it has to set base_address to http://localhost.localdomain:6543 in the dev env config, which is different from the value in the testing.ini that the test environment is supposed to use. It turns out the test process actually uses the values from the dev env config during early setup, then switches to the values from testing.ini when the setup step for the first test runs. Full details are in the very long commit message for the fairly small change...

AdamWill commented 7 months ago

As for the "localhost" mystery, it seems...quite deep. Now I look, there are several places in the tests that rely on this rather nonsensical behaviour: when things in ffmarkdown.py use request = pyramid.threadlocal.get_current_request() and then request.route_url, they wind up producing a URL which starts with http://localhost/ , even though base_address is not http://localhost in any config we use.

At a quick glance it seems like this is because, in the request we get from pyramid.threadlocal.get_current_request(), the host for some reason is 'localhost:80'. Still, that just raises the question: why is that the case? I've no idea.

I do note that https://docs.pylonsproject.org/projects/pyramid/en/latest/api/threadlocal.html says "This function should be used extremely sparingly, usually only in unit testing code. It's almost always usually a mistake to use get_current_request outside a testing context", but we're using it in app code. I guess the reason is we want to avoid having to get an app instance into this markdown code somehow, but still.
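
The pattern in question looks roughly like this (a simplified sketch, not the literal ffmarkdown.py code; the route name and helper are illustrative):

from pyramid.threadlocal import get_current_request

def user_link(username):
    # whatever request pyramid considers "current" on this thread
    request = get_current_request()
    # in the test suite this produces http://localhost/... no matter
    # what base_address is set to in the app config
    return request.route_url('user', name=username)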

AdamWill commented 7 months ago

Update: so...I think it might be because we're using WebTest to set up our test app, and that's just sort of how WebTest works? In real use the request will, I guess, have come from Apache or something and should have the 'real' public host and port for the instance. When you use WebTest and just ask for a relative URL like this, the host and port will just always be localhost and 80, as best I can tell...
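
A minimal sketch of what I mean, using a trivial pyramid app rather than bodhi itself:

from pyramid.config import Configurator
from pyramid.response import Response
from webtest import TestApp

def view(request):
    # host_url reflects the WSGI environ WebTest constructed for us
    return Response(request.host_url)

config = Configurator()
config.add_route('root', '/')
config.add_view(view, route_name='root')
app = TestApp(config.make_wsgi_app())

# prints http://localhost - WebTest fabricates host 'localhost' and
# port 80 for a relative request, whatever the app config says
print(app.get('/').text)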

AdamWill commented 7 months ago

Ugh, I did some more testing on this this morning and found various issues. I'm back to poking at networking! Fun. Marking as draft.

AdamWill commented 7 months ago

Oooh, okay, now it's much better! I hope, anyway. Worked out the networking, I think.

AdamWill commented 7 months ago

time vagrant up with this environment, after podman system reset to get a completely clean slate:

real    5m9.091s
user    1m39.174s
sys     0m19.905s

free -h with this env running:

               total        used        free      shared  buff/cache   available
Mem:            31Gi       2.6Gi        20Gi       272Mi       8.4Gi        27Gi

edit note: those stats were from the vagrant container version. With the current ansible container version, time to start from scratch is around 4 minutes and memory usage is around 3.1Gi.

By comparison, time vagrant up for tiny-stage:

real    29m19.333s
user    1m20.371s
sys     0m28.149s

and time vagrant up for the VM-based bodhi env after tiny-stage is up:

real    9m45.181s
user    0m28.031s
sys     0m9.601s

So getting a full VM-based env from scratch takes about 39 minutes. free -h with the VM env:

               total        used        free      shared  buff/cache   available
Mem:            31Gi       9.8Gi       8.0Gi       1.6Mi        13Gi        21Gi

So that's about a 6-7G difference in memory usage.

edit: and note that after just vagrant up in the VM env you don't have greenwave or waiverdb; you have to vagrant ssh in and run bstartdeps to get them. That takes a few more minutes (and, currently, fails - I don't think my PR broke it, either). And the postgres container doesn't work.

AdamWill commented 7 months ago

Also, since I was kinda belatedly reminded by Richard Fontana that Vagrant isn't free any more (hah), I'll look at seeing if I can come up with a kinda standalone version of this, just using podman directly or something. There is an attempt at an F/OSS fork of vagrant at https://github.com/viagrunts/viagrunts , but it doesn't seem to be as professional or as far along as the Vault or Terraform forks (I guess Vagrant is less popular).

mattiaverga commented 7 months ago

Yeah, I was aware that Vagrant is no longer FOSS, so it will have to be dropped from Fedora. Also, the fork doesn't seem to have gained much momentum. My wish is to get rid of everything Vagrant-based and just use podman and containers to set up a devel environment, but I haven't looked into it because I'm a newbie with podman and containers, so I'll need to educate myself. Moreover, since this is going to be a problem for several Fedora apps, maybe it could be a task for CPE to migrate all apps from Vagrant to podman-based dev envs... in a way that they could nicely interact with each other (tinystage, waiverdb, greenwave, bodhi, etc.).

AdamWill commented 7 months ago

I learned quite a lot about containers doing this, so I'm going to keep fiddling with it :P

I think the most obvious initial candidate is podman-compose - in fact, we're already using docker-compose inside the VM-based vagrant environment to run the greenwave and waiverdb containers (I took that config as the basis for this). Edit: having spent a few minutes browsing the compose spec today, I definitely think this is going to be the way to go.

What I'd like to do is combine the CI stuff and the dev env into one top-level command in the bodhi checkout - bcd, maybe - which can run the CI and/or an interactive dev environment, using the same container definitions as far as that's practical (and hopefully with the dev env using an Ipsilon container for auth, like the CI environment does, instead of the 'you're always ralph' auth this PR uses).

If we merge this PR we'd have three definitions of a greenwave container, for instance (one in this PR, one in the CI bits, and one inside the Vagrant Bodhi VM), which is obviously a bit silly. But I have to play with it a little to see if what I'd like to do is really possible in a usable and maintainable way. I'll probably start poking at that next week; I was going to do it today but decided to go hiking instead...:D

It would definitely be nice to have tinystage containerized, but I looked at it briefly before starting this, and if you just look up running FreeIPA in a container, it seems like a lot of work, so I decided to start here instead :) Maybe by the time I've fiddled with this a bit longer I'll feel up to taking that on, though. IIRC, developing greenwave and waiverdb is already fairly easy; they're simpler, more purely Python-y apps than Bodhi and the FAS chain, so they're not as tough to work with.

Oh, the VM-based dev env is indeed rather broken too, btw (as I suspected when I ran across various things setting this up). I'll spend a few hours seeing if I can fix that on Monday.

AdamWill commented 7 months ago

Progress report: podman-compose is definitely feeling like the way to go here. I have it kinda working already; fighting through the inevitable networking teething troubles ATM.

AdamWill commented 6 months ago

More update: okay, now I've implemented it again in ansible-driven podman. I'm kinda 50:50 on whether I like that or podman-compose more, but ansible is likely to be better supported for longer.

Outdated idea about container build approach

I'm now fiddling with optimizing the actual dev container generation along the lines of [how greenwave does it](https://github.com/release-engineering/greenwave/blob/master/Dockerfile), using a 'builder' environment to produce a more stripped-down final container.

Tomorrow I'm gonna try a setup where I have the 'builder' environment produce a kind of 'core' image that just has the bodhi runtime deps in it (but no actual bodhi files), then produce two containers from that: a 'prod' container which has a specific version of the bodhi code added in, and a 'dev' container which has no bodhi code (you are expected to provide it as a volume, as that's what makes sense for a dev env, so you can live-modify it) but has all the additional stuff we want in a dev environment (debugging tools and all the rest of it). Potentially we could publish both of those to a registry, and then the dev env ansible playbook would just have to run the dev container. Right now it builds it on the fly.
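
As a rough sketch (not a real, tested Containerfile; the package names and paths are purely illustrative), the layering idea is:

FROM registry.fedoraproject.org/fedora:latest AS builder
# build stage: turn the source tree into installable wheels
RUN dnf -y install python3-pip
COPY . /src
RUN pip wheel --wheel-dir /wheels /src/bodhi-server

FROM registry.fedoraproject.org/fedora:latest AS core
# 'core': runtime deps only, no actual bodhi code
RUN dnf -y install python3-pip
COPY --from=builder /wheels /wheels

FROM core AS prod
# 'prod': a specific bodhi version baked in
RUN pip install /wheels/*.whl

FROM core AS dev
# 'dev': no bodhi code (mount your checkout as a volume so you can
# live-modify it), plus debugging tools and other dev niceties
RUN dnf -y install python3-pytest gdb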

I also want to take another shot at adding an ipsilon container, now that I've had a bit more of a go-round with networking issues and CORS errors and stuff...

Moar update: okay, quite happy with the ansible approach now, but it's not PR-ready yet, as my working tree is in a bit of a messy state; I still need to add some sugar (some kinda wrapper around the ansible commands), and I'm taking another shot at getting an ipsilon container working.

AdamWill commented 6 months ago

Wew! I finally got to the point of pushing the ansible-based version, now called 'bcd'. It's possible the CI may fail on this, because the bcd environment shares its ipsilon container definition with the CI environment and I messed with it a bit, but I wanted to push it so I can check it out on a clean machine and verify it works from scratch (and time it).

AdamWill commented 6 months ago

some notes, while I can still think of them...

There's probably more; I'll think about it.

AdamWill commented 6 months ago

I've just noticed there's an interesting race with group memberships. The first time you log in as a user who is in the database dump, you get their "real" group memberships applied to any actions you perform. If you then log out and back in as the same user, you get the fake group memberships from ipsilon testauth applied instead. I suspect this might be a real bug in Bodhi: if your groups change in FAS, the changes might not be reflected the first time you log into Bodhi afterwards. I think perhaps I'll look into it more later.

edit: oh, but I forgot I buried the lede - I fixed authed CLI usage! whee. Turns out things go sideways with cookies if the server URL doesn't use an FQDN, so I changed it to localhost.localdomain.

AdamWill commented 6 months ago

Numbers for this version: it comes up from scratch on a clean system in just under four minutes, and uses 3.1G of RAM (the extra is because we have Ipsilon now).

AdamWill commented 6 months ago

Applied most of @gotmax23's suggestions, thanks.

gotmax23 commented 6 months ago

Cool, thanks, @AdamWill!

AdamWill commented 6 months ago

Erf, this is a bit broken again rn; I am trying to refine the ipsilon stuff. Please stand by... MUZAK PLAYS

AdamWill commented 6 months ago

OK, should be working again now (let me know if not). Oh, I just need to update the docs - I changed the mechanism for testing group memberships to one that I think is neater: you can now log in as e.g. 'guest:groups=somegroup,othergroup' to log in as 'guest' but with your group membership reported as (only) somegroup and othergroup. So you can test pretty much arbitrary group membership scenarios this way. If you just log in as 'guest' you get the default group memberships ("fedora-contributors", "packager", and a group with the same name as the user, so "guest" in that case).

AdamWill commented 6 months ago

Oh, and I sent the Ipsilon patches: https://pagure.io/ipsilon/pull-request/400

AdamWill commented 6 months ago

Tweaked bcd to have the common vagrant commands (up, halt, destroy) as aliases.

mattiaverga commented 6 months ago

I've finally had time to look at this. Maybe something has changed recently, because I cannot get bodhi to work properly in the container. After running ./bcd run, trying to open http://localhost.localdomain:6543/ fails. Logs in the bodhi container show that an error prevents bodhi from starting. Also, trying to run pytest within the bodhi container fails:

# pytest -x -v --no-cov --disable-warnings bodhi-messages/
================================================================================= test session starts =================================================================================
platform linux -- Python 3.10.0, pytest-6.2.4, py-1.11.0, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /bodhi, configfile: pyproject.toml
plugins: cov-3.0.0, mock-3.6.1
collected 0 items / 1 error                                                                                                                                                           

======================================================================================= ERRORS ========================================================================================
_________________________________________________________________ ERROR collecting bodhi-messages/tests/test_base.py __________________________________________________________________
ImportError while importing test module '/bodhi/bodhi-messages/tests/test_base.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib64/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
bodhi-messages/tests/test_base.py:24: in <module>
    from bodhi.messages.schemas import base
bodhi-messages/bodhi/messages/__init__.py:20: in <module>
    METADATA = importlib.metadata.metadata('bodhi-messages')
/usr/lib64/python3.10/importlib/metadata/__init__.py:936: in metadata
    return Distribution.from_name(distribution_name).metadata
/usr/lib64/python3.10/importlib/metadata/__init__.py:518: in from_name
    raise PackageNotFoundError(name)
E   importlib.metadata.PackageNotFoundError: No package metadata was found for bodhi-messages
=============================================================================== short test summary info ===============================================================================
ERROR bodhi-messages/tests/test_base.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================================================== 1 error in 0.04s ===================================================================================

mattiaverga commented 6 months ago

Oh, wait... why does the bodhi container have an F39 kernel but use dnf repos from F35?... 🙃

mattiaverga commented 6 months ago

I tried modifying the Bodhi Containerfile from FROM fedora:latest to FROM quay.io/fedora/fedora:latest; however, I can't manage to recreate the container... I've run ./bcd remove and ./bcd destroy multiple times, but ./bcd run doesn't recreate the container.

AdamWill commented 6 months ago

Huh. I did test this on a completely fresh system with no registry customizations, and it didn't do that. I'll test it again today. On my system I have an /etc/containers/registries.conf.d/000-shortnames.conf from the containers-common package which specifies "fedora" = "registry.fedoraproject.org/fedora", so that seems like it should be 'normal' on Fedora, but maybe something overrides it if you have docker installed, or something? I guess it would indeed be safer to explicitly specify quay.io or registry.fp.o; I will change that. edit: changed

To force a container regen you have to edit the ansible play that builds the container image to have "force: true" (or just go ahead and do it directly with podman commands). I didn't really provide an interface to this as it seemed unlikely to be needed in 'normal' use, but I guess I could (maybe a command to remove images, forcing them to be rebuilt on the next run).
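
The direct-podman route is just something like this (the image name here is a guess; check podman image ls for the real one):

# find the image the playbook built
podman image ls
# remove it; the next run has to rebuild it from scratch
podman image rm localhost/bodhi-dev
./bcd run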

mattiaverga commented 6 months ago

Yep, podman image rm ... + FROM quay.io/fedora/fedora:latest did the job. Now it seems to work.

AdamWill commented 5 months ago

Awesome, thanks!

Now that this is merged, I'd like to get rid of the Ipsilon side build, so I've poked the upstream PR, and if I don't hear back on it soon I'm just going to go ahead and backport it to the Fedora packages.

What do you think about removing the Vagrant env as a follow-up to this? It would clean things up quite a bit, but if you still think the Vagrant env may be useful I won't send a PR for it.

mattiaverga commented 5 months ago

I think we can remove the Vagrant dev env and just focus on maintaining the podman-based one, tweaking it if any issues arise.

mattiaverga commented 5 months ago

BTW, it would be nice to have the containers use local URLs instead of localhost:port, like we did with vagrant/tinystage - for example bodhi.bodhi-dev.local, ipsilon.bodhi-dev.local, etc. Not sure if that's possible with podman, though.

AdamWill commented 5 months ago

So, that only really worked with vagrant because it hacked up the host's /etc/hosts, which I honestly did not love, personally. It was a handy feature in a way, but it was also why you had to babysit vagrant up for the half hour it took to run, so you could enter your root password occasionally to let it mess with your /etc/hosts (this is what Vagrant calls the "hostmanager" feature).

We would have to reimplement...something...to do that, if we wanted the feature, which personally, as I said, I'm not a big fan of. I don't know if something like that exists in a way that's compatible with this ansible+podman approach, or if we'd have to invent it. We could of course do it in the ansible plays, but then those plays would have to run as root, making the process interactive in the way I kinda hated with vagrant, I guess.

I think the networking would also have to work differently in that case. I don't think TCP/IP networking between the host and the containers actually works at all the way it's currently implemented. I didn't go super deep into podman networking; I just came up with something simple that worked and went with it. It was a goal for me for this to work rootless, which changes the options when it comes to networking, which is also something to keep in mind.

AdamWill commented 5 months ago

Ipsilon changes got merged, so the F38 update and F39 update are pending. Once those go stable we can drop the custom Ipsilon build from this.

AdamWill commented 3 months ago

Hrmm. I've just noticed what seems like some odd behaviour with this and, I think, Podman 5.

I can access the dev server via http://localhost:6543 in Firefox, but not in Epiphany or with curl from a console. http://127.0.0.1:6543 works from all three. http://localhost.localdomain:6543 doesn't work from any of the three, which is a problem, because that's the URL you have to use for CORS to be happy (anything else causes things like posting comments or creating updates to be rejected by CORS).
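
For reference, the console reproducer (this just restates the symptoms above; all run from the host):

curl -s http://127.0.0.1:6543/ >/dev/null && echo ok               # ok
curl -s http://localhost:6543/ >/dev/null && echo ok               # fails (though Firefox is fine)
curl -s http://localhost.localdomain:6543/ >/dev/null && echo ok   # fails everywhere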

Don't know why this is, yet. Will try and confirm that it's a podman 5 thing later.