backdrop / backdrop-issues

Issue tracker for Backdrop core.

PR sandboxes are broken (base image keeps getting into failed status a few hours after having been rebuilt) #6139

Closed · klonos closed this issue 1 year ago

klonos commented 1 year ago

I'm noticing that some (all?) PR sandboxes are failing to be created. The error shown is this:

2023-06-08T12:49:11.935Z - Rebuilding preview: 6481cb0ba18a5084bb6cc78b (pr4448)
2023-06-08T12:49:12.274Z - Cannot rebuild preview (Tugboat Error 1046): Base preview is failed

When I check the base preview, it seems to be working properly (the home page loads as expected etc.), however, there are a couple of warnings in the build log:

...
2023-06-08T00:03:59.141Z - 60815f2029bd31f02c3bc524# /bin/sh -c cd $TUGBOAT_ROOT/backdrop && ../.tugboat/auto-login.sh

Deprecated: explode(): Passing null to parameter #2 ($string) of type string is deprecated in /var/lib/tugboat/backdrop/core/includes/bootstrap.inc on line 981

Warning: session_name(): Session name cannot be changed after headers have already been sent in /var/lib/tugboat/backdrop/core/includes/bootstrap.inc on line 994
Login URL: /user/reset/1/1686182639/vTb1SZukVZDAkC9ZrYjErjj7TSo22z3jMe_FE83HkdE/login
2023-06-08T00:03:59.365Z - Committing 60815f1e7ceac0f234c0071e (latest)
2023-06-08T00:04:07.655Z - 60815f1e7ceac0f234c0071e (latest) is ready
klonos commented 1 year ago

@quicksketch should I try rebuilding the image, as suggested by the build log? As it is currently, the PR sandboxes aren't being provisioned anyway.

klonos commented 1 year ago

It seems that the failed base image build caused a ripple effect, and we reached our quota. I have now rebuilt the base image, which finished w/o errors and is green (screenshot).

I have queued the PR sandboxes to be rebuilt as well, which seems to be reducing their size from 1.6-17GB to less than half that (screenshot shows 3 sandboxes with the newly-built image at the top, followed by another 3 still in the queue to be rebuilt).

Once all PR sandboxes have been rebuilt, I expect us to be at a healthy usage, but I'll check to make sure (it might take some time).

klonos commented 1 year ago

...all recent PR sandboxes have been rebuilt, and disk usage is looking good now (screenshot).

klonos commented 1 year ago

I need to take a break, but I'm seeing this in one of the test PRs:

2023-06-09T09:09:30.878Z - Building preview: 6482ec4a4c72b346f23be010 (pr4448)
2023-06-09T09:09:41.736Z - 6482ec4fa18a5084bb977901# git -C /var/lib/tugboat stash
fatal: detected dubious ownership in repository at '/var/lib/tugboat'
To add an exception for this directory, call:

        git config --global --add safe.directory /var/lib/tugboat
2023-06-09T09:09:42.292Z - Command Failed (Tugboat Error 1064): Exit code: 128; Command: stash
klonos commented 1 year ago

After @indigoxela mentioned in the Zulip chat that the sandbox for her PR is not working, I checked again, and the base image seems to have gotten itself into a broken state again:

(screenshot: base preview showing failed status)

I won't rebuild it this time, in order to give @quicksketch and the Tugboat engineers a chance to have a look ...and also so that we don't keep spamming people with GitHub notifications each time we rebuild the sandboxes (which we need to do each time we rebuild the image, otherwise we'll run out of disk space, and then the PR sandboxes will be failing for that reason instead).

indigoxela commented 1 year ago

@klonos many thanks for opening this issue. :pray: I gave it another try today to verify whether the demo sandboxes work. They still don't.

indigoxela commented 1 year ago

fatal: detected dubious ownership in repository at '/var/lib/tugboat'

Ha! That's the problem! It seems they now use a newer version of git, which refuses to work on the repository because of its file ownership. Probably easy to fix, but that has to happen on their side.

klonos commented 1 year ago

Hey @quicksketch can you please ping people in the Tugboat Support team and see what needs to be done about this? ^^

Thanks.

indigoxela commented 1 year ago

@klonos it might be that this also affects their paying customers, and we need to be a little patient - which is absolutely OK.

klonos commented 1 year ago

In the meantime, I'm wondering if we could work around this problem by (even temporarily) using the 8.1-apache-buster image instead of the 8.1-apache-bullseye that we have decided to go with in #6133 🤔 ...the assumption being that that image won't have the newer version of git where that issue was introduced.
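For reference, if we try that, the change should be a one-liner in .tugboat/config.yml (a rough sketch only; the service name and the exact image tag we currently pin are assumptions on my part):

```yaml
services:
  php:
    # Temporarily pin the buster variant instead of bullseye (see #6133),
    # on the assumption that it ships an older git without the
    # safe.directory ownership check.
    image: tugboatqa/php:8.1-apache-buster
```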

klonos commented 1 year ago

...or perhaps even merge #5927, where we should be switching to the 8.2-apache image (the one that doesn't specify the OS version, and with which we didn't have any issues).

klonos commented 1 year ago

...or perhaps even merge #5927, where we should be switching to the 8.2-apache image

Nope, that didn't work. I rebased my fork of backdrop/backdrop, merged the PR from #5927 (which uses 8.2-apache-bullseye), and tried to create a Tugboat sandbox off of my own repo. The creation of the base image failed again (side note: the fact that the mariadb instance is showing as using 25GB is very odd):

(screenshot: failed base preview build, with the mariadb service reporting 25GB)

I then switched it to plain 8.2-apache, and that didn't work either 👎🏼

I then switched it to 8.2-apache-buster, and that didn't work either 👎🏼

So I guess we do need to wait till things are fixed on Tugboat's end 🤷🏼

indigoxela commented 1 year ago

... where we should be switching to the 8.2-apache image (the one that doesn't specify the OS version

@klonos tried that too yesterday - didn't work either. :shrug:

As long as we're not able to either fix the file ownership under '/var/lib/tugboat' or add the safe.directory setting for that dir, we'll have to wait until it's fixed on their end.
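To spell out what that second option would look like, it would be something along these lines in one of the hooks in .tugboat/config.yml (just a sketch - it's not even clear that our hooks run before Tugboat's own git commands, so this may not help at all):

```yaml
services:
  php:
    commands:
      init:
        # Mark the Tugboat checkout as a safe.directory for whichever
        # user ends up running git inside the preview container.
        - git config --global --add safe.directory /var/lib/tugboat
```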

klonos commented 1 year ago

As long as we're not able to either fix the file ownership under '/var/lib/tugboat' ...

Yeah, I was hoping that this had something to do with the directories within the image, and that downgrading to an earlier version of the OS would also downgrade the version of git back to one that didn't have this problem. I was wrong obviously 😞

quicksketch commented 1 year ago

I think the file ownership problem might be self-inflicted by our .tugboat/config.yml configuration. We have this section in our update hook:

      update:
        # Set appropriate file permissions/ownership.
        - chown -R www-data:www-data $TUGBOAT_ROOT
        - find $TUGBOAT_ROOT/files $TUGBOAT_ROOT/layouts $TUGBOAT_ROOT/modules $TUGBOAT_ROOT/themes -type d -exec chmod 2775 {} \;
        - find $TUGBOAT_ROOT/files $TUGBOAT_ROOT/layouts $TUGBOAT_ROOT/modules $TUGBOAT_ROOT/themes -type f -exec chmod 0664 {} \;

That means that we're setting the entire $TUGBOAT_ROOT (which is the problematic '/var/lib/tugboat' directory) to be owned by www-data, including the nested .git directory. When the tugboat user runs git commands, git detects that www-data owns the .git directory rather than the user running it, and throws the error. I think if we target the chown command more narrowly, or exclude the .git directory from being affected, we will avoid the error in these newer versions of git.
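Something along these lines might do it (untested sketch; the exact set of directories that actually needs to be www-data-owned may need adjusting):

```yaml
      update:
        # Only hand the web-writable directories to www-data; the
        # repository metadata (.git) keeps its original owner, so git
        # should no longer complain about dubious ownership.
        - chown -R www-data:www-data $TUGBOAT_ROOT/files $TUGBOAT_ROOT/layouts $TUGBOAT_ROOT/modules $TUGBOAT_ROOT/themes
        # (The existing find/chmod commands can stay as they are.)
```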

quicksketch commented 1 year ago

With the base preview failing, I'll need to merge that PR to fully test it. It seems like a benign syntax switch, so I'm going to go ahead and give it a shot.

quicksketch commented 1 year ago

Compounding our issues, Tugboat's mariadb:latest image bumped from MariaDB 10 to 11 in this commit 3 days ago: https://github.com/TugboatQA/dockerfiles/commit/9e11822fb43f0a92bb5abe2e4b7597fd8fd70d19#diff-1f29ef5c43985fdd5f18d159a2a3e81f6492ec961a0ff430a7d78d2d40a522eb

I'm not sure why the MariaDB image isn't building at all (the service doesn't even come up; it doesn't seem to be a problem we're causing). We can try switching to mariadb:lts to go back to MariaDB 10.
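For the record, that should also just be a one-line image change (a sketch, assuming our database service is the usual mysql entry in .tugboat/config.yml):

```yaml
services:
  mysql:
    # Pin the LTS tag (MariaDB 10.x) instead of latest, which now
    # resolves to MariaDB 11.
    image: tugboatqa/mariadb:lts
```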

quicksketch commented 1 year ago

Tugboat has fixed the issue with mariadb:latest over in https://github.com/TugboatQA/images/issues/109.

The base preview is now building properly, but PR previews are giving the notice "mysqladmin command not found". I think if we switch to mariadb-admin, that might do the trick.
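The swap itself should be mechanical wherever we call the client tools in .tugboat/config.yml; something like this (hypothetical invocation - the real command, host, and options are whatever we currently pass to mysqladmin):

```yaml
      build:
        # MariaDB 11 images ship the mariadb-* client names; the old
        # mysql* aliases may no longer be on the PATH.
        - mariadb-admin -h mysql ping
```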

quicksketch commented 1 year ago

Existing PRs still don't have working sandboxes, but I believe that new PRs should now get working sandboxes again.

We could further downgrade the MariaDB image to get the existing sandboxes building again. Or we could require them to be rebased or merge in 1.x to get up-to-date. Which would be a better approach?

klonos commented 1 year ago

I think that rebasing is always a good idea (we have found that PHPCS complains less when you rebase too).

indigoxela commented 1 year ago

I can confirm that the PR sandbox build succeeds again with https://github.com/backdrop/backdrop/pull/4445 :+1: