Open rien opened 2 years ago
The system profiles directory might still contain useful information. The symlinks there have modification times that reflect the time of `nixos-rebuild` installing that link. Reinstallations (`nixos-rebuild switch` or `nixos-rebuild boot`) don't seem to affect the modification time.
Hi, I had a similar issue (flock error). Turns out that in `/var/lib/nextcloud/config` there was a file called `override.config.php` which pointed to a nonexistent file. Removing `override.config.php` resulted in Nextcloud starting up again.
@makefu was it also a GC? If so, can you please check with `nix why-depends` if one of your installed system profiles references the config store-path in question (I somehow doubt that the GC is at fault, but let's rule it out anyways).
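A rough sketch of that check, in case it helps others: it walks the system profile links and prints those whose closure still contains a given store path. The function name `profiles_referencing` and the default profile directory are assumptions made for illustration. It relies on `nix-store --query --requisites`; `nix why-depends <profile> <store-path>` can then explain how a match is reached.

```shell
# Hypothetical helper: print every system profile link whose closure
# still contains the given store path.
# usage: profiles_referencing STORE_PATH [PROFILE_DIR]
profiles_referencing() {
    store_path=$1
    profile_dir=${2:-/nix/var/nix/profiles}
    for profile in "$profile_dir"/system-*-link; do
        # Skip the unexpanded glob when no profile link exists.
        [ -e "$profile" ] || continue
        if nix-store --query --requisites "$profile" | grep -qxF "$store_path"; then
            echo "$profile"
        fi
    done
}
```

An empty result would mean that no installed system profile keeps the path alive, so the GC is free to delete it.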
Where did the broken symlink of `override.config.php` point to anyways? The same store-path as it does now after removing and rerunning `nextcloud-setup`? Or a different one? One theory I have is: `override.config.php` changes, but `nextcloud-setup` is for some reason not reexecuted -> `override.config.php` is not updated -> the old `override.config.php` is old enough to get GCed -> the link in `/var/lib/nextcloud/config` is dead now.
@Ma27 unfortunately I have already removed the `override.config.php` and I am unsure if it was a link to the store path or just an ordinary file. Please mind that my Nextcloud installation is quite old (kept since Nextcloud 20).
That's unfortunate, because right now I don't have much to reproduce the bug. To everyone involved here: if you ever stumble upon this problem, please note where override.config.php pointed to and check your nix-gc.service logs as described above, thanks! :)
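For anyone who wants to record that before touching anything, a minimal sketch follows. The helper name `symlink_status` is made up for this example; it only uses `readlink` and the standard `-L` / `-e` tests.

```shell
# Hypothetical helper: report where a symlink points and whether the
# target still exists.
# usage: symlink_status PATH
symlink_status() {
    link=$1
    if [ ! -L "$link" ]; then
        echo "$link: not a symlink"
    elif [ -e "$link" ]; then
        # -e follows the link, so this branch means the target is present.
        echo "$link -> $(readlink "$link") (target exists)"
    else
        echo "$link -> $(readlink "$link") (dangling)"
    fi
}
```

Running `symlink_status /var/lib/nextcloud/config/override.config.php` before removing the file would capture the store path that is needed for the follow-up questions here.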
I've hit this issue and can confirm that it was garbage collected:
File: /var/lib/nextcloud/config/override.config.php -> /nix/store/i02v5w7980vilqrmmhmazwjkissqkcxj-nextcloud-config.php
Jan 08 00:03:46 cloudsgate nix-gc-start[99923]: deleting '/nix/store/i02v5w7980vilqrmmhmazwjkissqkcxj-nextcloud-config.php'
Which is roughly a month after `nextcloud-setup` last ran, and matches `--delete-older-than 30d` from my settings:
Active: inactive (dead) since Thu 2023-12-07 17:25:40 PST; 1 month 3 days ago
Is there anything else I can grab which will help?
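The "roughly a month" can be checked with a bit of date arithmetic. This is only a sketch: `days_between` is a made-up helper, it relies on GNU `date -d`, and the example call assumes that the GC log line is from 2024 and in the same time zone as the systemd status output (the journal line itself carries no year).

```shell
# Hypothetical helper: whole days elapsed between two timestamps.
# Relies on GNU date's -d option.
# usage: days_between START END
days_between() {
    start=$(date -d "$1" +%s) || return 1
    end=$(date -d "$2" +%s) || return 1
    echo $(( (end - start) / 86400 ))
}
```

Under those assumptions, `days_between "2023-12-07 17:25:40" "2024-01-08 00:03:46"` prints `31`, which is past the 30 day threshold.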
Can you `systemctl restart nextcloud-setup`?
Also, if you do so, please tell me the target file of the symlink (and the contents of the script executed by `nextcloud-setup`).
Sure, I was planning to do that soon to fix things. (Which it did, BTW.) The target changed:
File: /var/lib/nextcloud/config/override.config.php -> /nix/store/x7866iq7xix70afyfw50py9k81iy3h24-nextcloud-config.php
Here's the script, except I redacted the domain name on the last line, on the off chance some bots scrape it and start trying to connect or something:
#!/nix/store/q1c2flcykgr4wwg5a6h450hxbk4ch589-bash-5.2-p15/bin/bash
set -e
if [ ! -r "/run/keys/nextcloud-pgsql-root-pw" ]; then
  echo "dbpassFile /run/keys/nextcloud-pgsql-root-pw is not readable by nextcloud:nextcloud! Aborting..."
  exit 1
fi
if [ -z "$(</run/keys/nextcloud-pgsql-root-pw)" ]; then
  echo "dbpassFile /run/keys/nextcloud-pgsql-root-pw is empty!"
  exit 1
fi
if [ ! -r "/run/keys/nextcloud-admin-pw" ]; then
  echo "adminpassFile /run/keys/nextcloud-admin-pw is not readable by nextcloud:nextcloud! Aborting..."
  exit 1
fi
if [ -z "$(</run/keys/nextcloud-admin-pw)" ]; then
  echo "adminpassFile /run/keys/nextcloud-admin-pw is empty!"
  exit 1
fi
ln -sf /nix/store/ay71npxcw8gafabr4vaxsn6pkbdm5xmc-nextcloud-27.1.4/apps /var/lib/nextcloud/
# Install extra apps
ln -sfT \
  /nix/store/zwjlh8fjgris2s7hlhb3zyqzaaa2wfk8-nix-apps \
  /var/lib/nextcloud/nix-apps
# create nextcloud directories.
# if the directories exist already with wrong permissions, we fix that
for dir in /var/lib/nextcloud/config /var/lib/nextcloud/data /var/lib/nextcloud/store-apps /var/lib/nextcloud/nix-apps; do
  if [ ! -e $dir ]; then
    install -o nextcloud -g nextcloud -d $dir
  elif [ $(stat -c "%G" $dir) != "nextcloud" ]; then
    chgrp -R nextcloud $dir
  fi
done
ln -sf /nix/store/x7866iq7xix70afyfw50py9k81iy3h24-nextcloud-config.php /var/lib/nextcloud/config/override.config.php
# Do not install if already installed
if [[ ! -e /var/lib/nextcloud/config/config.php ]]; then
  export DBPASS="$(<"/run/keys/nextcloud-pgsql-root-pw")"
  export ADMINPASS="$(<"/run/keys/nextcloud-admin-pw")"
  /nix/store/fnmpryp3q4r6my4i6bplj01zj448fwdf-nextcloud-occ/bin/nextcloud-occ maintenance:install \
    --admin-pass "$ADMINPASS" \
    --admin-user "root" \
    --data-dir "/var/lib/nextcloud/data" \
    --database "pgsql" \
    --database-host "postgresql.service.consul" \
    --database-name "nextcloud" \
    --database-pass "$DBPASS" \
    --database-user "nextcloud"
fi
/nix/store/fnmpryp3q4r6my4i6bplj01zj448fwdf-nextcloud-occ/bin/nextcloud-occ upgrade
/nix/store/fnmpryp3q4r6my4i6bplj01zj448fwdf-nextcloud-occ/bin/nextcloud-occ config:system:delete trusted_domains
/nix/store/fnmpryp3q4r6my4i6bplj01zj448fwdf-nextcloud-occ/bin/nextcloud-occ config:system:set trusted_domains \
  0 --value="-DOMAIN-"
Best guess is that it was updated but somehow didn't run with the new one?
AFAIK (and logs seem to confirm this), the last time it was run was at boot when I upgraded to NixOS 23.11 a month ago. It's possible that I deployed changes in the meantime and they didn't activate the new script. Shell history says the last push was on 2023-12-08 at 1:05:57, which was indeed after the last time setup ran.
> Best guess is that it was updated but somehow didn't run with the new one?

Yes: the reference to `override.config.php` in the string-context from `nextcloud-setup.service` ensures that it doesn't get garbage collected. And the symlink being created by your current `nextcloud-setup` confirms that.
Normally, that shouldn't happen: any chance you have logs left that are old enough?
To give you a few pointers: with `journalctl -t nixos` you should be able to see when which config got deployed. Then, with `journalctl -t systemd` & `journalctl -u nextcloud-setup` you should be able to see when/if `nextcloud-setup` got invoked and whether it failed. That would be very helpful to rule out an issue with `nextcloud-setup` and `switch-to-configuration.pl` (the script that does all the starting/restarting/stopping of units after a `nixos-rebuild switch`).
My theory is that `nextcloud-setup` failed too early and thus the new config was activated, but the active system config didn't reference your `override.config.php` anymore (there's quite a bunch of shell code before the symlink is created/updated).
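To illustrate the mechanism this theory depends on (not to claim that this is what happened): under `set -e`, any failing command before the `ln -sf` aborts the script, so the link keeps pointing at the old store path. The sketch below is a miniature stand-in for the generated script; `run_setup` and its arguments are invented for the example.

```shell
# Miniature stand-in for the generated script: with `set -e`, a failing
# early step aborts the run before the symlink is refreshed.
# usage: run_setup EARLY_STEP NEW_TARGET LINK
# EARLY_STEP stands in for the checks that run before the ln call,
# e.g. "true" or "false".
run_setup() {
    sh -c 'set -e; "$1"; ln -sf "$2" "$3"' sh "$1" "$2" "$3"
}
```

With `run_setup false NEW LINK` the link is left untouched and the call returns a non-zero status; with `run_setup true NEW LINK` the link is updated.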
I'm wondering if the nicer solution would be using tmpfiles. These are refreshed on each activation and on a reboot and not as part of a service that may or may not be restarted. Not sure when I'll get to it, but I'll probably file a patch soonish.
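For readers unfamiliar with tmpfiles: a symlink is declared there with an `L+` line in the `tmpfiles.d` format (type, path, mode, user, group, age, argument). The helper below only prints such a line; its name is made up, and this is a sketch of the general idea, not necessarily how an eventual patch to the module would look.

```shell
# Hypothetical helper: print a tmpfiles.d rule that (re)creates a symlink.
# "L+" replaces whatever already exists at LINK; the "-" fields leave
# mode, user, group and age unset.
# usage: tmpfiles_symlink_rule LINK TARGET
tmpfiles_symlink_rule() {
    printf 'L+ %s - - - - %s\n' "$1" "$2"
}
```

The point of this design is that the link gets refreshed whenever tmpfiles is applied, independently of whether `nextcloud-setup` gets restarted.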
I have all of my VM's journald logs piped to a Loki instance, so even if the journal's been rotated I should be able to go back and grab anything from recent history.
But `journalctl -t nixos` appears to go back a few years. Earliest entries are deploying 21.11, but the most recent entry is from November 5th, which is before I switched to 23.11. Around that timeframe I switched from using Colmena to using `nixos-rebuild` with `--target-host`. Is it possible that such deployments don't get logged? `/nix/var/nix/system` ultimately points to a 23.11 profile that isn't mentioned in the `-t nixos` logs, so the deployment definitely did work.
With the `-u nextcloud-setup` logs the December 7th invocation does a whole upgrade run and ultimately says that it deactivated successfully.
Yeah, this feels like something that should be put somewhere like `/run` and just generated every boot.
> Is it possible that such deployments don't get logged?

Actually not: this is done in `switch-to-configuration` directly.
> With the -u nextcloud-setup logs the December 7th invocation does a whole upgrade run and ultimately says that it deactivated successfully.

And from when is your currently activated configuration? (You should be able to find that out by checking the file ages in `/nix/var/nix/profiles` for `system*`.)
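One way to read those ages, as a sketch: list the `system*` links together with the modification time of the link itself (not of its target). The function name is invented for this example and it relies on GNU `find -printf`.

```shell
# Hypothetical helper: list the system profile links with the
# modification time of each link itself and its target.
# Relies on GNU find's -printf.
# usage: profile_link_times [PROFILE_DIR]
profile_link_times() {
    find "${1:-/nix/var/nix/profiles}" -maxdepth 1 -type l -name 'system*' \
        -printf '%TY-%Tm-%Td %TH:%TM  %p -> %l\n' | sort
}
```

Because the output is sorted by timestamp, the last line is the most recently installed link, which can be compared against the last `nextcloud-setup` run.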
I kinda regret that we don't have a `-v` added to `ln`, then it'd be easier to spot if everything went well here.
That said, I'm rather convinced that this is the only explanation that makes sense.
I've just had this happen to me again after running `sudo nix-collect-garbage -d`:
Mar 27 00:09:50 rudolf systemd[1]: Starting nextcloud-setup.service...
Mar 27 00:09:50 rudolf nextcloud-setup-start[692746]: An unhandled exception has been thrown:
Mar 27 00:09:50 rudolf nextcloud-setup-start[692746]: TypeError: flock(): Argument #1 ($stream) must be of type resource, bool given in /nix/store/75z9bwr5zn527sj6wg6f8>
Mar 27 00:09:50 rudolf nextcloud-setup-start[692746]: Stack trace:
Mar 27 00:09:50 rudolf nextcloud-setup-start[692746]: #0 /nix/store/75z9bwr5zn527sj6wg6f8g737k7yhlrl-nextcloud-28.0.3/lib/private/Config.php(228): flock(false, 1)
Mar 27 00:09:50 rudolf nextcloud-setup-start[692746]: #1 /nix/store/75z9bwr5zn527sj6wg6f8g737k7yhlrl-nextcloud-28.0.3/lib/private/Config.php(71): OC\Config->readData()
Mar 27 00:09:50 rudolf nextcloud-setup-start[692746]: #2 /nix/store/75z9bwr5zn527sj6wg6f8g737k7yhlrl-nextcloud-28.0.3/lib/base.php(149): OC\Config->__construct('/var/li>
Mar 27 00:09:50 rudolf nextcloud-setup-start[692746]: #3 /nix/store/75z9bwr5zn527sj6wg6f8g737k7yhlrl-nextcloud-28.0.3/lib/base.php(616): OC::initPaths()
Mar 27 00:09:50 rudolf nextcloud-setup-start[692746]: #4 /nix/store/75z9bwr5zn527sj6wg6f8g737k7yhlrl-nextcloud-28.0.3/lib/base.php(1200): OC::init()
Mar 27 00:09:50 rudolf nextcloud-setup-start[692746]: #5 /nix/store/75z9bwr5zn527sj6wg6f8g737k7yhlrl-nextcloud-28.0.3/console.php(48): require_once('/nix/store/75z9...')
Mar 27 00:09:50 rudolf nextcloud-setup-start[692746]: #6 /nix/store/75z9bwr5zn527sj6wg6f8g737k7yhlrl-nextcloud-28.0.3/occ(11): require_once('/nix/store/75z9...')
Mar 27 00:09:50 rudolf nextcloud-setup-start[692746]: #7 {main}
Mar 27 00:09:50 rudolf systemd[1]: nextcloud-setup.service: Main process exited, code=exited, status=1/FAILURE
Mar 27 00:09:50 rudolf systemd[1]: nextcloud-setup.service: Failed with result 'exit-code'.
Mar 27 00:09:50 rudolf systemd[1]: Failed to start nextcloud-setup.service.
And that despite @Ma27's PR.
The broken link is
$ ls -l /var/lib/nextcloud/config/override.config.php
lrwxrwxrwx 1 root root 64 Mar 10 20:26 /var/lib/nextcloud/config/override.config.php -> /nix/store/ny6h3i7ynkwc9q52d8wzl384qvm9mf84-nextcloud-config.php
and after changing my config slightly, rebuilding, then changing it back and rebuilding, I have
$ ls -l /var/lib/nextcloud/config/override.config.php
lrwxrwxrwx 1 root root 64 Mar 27 00:21 /var/lib/nextcloud/config/override.config.php -> /nix/store/2qw84fwb3iwn6ykrxk5zb3k4xbq6vj1g-nextcloud-config.php
@dotlambda which NixOS revision are you on?
> @dotlambda which NixOS revision are you on?

Latest nixos-unstable.
@dotlambda does both deploying and booting up the machine trigger systemd-tmpfiles? I think it's now entirely done over systemd services (before it was in an activation script IIRC), so if that wasn't activated at some point, we may know our answer.
Perhaps this didn't happen for some reason and now we have the same issue again (the assumption my patch relies on is that tmpfiles is executed if override.config.php changes).
I have the same issue (nixos-unstable).
I'll need a little more details (see comments above).
I just got reminded that systemd-tmpfiles needs root-owned parent directories to operate correctly (https://github.com/NixOS/nixpkgs/issues/294588#issuecomment-2190190315). Is that the case for you? Otherwise it may happen that tmpfiles just skips refreshing the config file :thinking:
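A quick way to answer that question, sketched under the assumption that GNU `stat -c` is available: walk up from the config file and print every parent directory that is not owned by root. The helper name `foreign_owned_parents` is hypothetical.

```shell
# Hypothetical helper: print every ancestor directory of PATH that is
# not owned by OWNER (default: root). Relies on GNU stat's -c option.
# usage: foreign_owned_parents PATH [OWNER]
foreign_owned_parents() {
    dir=$(dirname "$1")
    owner=${2:-root}
    while :; do
        [ "$(stat -c %U "$dir")" = "$owner" ] || echo "$dir"
        # Stop at the filesystem root (or "." for relative paths).
        case $dir in /|.) break ;; esac
        dir=$(dirname "$dir")
    done
}
```

For example, `foreign_owned_parents /var/lib/nextcloud/config/override.config.php` lists the directories that would be worth mentioning in a reply here.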
Describe the bug
Today, just past midnight (00:00), my Nextcloud instance broke. It was giving the following error:
In the corresponding php file, it seems like the actual config file is unreadable or doesn't exist.
I tried restarting `nginx` and `php-fpm-nextcloud`, but the error persisted.
Looking at the log file, a few moments before these error messages occurred, an automatic `nix-gc` happened, containing among the listing of deleted files the following log line:
The crash was fixed by rebuilding the system. This seems to suggest that the config was actually still being referenced somewhere, as restarting the relevant services wasn't doing anything.
I think it could be caused by the following line:
https://github.com/NixOS/nixpkgs/blob/821a81dcc4e872bf2836ac18b12938e7de6c0f49/nixos/modules/services/web-apps/nextcloud.nix#L776
Where the `overrideConfig` is garbage collected. But that seems weird, because the current config should reference this file somehow.
Steps To Reproduce
Unfortunately I could not reproduce this by performing a `nix-collect-garbage -d` manually. By rebuilding and garbage collecting my system, I think I also removed all evidence required to troubleshoot this issue.
Expected behavior
Nextcloud shouldn't stop working after a garbage collect.
Additional context
My system configuration is a NixOS flake over at rien/nixos-config#580efa35. There are multiple machines configured; the server experiencing the crash was space, and it was using this custom module to configure Nextcloud. All links reference the commit that was deployed to the server at the time.
Notify maintainers
@schneefux @bachp @globin @fpletz @ma27
Metadata
Please run `nix-shell -p nix-info --run "nix-info -m"` and paste the result.