NixOS / nixpkgs


Nextcloud installation is broken after update to latest unstable #294588

Closed arsfeld closed 4 months ago

arsfeld commented 8 months ago

Describe the bug

My Nextcloud installation broke after I updated to the latest NixOS unstable.

At first I had an error because the config.override.php file could not be found (and it failed to flock it). It was a symlink pointing to a non-existing file. I removed it, then got an error that the apps folder was not writable.

I've now removed the config.php file in hopes it would recreate it, but instead it creates an empty file and complains the config is invalid. I'm not sure what to do next.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Have a Nextcloud installation that predates #280600
  2. Update to the latest NixOS unstable
  3. Nextcloud stops working

Expected behavior

Nextcloud to still work after updating

Notify maintainers

Metadata

 nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.7.6, NixOS, 24.05 (Uakari), 24.05pre-git`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.1`
 - channels(arosenfeld): `""`
 - channels(root): `"nixos"`
 - nixpkgs: `/nix/store/lwyjz70qh12nq6cb7fixl85vryzxqm3c-source`

arsfeld commented 8 months ago

It seems the issue is that my services.nextcloud.datadir was set to a ZFS mounted directory and for some reason systemd-tmpfiles was not able to generate the correct config.override.php file.

Removing the custom datadir and copying the config.php over to /var/lib/nextcloud/config made it work again.

This kind of sucks though: my root disk is small and my Nextcloud files are big and stored in a separate ZFS pool, so I'll have to use either symlinks or the S3 object store. What's more worrying is that this change has the potential to bork a Nextcloud installation.

@Ma27 any ideas?
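
For readers weighing the S3 route mentioned above, here is a minimal sketch of the NixOS module's object store options. The bucket name, endpoint, access key and secret path are all hypothetical placeholders, and the fragment is untested. Switching an existing instance from local storage to object storage is a migration in its own right, so treat this as a starting point, not a drop-in fix.

```nix
{ ... }:

{
  services.nextcloud.config.objectstore.s3 = {
    enable = true;
    bucket = "nextcloud";                 # hypothetical bucket name
    autocreate = true;                    # let Nextcloud create the bucket if it is missing
    key = "EXAMPLE_ACCESS_KEY";           # hypothetical access key ID
    secretFile = "/run/secrets/nextcloud-s3-secret";  # hypothetical path to the secret key
    hostname = "s3.example.org";          # hypothetical S3-compatible endpoint
    useSsl = true;
    region = "us-east-1";                 # assumption: adjust to your provider
    usePathStyle = true;                  # often needed for self-hosted S3-compatible stores
  };
}
```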

Ma27 commented 8 months ago

At first I had an error because the config.override.php file could not be found (and it failed to flock it). It was a symlink pointing to a non-existing file. I removed it, then got an error that the apps folder was not writable.

Which component raised the error? I.e. nextcloud-setup? systemd-tmpfiles? And I'd like to see logs of that.

I've now removed the config.php file in hopes it would recreate it, but instead it creates an empty file and complains the config is invalid. I'm not sure what to do next.

Restore from backup. config.php isn't managed by NixOS, but by Nextcloud. Despite what the name suggests, it's effectively state (and yes, I'm not very happy that configuration, i.e. override.config.php, and state are so close together).

It seems the issue is that my services.nextcloud.datadir was set to a ZFS mounted directory and for some reason systemd-tmpfiles was not able to generate the correct config.override.php file.

So, Nextcloud lives in a /nextcloud directory which is its own dataset? Is it mounted at boot (i.e. in stage1) or in stage2 with a systemd service?

Removing the custom datadir and copying the config.php over to /var/lib/nextcloud/config made it work again.

I'm afraid I'm not following. Why do you need to copy to /var/lib/nextcloud/config after removing the option? That's the default?

arsfeld commented 8 months ago

Which component raised the error? I.e. nextcloud-setup? systemd-tmpfiles? And I'd like to see logs of that.

The only component I saw having issues was Nextcloud, I only later discovered it was caused by systemd-tmpfiles.

Here are the logs for systemd-tmpfiles:

Mar 01 10:34:04 storage systemd-tmpfiles[3203]: Detected unsafe path transition /mnt/data/files (owned by 8675309) → /mnt/data/files/Nextcloud (owned by nextcloud) during canonicalization of mnt/data/files/Nextcloud.
Mar 01 10:34:04 storage systemd-tmpfiles[3203]: Detected unsafe path transition /mnt/data/files (owned by 8675309) → /mnt/data/files/Nextcloud (owned by nextcloud) during canonicalization of mnt/data/files/Nextcloud.
Mar 01 10:34:04 storage systemd-tmpfiles[3203]: Detected unsafe path transition /mnt/data/files (owned by 8675309) → /mnt/data/files/Nextcloud (owned by nextcloud) during canonicalization of mnt/data/files/Nextcloud/config.

I have /mnt/data/files mounted as a ZFS dataset, and my Nextcloud datadir was set to /mnt/data/files/Nextcloud.

So, Nextcloud lives in a /nextcloud directory which is its own dataset? Is it mounted at boot (i.e. in stage1) or in stage2 with a systemd service?

It's mounted through:

  fileSystems."/mnt/data/files" = {
    device = "data/files";
    fsType = "zfs";
    options = ["zfsutil" "X-mount.mkdir"];
  };

I believe this is mounted through systemd at stage2?
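
For reference, a filesystem declared like this is normally mounted in stage 2 unless it is marked as needed for boot. Below is a minimal sketch, assuming the same dataset and mount point as above, of requesting a stage-1 mount via `neededForBoot`. It is untested here and only illustrates the stage 1 versus stage 2 distinction; the mount stage turned out not to be the cause in this thread.

```nix
{ ... }:

{
  fileSystems."/mnt/data/files" = {
    device = "data/files";
    fsType = "zfs";
    options = [ "zfsutil" "X-mount.mkdir" ];
    # Assumption for illustration: mount in the initrd (stage 1)
    # instead of letting systemd mount it in stage 2.
    neededForBoot = true;
  };
}
```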

I'm afraid I'm not following. Why do you need to copy to /var/lib/nextcloud/config after removing the option? That's the default?

Ah yeah, so I misunderstood config.php, I thought it was also automatically created. What I meant by copying is restoring it from backup as you suggested, which indeed worked.

Let me know if this helps. I have it working now without specifying a datadir, using a symlink for the data directory instead. But I can investigate further if needed.

Ma27 commented 8 months ago

The only component I saw having issues was Nextcloud, I only later discovered it was caused by systemd-tmpfiles.

What do the permissions inside /mnt/data/files/Nextcloud look like right now? And which user is 8675309 supposed to be? I thought 65535 is the upper bound?

Seems like systemd-tmpfiles doesn't like it if you try to make such changes if the parent directory isn't owned by root: https://github.com/systemd/systemd/issues/19618

arsfeld commented 8 months ago

What do the permissions inside /mnt/data/files/Nextcloud look like right now? And which user is 8675309 supposed to be? I thought 65535 is the upper bound?

Seems like systemd-tmpfiles doesn't like it if you try to make such changes if the parent directory isn't owned by root: systemd/systemd#19618

Yeah, that was it: changing the ownership of /mnt/data/files to root worked. It was owned by 8675309 before. No idea where this '8675309' user came from; I've used this dataset on other systems before, so maybe it's a leftover.

Is this still a bug though? Maybe there should be a warning in the docs about it?
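
The fix described above (a root-owned parent directory for the datadir) can also be written down declaratively. This is a minimal sketch using the paths from this thread; the 0755 mode is an assumption, and the fragment is untested. A one-off `chown root:root /mnt/data/files` achieves the same thing. The rule relies on the documented behaviour of the tmpfiles.d `d` type, which creates the directory if missing and adjusts mode and ownership if it already exists.

```nix
{ ... }:

{
  # Layout as reported in this thread: the datadir lives one level
  # below the ZFS mount point.
  services.nextcloud.datadir = "/mnt/data/files/Nextcloud";

  # Keep the parent directory owned by root so that systemd-tmpfiles
  # does not see a transition from a non-root-owned parent to a child
  # owned by a different user ("unsafe path transition").
  systemd.tmpfiles.rules = [
    "d /mnt/data/files 0755 root root -"  # mode is an assumption
  ];
}
```

If other users need write access to the dataset root, a dedicated subdirectory per service under the root-owned mount point is probably the safer layout.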

Ma27 commented 7 months ago

Is this still a bug though? Maybe there should be a warning in the docs about it?

@arsfeld so I haven't thought much about the why so far, I was happy that we found the cause in the first place :) Which means I'm not sure if allowing that could become a problem under certain circumstances. However, if that's not the case, then we have a bug in systemd-tmpfiles I think.

Anyways, I agree about documenting that in the Nextcloud section of the NixOS manual. Are you interested in doing that? :sweat_smile: I'd happily review, merge and backport.

Algram commented 6 months ago

Just wanted to add that I've had the exact same issue twice now. My Nextcloud instance is on a mounted NFS volume and sometimes the config.php file just gets corrupted. Only restoring from backup seems to help.

Ma27 commented 6 months ago

Fwiw the config.php is managed by Nextcloud, so perhaps NFS being weird?

eyJhb commented 5 months ago

I ran into this issue, and for some reason it was because my config.override.php pointed to a nonexistent store path. I couldn't see any errors in journalctl, and "just" ended up running systemd-tmpfiles --create as root.

It shouldn't be needed, and I'm unsure WHY I had to run it.

Ma27 commented 5 months ago

It shouldn't be needed, and I'm unsure WHY I had to run it.

Agreed.

eyJhb commented 5 months ago

Revision: b3b2b28c1daa04fe2ae47c21bb76fd226eac4ca1

Config:

{ config, lib, pkgs, ... }:

let
  pkgsImaginary = pkgs.callPackage ./pkgs/imaginary.nix {};
  imaginaryPort = 8088;
in {
  # enable redis
  services.redis.servers.nextcloud = {
    enable = true;
    unixSocketPerm = 770;
    requirePassFile = toString config.secrets.files.redisPassword.file;
  };

  services.phpfpm.pools.nextcloud.phpOptions = ''
    memory_limit = 32768M
  '';

  services.nextcloud = {
    enable = true;
    hostName = "nextcloud.super-secret-domain.you-will-never-know";
    package = pkgs.nextcloud27;

    maxUploadSize = "8G";

    # security
    # enableBrokenCiphersForSSE = false;

    # use backup dir for any data
    datadir = "/media/nextcloud";

    # generate links with https
    https = true;

    # load redis module
    caching = {
      redis = true;
      apcu = true;
    };

    # autoupdate apps + when to do it
    autoUpdateApps.enable = true;
    autoUpdateApps.startAt = "05:00:00";

    # database
    database.createLocally = true;

    config = {
      # Nextcloud PostgreSQL database configuration, recommended over using SQLite
      dbtype = "pgsql";

      # set up an administration user
      adminuser = "admin";
      adminpassFile = toString config.secrets.files.ncAdminPassword.file;
    };

    secretFile = toString config.secrets.files.ncSecretFile.file;
    extraOptions = {
      "filelocking.enabled" = true;
      "memcache.local" = "\\OC\\Memcache\\APCu";
      "memcache.distributed" = "\\OC\\Memcache\\Redis";
      "memcache.locking" = "\\OC\\Memcache\\Redis";
      redis = {
        host = config.services.redis.servers.nextcloud.unixSocket;
        port = 0;
        dbindex = 0;
        timeout = 1;
      };

      # set default region, so it will stop complaining
      "default_phone_region" = "DK";

      # previews with imaginary
      enabledPreviewProviders = [
        "OC\\Preview\\MP3"
        "OC\\Preview\\TXT"
        "OC\\Preview\\MarkDown"
        "OC\\Preview\\OpenDocument"
        "OC\\Preview\\Krita"
        "OC\\Preview\\Imaginary"
      ];
      preview_imaginary_url = "http://localhost:${toString imaginaryPort}/";
    };

    poolSettings = lib.mkOptionDefault {
      "pm.max_children" = lib.mkForce "400";
      "pm.start_servers" = lib.mkForce "100";
      "pm.min_spare_servers" = lib.mkForce "100";
      "pm.max_spare_servers" = lib.mkForce "300";
    };

    phpOptions = lib.mkOptionDefault {
      "opcache.enable" = "1";
      "opcache.save_comments" = "1";
      "opcache.revalidate_freq" = lib.mkForce "0";
      "opcache.memory_consumption" = lib.mkForce "1024";
      "opcache.interned_strings_buffer" = lib.mkForce "512";
    };
  };

  # enable SSL on nextcloud
  services.nginx.virtualHosts."${config.services.nextcloud.hostName}" = {
    forceSSL = true;
    enableACME = true;
  };

  # set up a system user to ensure static UID/GID
  users = {
    users.nextcloud = {
      uid = 1001;
      isSystemUser = true;
    };
    groups = {
      redis-nextcloud = {
        members = [ "nextcloud" ];
        gid = 991;
      };

      nextcloud = {
        gid = 997;
        members = [ "nextcloud" ];
      };
    };
  };

  # imaginary service
  systemd.services.imaginary = {
    description = "Imaginary image service";

    wantedBy = [ "multi-user.target" ];
    after = [ "network.target" ];

    restartIfChanged = true;

    serviceConfig = {
      # Group = config.users.groups.nextcloud.name;
      ExecStart = "${pkgsImaginary}/bin/imaginary -a localhost -p ${toString imaginaryPort} -enable-url-source -return-size";
      Restart = "always";
    };
  };

  # ensure that our state is kept
  environment.persistence.root.directories = [
    config.services.redis.servers.nextcloud.settings.dir
    config.services.nextcloud.home
  ];
}

I don't think there is anything particularly special about my configs, but I do use nixus for deploying my systems.

I'm very unsure how to check whether systemd-tmpfiles is running correctly. I couldn't see anything in journalctl, and it seemed all the logs had been purged.

al3xtjames commented 5 months ago

Does journalctl -b | grep systemd-tmpfiles show anything useful? In my case I had a similar Detected unsafe path transition message. Fixing the permissions of services.nextcloud.datadir fixed it for me.

eyJhb commented 5 months ago

It gave me nothing. No results, at all... :(

Ma27 commented 4 months ago

Fixing the permissions of services.nextcloud.datadir fixed it for me.

That's a good point though. What are the permissions of the datadir currently?

al3xtjames commented 4 months ago

It's currently nextcloud:nextcloud. The issue for me was that the parent directory of services.nextcloud.datadir was owned by a non-root user. I stopped getting errors from systemd-tmpfiles after fixing this.

Ma27 commented 4 months ago

@eyJhb do you have the same issue perhaps? I think we've seen this before, though I don't remember if we ever documented it.

EDIT: no, it isn't documented. Will file a patch for this.