NixOS / nixpkgs


gotenberg fails because LibreOffice tries to write temp file to / #349123

Open nh2 opened 1 month ago

nh2 commented 1 month ago

For paperless-ngx, I'm trying to integrate it with the newly added tika and gotenberg services to index office documents (Word, Excel, etc).

However, when given my .xls file, gotenberg invokes soffice, which crashes with SIGABRT (likely an uncaught C++ exception).

Some logs:

```
Oct 16 16:57:04 hp2 systemd-coredump[3478697]: [🡕] Process 3478691 (soffice.bin) of user 61254 dumped core.
...
#0 0x00007fb6f4f6fefc __pthread_kill_implementation (libc.so.6 + 0x8fefc)
#1 0x00007fb6f4f1fe86 raise (libc.so.6 + 0x3fe86)
#2 0x00007fb6f4f08935 abort (libc.so.6 + 0x28935)
#3 0x00007fb6f520f44f _ZN12_GLOBAL__N_121signalHandlerFunctionEiP9siginfo_tPv (libuno_sal.so.3 + 0x4044f)
#4 0x00007fb6f4f1ff30 __restore_rt (libc.so.6 + 0x3ff30)
#5 0x00007fb6f4f6fefc __pthread_kill_implementation (libc.so.6 + 0x8fefc)
#6 0x00007fb6f4f1fe86 raise (libc.so.6 + 0x3fe86)
#7 0x00007fb6f4f08935 abort (libc.so.6 + 0x28935)
#8 0x00007fb6f0e7a512 _Z8SalAbortRKN3rtl8OUStringEb.cold (libvcllo.so + 0x475512)
#9 0x00007fb6f51090c7 _ZN7desktop7Desktop9ExceptionE17ExceptionCategory (libsofficeapp.so + 0x3c0c7)
#10 0x00007fb6f12b8787 _ZL23VCLExceptionSignal_implPvP13oslSignalInfo (libvcllo.so + 0x8b3787)
#11 0x00007fb6f51e8362 _Z17callSignalHandlerP13oslSignalInfo (libuno_sal.so.3 + 0x19362)
#12 0x00007fb6f520f392 _ZN12_GLOBAL__N_121signalHandlerFunctionEiP9siginfo_tPv (libuno_sal.so.3 + 0x40392)
#13 0x00007fb6f4f1ff30 __restore_rt (libc.so.6 + 0x3ff30)
#14 0x00007fb6f4f6fefc __pthread_kill_implementation (libc.so.6 + 0x8fefc)
#15 0x00007fb6f4f1fe86 raise (libc.so.6 + 0x3fe86)
#16 0x00007fb6f4f08935 abort (libc.so.6 + 0x28935)
#17 0x00007fb6f4d00c0b _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold (libstdc++.so.6 + 0xacc0b)
#18 0x00007fb6f4d1021a _ZN10__cxxabiv111__terminateEPFvvE (libstdc++.so.6 + 0xbc21a)
#19 0x00007fb6f4d10285 _ZSt9terminatev (libstdc++.so.6 + 0xbc285)
#20 0x00007fb6f4d104d7 __cxa_throw (libstdc++.so.6 + 0xbc4d7)
#21 0x00007fb6e88d968a _ZN10dp_manager16ExtensionManager27reinstallDeployedExtensionsEhRKN3rtl8OUStringERKN3com3sun4star3uno9ReferenceINS7_4task13XAbortChannelEEERKNS9_INS7_3ucb19XCommandEnvironmentEEE.cold (libdeployment.so + 0x1f68a)
#22 0x00007fb6f51198f2 _ZN7desktop7Desktop32SynchronizeExtensionRepositoriesEbPS0_ (libsofficeapp.so + 0x4c8f2)
#23 0x00007fb6f510df72 _ZN7desktop7Desktop4MainEv (libsofficeapp.so + 0x40f72)
#24 0x00007fb6f12b9ed6 _Z10ImplSVMainv (libvcllo.so + 0x8b4ed6)
#25 0x00007fb6f5135568 soffice_main (libsofficeapp.so + 0x68568)
#26 0x000000000040106b main (soffice.bin + 0x106b)
#27 0x00007fb6f4f0a10e __libc_start_call_main (libc.so.6 + 0x2a10e)
#28 0x00007fb6f4f0a1c9 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a1c9)
#29 0x00000000004010a5 _start (soffice.bin + 0x10a5)
...
gotenberg[3474684]: {"level":"error","ts":1729097824.324338,"logger":"api","msg":"convert to PDF: process first start: start process: execute LibreOffice: unix process error: wait for unix process: signal: aborted (core dumped)","trace":"46adfbc3-46fc-435f-9280-5d5d7aa8dae9","remote_ip":"::1","host":"localhost:3000","uri":"/forms/libreoffice/convert","method":"POST","path":"/forms/libreoffice/convert","referer":"","user_agent":"python-httpx/0.27.0","status":500,"latency":7156383196,"latency_human":"7.156383196s","bytes_in":1676669,"bytes_out":21}
```

Unfortunately it discards all stderr to /dev/null, as I discovered with strace:

[pid 3478691] openat(AT_FDCWD</>, "//.execooo1nQd8n", O_RDWR|O_CREAT|O_EXCL, 0600) = -1 EROFS (Read-only file system)
[pid 3478691] write(2</dev/null>, "mkstemp(\"//.execooo1nQd8n\") failed: Read-only file system\n", 58) = 58
[pid 3478691] write(2</dev/null>, "terminate called after throwing an instance of '", 48) = 48
[pid 3478691] write(2</dev/null>, "com::sun::star::deployment::DeploymentException", 47) = 47

It tries to write a temp file to //.execooo1nQd8n.

Probably somewhere some temp dir didn't get set correctly, so it tries to write to /.

CC @pyrox0 from #326372

nh2 commented 1 month ago

I used this config to hook paperless up to tika and gotenberg:

  services.paperless = {
    enable = true;
    consumptionDir = "/my-paperless-inbox";
    consumptionDirIsPublic = true;
    settings = {
      PAPERLESS_CONSUMER_RECURSIVE = true;
      PAPERLESS_OCR_LANGUAGE = "deu+eng";
      PAPERLESS_TIKA_ENABLED = "1";
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT = "http://localhost:${toString config.services.gotenberg.port}";
      PAPERLESS_TIKA_ENDPOINT = "http://${config.services.tika.listenAddress}:${toString config.services.tika.port}";
    };
  };
nh2 commented 1 month ago

I suspect that mkstemp is given a relative path, which creates the temp file in the current working directory, which in our systemd unit is /:

# ls -l /proc/$(pidof gotenberg)/cwd
lrwxrwxrwx 1 gotenberg gotenberg 0 Oct 16 20:03 /proc/3474684/cwd -> /
pyrox0 commented 1 month ago

Can you try running with the following configuration added?

systemd.services.gotenberg.serviceConfig = {
    WorkingDirectory = "/run/gotenberg";
    RuntimeDirectory = "gotenberg";
};
nh2 commented 1 month ago

@pyrox0 Hm, it still doesn't work.

Your change is effective:

# ls -lah /proc/$(pidof gotenberg)/cwd
lrwxrwxrwx 1 gotenberg gotenberg 0 Oct 17 01:14 /proc/3524445/cwd -> /run/gotenberg

but the error is now:

openat(AT_FDCWD</run/gotenberg>, "//.execoooVobdsR", O_RDWR|O_CREAT|O_EXCL, 0600) = -1 EROFS (Read-only file system)

So your /run/gotenberg setting took effect, but LibreOffice apparently explicitly starts the template passed to int mkstemp(char *template) with / (even //, not sure what that's about), so it's really given an absolute path here (and not a relative one, as I suspected).

I think this is the code:

https://github.com/LibreOffice/core/blob/5cf912b08e4a22f66ab1ec5fa601ba3e50e3c4cc/bridges/source/cpp_uno/shared/vtablefactory.cxx#L264-L278

    if (aSecurity.getHomeDir(strURLDirectory))
        osl::File::getSystemPathFromFileURL(strURLDirectory, strDirectory);

    for (int i = strDirectory.isEmpty() ? 1 : 0; i < 2; ++i)
    {
        if (strDirectory.isEmpty())
            strDirectory = "/tmp";

        strDirectory += "/.execoooXXXXXX";
        OString aTmpName = OUStringToOString(strDirectory, osl_getThreadTextEncoding());
        std::unique_ptr<char[]> tmpfname(new char[aTmpName.getLength()+1]);
        strncpy(tmpfname.get(), aTmpName.getStr(), aTmpName.getLength()+1);
        // coverity[secure_temp] - https://communities.coverity.com/thread/3179
        if ((block.fd = mkstemp(tmpfname.get())) == -1)
            fprintf(stderr, "mkstemp(\"%s\") failed: %s\n", tmpfname.get(), strerror(errno));

So apparently strDirectory is /.

Then strDirectory += "/.execoooXXXXXX"; makes it //.execoooXXXXXX.
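
The path construction above can be sketched in a few lines. This is a simplified Python illustration, not LibreOffice's code: it mimics only the string handling visible in the quoted snippet (home directory if set, otherwise /tmp, plus the fixed suffix). The name exec_template is hypothetical, and the retry loop and error handling of the real code are left out.

```python
# Suffix literal copied from the quoted vtablefactory.cxx snippet.
SUFFIX = "/.execoooXXXXXX"


def exec_template(home_dir: str) -> str:
    """Build the mkstemp template the way the quoted snippet appears to.

    Assumption: an empty string stands for "no home directory found".
    """
    directory = home_dir if home_dir else "/tmp"
    return directory + SUFFIX


print(exec_template("/"))               # //.execoooXXXXXX
print(exec_template("/run/gotenberg"))  # /run/gotenberg/.execoooXXXXXX
print(exec_template(""))                # /tmp/.execoooXXXXXX
```

On Linux a leading // resolves to the same place as /, so a home directory of / puts the temp file in the root directory, which matches the EROFS in the strace output.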

pyrox0 commented 1 month ago

Hmm, maybe LibreOffice needs a home directory set as well? (Thinking about that based on the code above.)

try adding the following to the serviceConfig block:

Environment = {
    HOME = "/run/gotenberg";
};
nh2 commented 1 month ago

That code seems to eventually check the HOME env var here:

https://github.com/LibreOffice/core/blob/5cf912b08e4a22f66ab1ec5fa601ba3e50e3c4cc/sal/osl/unx/security.cxx#L354

After adding it with

  systemd.services.gotenberg.environment = {
    HOME = "/run/gotenberg";
  };

the next failure is:

audit: type=1326 audit(1729129059.227:5): auid=4294967295 uid=61254 gid=61254 ses=4294967295 subj=kernel pid=3528423 comm="configmgrWriter" exe="/nix/store/a4yq73486ijfh4k1ci2bfds5rp4h7hkz-libreoffice-7.6.7.2/lib/libreoffice/program/soffice.bin" sig=31 arch=c000003e syscall=92 compat=0 ip=0x7fb66720c15b code=0x80000000

Syscall 92 is chown.
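
The lookup from that audit record to a syscall name can be sketched as follows. This is an illustration only: decode_seccomp_audit is a hypothetical helper, and the table is a hand-copied excerpt of the x86_64 syscall numbers around 92, not a complete list. The arch value c000003e in the record is the audit identifier for x86_64, and sig=31 is SIGSYS.

```python
import re

# Excerpt of the x86_64 syscall table (only the neighbours of 92).
X86_64_SYSCALLS = {90: "chmod", 91: "fchmod", 92: "chown", 93: "fchown", 94: "lchown"}
AUDIT_ARCH_X86_64 = 0xC000003E

# The audit record from the comment above, split only for readability.
RECORD = (
    'audit: type=1326 audit(1729129059.227:5): auid=4294967295 uid=61254 gid=61254 '
    'ses=4294967295 subj=kernel pid=3528423 comm="configmgrWriter" '
    'exe="/nix/store/a4yq73486ijfh4k1ci2bfds5rp4h7hkz-libreoffice-7.6.7.2/lib/libreoffice/program/soffice.bin" '
    'sig=31 arch=c000003e syscall=92 compat=0 ip=0x7fb66720c15b code=0x80000000'
)


def decode_seccomp_audit(record: str) -> dict:
    """Extract the blocked syscall from a seccomp audit record (hypothetical helper)."""
    # key=value pairs; values are either quoted strings or bare tokens.
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', record))
    number = int(fields["syscall"])
    arch = int(fields["arch"], 16)
    # Syscall numbers are per architecture, so only translate for x86_64.
    name = X86_64_SYSCALLS.get(number) if arch == AUDIT_ARCH_X86_64 else None
    return {
        "comm": fields["comm"].strip('"'),
        "signal": int(fields["sig"]),
        "syscall": number,
        "name": name,
    }


print(decode_seccomp_audit(RECORD))
# {'comm': 'configmgrWriter', 'signal': 31, 'syscall': 92, 'name': 'chown'}
```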

Interestingly, in strace it shows it as

<... chown resumed>)            = 92
+++ killed by SIGSYS (core dumped) +++

Here the reported return value is just the system call number, 92. This makes such cases a bit faster to find: search for SIGSYS, then check whether, just above it and for the same thread, a syscall "returned" its own syscall number.
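
That search can be automated roughly like this. It is a minimal sketch under simplifying assumptions: syscalls_killed_by_sigsys is a hypothetical name, and the code only looks at the line directly before the SIGSYS line instead of matching [pid N] prefixes, so with strace -f output from several threads the hit should still be checked by hand.

```python
import re

# Matches both "name(args) = N" and "<... name resumed>) = N" strace lines.
SYSCALL_RESULT = re.compile(r"(?:<\.\.\. (\w+) resumed>|(\w+)\().*=\s*(-?\d+)")


def syscalls_killed_by_sigsys(strace_lines):
    """Return (syscall name, reported return value) for each SIGSYS kill."""
    hits = []
    previous = None
    for line in strace_lines:
        if not line.strip():
            continue  # ignore blank lines between records
        if "killed by SIGSYS" in line and previous is not None:
            match = SYSCALL_RESULT.search(previous)
            if match:
                name = match.group(1) or match.group(2)
                hits.append((name, int(match.group(3))))
        previous = line
    return hits


# The two strace lines quoted above.
SAMPLE = [
    "<... chown resumed>)            = 92",
    "+++ killed by SIGSYS (core dumped) +++",
]
print(syscalls_killed_by_sigsys(SAMPLE))  # [('chown', 92)]
```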

Adding

systemd.services.gotenberg.serviceConfig.SystemCallFilter = lib.mkAfter ["@chown"]; 

seems to fix that.

But we continue with more errors -- see follow-up post.

Not sure SystemCallFilter is good

I'm not convinced the whole systemd SystemCallFilter is a good idea.

We're just guessing which system calls the program might make.

But we can't know that: we're not the authors of the program, and even they can't say for sure, given that the libraries below it (such as libc) and whatever plugins gotenberg can use may change over time.

Then further, @system-service is documented as

A reasonable set of system calls

That's vague, and there isn't even a link to what calls are included in that today. Who knows if chown is in or not?

It feels like this will inevitably break, even if we get it to work for my Excel file today.

nh2 commented 1 month ago

@pyrox0 BTW the best would of course be if we could add this whole integration to the NixOS VM test, and assert there that it correctly processes a simple .doc and .xls file each.

pyrox0 commented 1 month ago

> @pyrox0 BTW the best would of course be if we could add this whole integration to the NixOS VM test, and assert there that it correctly processes a simple .doc and .xls file each.

Agreed, though I'm not sure of the best way to do so. We'd need to create test files or something, and they would need to be added to nixpkgs (unless a LibreOffice test data derivation is made that fetches them from a git repo or similar), and I'm hesitant to do so.

nh2 commented 1 month ago

Next error I'm facing:

Error occurred while consuming document test.xls: Error while converting document to PDF: Server error '503 Service Unavailable' for url 'http://localhost:3000/forms/libreoffice/convert'

gotenberg journalctl logs say:

convert to PDF: supervisor run task: context deadline exceeded

In this case, oddly, soffice didn't even appear in the strace any more. The only execve() is for unoconv.

Checking now if increasing --api-timeout fixes it.

nh2 commented 1 month ago

> We'd need to create test files or something, and they would need to be added to nixpkgs

@pyrox0 That should be no problem at all. We just add an XLS file containing hello world and check it in.

pyrox0 commented 1 month ago

Does removing SystemCallFilter from the service fix (or introduce) any errors?

pyrox0 commented 1 month ago

> > We'd need to create test files or something, and they would need to be added to nixpkgs
>
> @pyrox0 That should be no problem at all. We just add an XLS file containing hello world and check it in.

I'm aware of what we could do, my issue is with the constantly bloating size of the nixpkgs tarball. Anything that we add, no matter if it's used or not, is downloaded by every consumer of nixpkgs. Therefore putting testing data inside nixpkgs is not something I see as a good thing.

nh2 commented 1 month ago

> Anything that we add, no matter if it's used or not, is downloaded by every consumer of nixpkgs.

I agree on the general idea of keeping downloads small, but the size is negligible versus the positive impact of automatic testing.

A gzip'd empty .xls is 1 KB. That's about as much as the systemd hardening flags in this module, and 4x less than the module's option definitions.

The time saved across users from automatic testing has a huge impact in turn.

That said, we can also fetchurl it if we prefer, e.g. by attaching it to a GitHub comment here.

nh2 commented 1 month ago

> Does removing SystemCallFilter from the service fix (or introduce) any errors?

Yes, adding @chown fixes the error and allows progressing to the next one from https://github.com/NixOS/nixpkgs/issues/349123#issuecomment-2418341452 (removing SystemCallFilter would have at least the same effect).

Of course this type of lockdown can also have some benefits, but it's unfortunately always a tradeoff, as opposed to the straight win of more "informed" approaches (e.g. an app applying Landlock or seccomp to itself).

nh2 commented 1 month ago

> Checking now if increasing --api-timeout fixes it.

@pyrox0 I got it to work now with --api-timeout=300s. The server I have this on is a bit slow.

My full config that works:

  services.paperless = {
    enable = true;
    consumptionDir = "/heimserver/paperless-inbox";
    consumptionDirIsPublic = true;
    settings = {
      PAPERLESS_CONSUMER_RECURSIVE = true;
      PAPERLESS_OCR_LANGUAGE = "deu+eng";
      PAPERLESS_TIKA_ENABLED = "1";
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT = "http://localhost:${toString config.services.gotenberg.port}";
      PAPERLESS_TIKA_ENDPOINT = "http://${config.services.tika.listenAddress}:${toString config.services.tika.port}";
    };
  };
  services.tika = {
    enable = true;
    package = unstable.tika; # TODO: Remove with NixOS >= 24.11
  };
  services.gotenberg = {
    enable = true;
    package = unstable.gotenberg; # TODO: Remove with NixOS >= 24.11
    timeout = 300;
  };
  systemd.services.gotenberg.environment = {
    HOME = "/run/gotenberg";
  };
  systemd.services.gotenberg.serviceConfig = {
    SystemCallFilter = lib.mkAfter ["@chown"]; # TODO remove when fixed
    WorkingDirectory = "/run/gotenberg";
    RuntimeDirectory = "gotenberg";
  };
nh2 commented 1 month ago

Another thing that's currently bad with the server:

systemctl stop gotenberg.service has an unnecessary delay, even when the queue is empty; this makes NixOS operations slow:

Oct 17 03:06:22 hp2 systemd[1]: Stopping Gotenberg API server...
Oct 17 03:06:22 hp2 gotenberg[3541371]: [SYSTEM] graceful shutdown of 30s
Oct 17 03:06:22 hp2 gotenberg[3541371]: [SYSTEM] prometheus: application stopped
Oct 17 03:06:22 hp2 gotenberg[3541371]: [SYSTEM] api: application stopped
  -- 30 seconds sleep here
Oct 17 03:06:52 hp2 gotenberg[3541371]: [SYSTEM] chromium: application stopped
Oct 17 03:06:52 hp2 gotenberg[3541371]: [SYSTEM] libreoffice-api: application stopped
Oct 17 03:06:52 hp2 systemd[1]: gotenberg.service: Deactivated successfully.

That sleep seems to apply only to the chromium and libreoffice-api parts.
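
The delay can be read off the journal timestamps directly. A small sketch, assuming the short journal format shown above (which has no year, so 2024 is supplied to match the epoch timestamp in the earlier gotenberg log); seconds_between is a hypothetical helper name.

```python
from datetime import datetime

# A subset of the journal lines quoted above.
JOURNAL = [
    "Oct 17 03:06:22 hp2 systemd[1]: Stopping Gotenberg API server...",
    "Oct 17 03:06:22 hp2 gotenberg[3541371]: [SYSTEM] api: application stopped",
    "Oct 17 03:06:52 hp2 gotenberg[3541371]: [SYSTEM] chromium: application stopped",
    "Oct 17 03:06:52 hp2 systemd[1]: gotenberg.service: Deactivated successfully.",
]


def seconds_between(lines, start_marker, end_marker, year=2024):
    """Seconds between the first line containing each marker."""

    def stamp(line):
        # The first 15 characters are the timestamp, e.g. "Oct 17 03:06:22".
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

    start = next(line for line in lines if start_marker in line)
    end = next(line for line in lines if end_marker in line)
    return (stamp(end) - stamp(start)).total_seconds()


print(seconds_between(JOURNAL, "Stopping", "Deactivated successfully"))  # 30.0
```

The 30 seconds match the "graceful shutdown of 30s" line, which suggests the service waits out the full grace period instead of exiting early.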

We should try to make it stop without waiting for the timeout if the queue is empty.

Edit: Filed as https://github.com/gotenberg/gotenberg/issues/1022

pyrox0 commented 1 month ago

I'm not sure how we would check that the queue is empty; I think that would require a custom program or script of some sort. Alternatively, you could set --gotenberg-graceful-shutdown-duration to something less than 30 seconds in your configuration using extraArgs (once the PR to actually enable it goes through).

nh2 commented 1 month ago

@pyrox0 I suspect that --gotenberg-graceful-shutdown-duration is simply not working as advertised. I'll file an upstream issue to clarify.

nh2 commented 1 month ago

I also filed this feature request: