Netatalk / netatalk

Netatalk is a Free and Open Source AFP fileserver. A *NIX or BSD system running Netatalk is capable of serving many Macintosh clients simultaneously as an AppleShare file server.
https://netatalk.io
GNU General Public License v2.0

[2.4] meson: crash in libatalk with papd and cupsautoadd #1061

Closed rdmark closed 3 months ago

rdmark commented 3 months ago

If you've configured a CUPS printer and enabled cupsautoadd in papd, papd will crash while starting up with the following message. This happens when the software is compiled with Meson, but not when it's compiled with Autotools.

$ doas papd -d
papd: Set syslog logging to level: LOG_DEBUG
papd: restart (2.4.0)
papd: CUPS support enabled (2.4)
papd: Locale charset 'UTF-8' unsupported, using ASCII instead
papd: Required conversion from us-ascii to UCS-2 not supported
papd: ===============================================================
papd: INTERNAL ERROR: Signal 11 in pid 6335 (2.4.0)
papd: ===============================================================
Aborted

rdmark commented 3 months ago

My immediate thought is the musl implementation of iconv...

The stack looks like this:

#0  0x00007ffff7efc99d in convert_string_internal (from=4294967295, to=CH_UCS2, src=0x7ffff7b93244, srclen=19, 
    dest=0x7fffffffa150, destlen=8192) at ../libatalk/unicode/charcnv.c:329
#1  0x00007ffff7efd1de in convert_string_allocate (from=4294967295, to=CH_UNIX, src=0x7ffff7b93244, 
    srclen=18446744073709551615, dest=0x7ffff732b0a8) at ../libatalk/unicode/charcnv.c:509
#2  0x0000555555561cdb in cups_autoadd_printers (defprinter=0x7ffff7ffe700, printers=0x0)
    at ../etc/papd/print_cups.c:594
#3  0x000055555555f6f6 in getprinters (cf=0x555555567530 "/usr/local/etc/netatalk/papd.conf")
    at ../etc/papd/main.c:770
#4  0x000055555555dd44 in main (ac=2, av=0x7fffffffece8) at ../etc/papd/main.c:270
NJRoadfan commented 3 months ago

I have papd running fine here in Alpine. Checking with 2.4 HEAD again just in case.

NJRoadfan commented 3 months ago

Definitely a regression somewhere. Getting this with the latest 2.4 branch code. No crash, just locks up.

papd: Set syslog logging to level: LOG_DEBUG
papd: restart (2.4.0)
papd: CUPS support enabled (2.4)

rdmark commented 3 months ago

@NJRoadfan Do you get the crash if you start it in debug mode papd -d?

NJRoadfan commented 3 months ago

No crash, the output above is from papd -d.

The last branch I compiled on Alpine where papd definitely works is netatalk-dgsga-v2-stdint.

Output from papd -d at that revision is below:

papd: Set syslog logging to level: LOG_DEBUG
papd: restart (2.4.0dev)
papd: Locale charset 'UTF-8' unsupported, using ASCII instead
papd: Authentication disabled: CUPSPDF
papd: register CUPSPDF:LaserWriter@*

rdmark commented 3 months ago

Thanks for doing the commit sleuthing. So some of those data type refreshes weren’t as safe as we thought.

rdmark commented 3 months ago

Actually, I'm getting different results after further testing. To me, this looks like an Autotools/Meson problem. If I build the HEAD of branch-netatalk-2-4 with Autotools, then papd works fine. When I build it with Meson, it crashes.

@NJRoadfan I couldn't see any difference in behavior before and after the change you pointed out. So maybe your bug is something else entirely?

rdmark commented 3 months ago

Same can be reproduced on Debian 12 when building with Meson:

$ sudo papd -d
papd: Set syslog logging to level: LOG_DEBUG
papd: restart (2.4.0)
papd: CUPS support enabled (2.4)
papd: Locale charset 'UTF-8' unsupported, using ASCII instead
papd: Required conversion from utf-8 to UCS-2 not supported
papd: ===============================================================
papd: INTERNAL ERROR: Signal 11 in pid 51682 (2.4.0)
papd: ===============================================================
papd: BACKTRACE: 12 stack frames:
papd:  #0 /usr/local/lib/x86_64-linux-gnu/libatalk.so.18(netatalk_panic+0x26) [0x7f673efced5c]
papd:  #1 /usr/local/lib/x86_64-linux-gnu/libatalk.so.18(+0x8cf54) [0x7f673efcef54]
papd:  #2 /usr/local/lib/x86_64-linux-gnu/libatalk.so.18(+0x8cfaa) [0x7f673efcefaa]
papd:  #3 /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x7f673ece9050]
papd:  #4 /usr/local/lib/x86_64-linux-gnu/libatalk.so.18(+0x8690d) [0x7f673efc890d]
papd:  #5 /usr/local/lib/x86_64-linux-gnu/libatalk.so.18(convert_string_allocate+0x6b) [0x7f673efc90a3]
papd:  #6 papd(cups_autoadd_printers+0xf2) [0x55e9c49851ed]
papd:  #7 papd(+0xaf9d) [0x55e9c4982f9d]
papd:  #8 papd(main+0x3de) [0x55e9c4981816]
papd:  #9 /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7f673ecd424a]
papd:  #10 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f673ecd4305]
papd:  #11 papd(_start+0x21) [0x55e9c497d871]
Aborted

cupsautoadd needs to be enabled in papd.conf: cupsautoadd:op=root:

NJRoadfan commented 3 months ago

Hopefully I can take a deeper look at this later in the week. FWIW, all my build testing in Alpine Linux has been with Meson, not Autotools. The only major changes affecting papd would be in #1009. Comparing against the working build I have here, #925 and #978 could also be the source of the problem.

ghost commented 3 months ago

@NJRoadfan, thanks for your help with this one :) Just so it helps you, #978 was pushed to ensure that all the executables are linked correctly to libatalk on install. Meson strips the link on install unless install_rpath: is specified. The build_rpath: kwarg is specified so executables can be tested in the build directory without actually installing them. #925 is purely cosmetic to improve readability of the meson.build files

ghost commented 3 months ago

Do either of you know how to set up a CUPS printer in a VM? I have an HP LaserJet connected over Ethernet to my Mac, but all my testing is done on VMs residing on a hypervisor that is networked to the Mac.

NJRoadfan commented 3 months ago

If you have an AirPrint compatible printer on your network, CUPS should automatically find it and papd should as well. For manual queue setup, use the CUPS web interface. Enable it with:

Listen localhost:631
Listen /run/cups/cups.sock
WebInterface yes

in /etc/cups/cupsd.conf. The web interface is then accessible via port 631 on the machine; you'll likely need to modify firewall rules. For test printing, I usually install the CUPS-PDF printer, which should create a print queue.

NJRoadfan commented 3 months ago

I can reproduce the crash, and it's only happening with Meson builds (with and without WolfSSL); Autotools builds work fine. A quick bisect back to 18de0edc952fd125590116520a5554aba5aaace1 shows the problem still exists at that point.

NJRoadfan commented 3 months ago

This is the commit that is breaking papd: 641948eb71e8f6e1c42066208d7ec1ade03566ab

NJRoadfan commented 3 months ago

This is the breaking change: https://github.com/Netatalk/netatalk/commit/641948eb71e8f6e1c42066208d7ec1ade03566ab#diff-30d8f6be6320feeacf686be94f48c70869b52630e01ea625f0f15adc0d57c3e4

Reverting this one line allows papd to function on 2.4 HEAD.

NJRoadfan commented 3 months ago

Works fine in Alpine, but is crashing in Debian 10 still...... ugh.

rdmark commented 3 months ago

> This is the breaking change: 641948e#diff-30d8f6be6320feeacf686be94f48c70869b52630e01ea625f0f15adc0d57c3e4
>
> Reverting this one line allows papd to function on 2.4 HEAD.

What that line does is put the config files in a netatalk subdirectory of sysconfdir, e.g. /usr/local/etc/netatalk instead of /usr/local/etc. So my guess is that when building with that line reverted, your papd simply read from a "fresh" papd.conf that doesn't have the cupsautoadd: configuration, and so never triggered the bug.

This is what I observed when following in your footsteps, at any rate. :)

NJRoadfan commented 3 months ago

I'm finding that out right now. This has been broken for quite a while now, just haven't noticed since I haven't been building with meson.

Looking at the stack trace, it's likely something to do with libatalk and code page conversion. The CUPS code in papd calls into it via convert_to_mac_name() to change a printer's name to be "Macintosh safe".

NJRoadfan commented 3 months ago

I'm also having issues on Debian with a2boot and timelord not loading, both with the following error message: a2boot: error while loading shared libraries: libatalk.so.18: cannot open shared object file: No such file or directory

NJRoadfan commented 3 months ago

Something is definitely wrong with paths, at least on my Debian install:

papd: Set syslog logging to level: LOG_DEBUG
papd: restart (2.4.0dev)
papd: CUPS support enabled (2.3)
papd: Locale charset 'UTF-8' unsupported, using ASCII instead
papd: Required conversion from utf-8 to UCS-2 not supported
papd: ===============================================================
papd: INTERNAL ERROR: Signal 11 in pid 22178 (2.4.0dev)
papd: ===============================================================
papd: BACKTRACE: 11 stack frames:
papd:  #0 /usr/local/sbin/../lib/x86_64-linux-gnu/libatalk.so.18(netatalk_panic+0x26) [0x7ff2cb210c35]
papd:  #1 /usr/local/sbin/../lib/x86_64-linux-gnu/libatalk.so.18(+0x2be1b) [0x7ff2cb210e1b]
papd:  #2 /usr/local/sbin/../lib/x86_64-linux-gnu/libatalk.so.18(+0x2be71) [0x7ff2cb210e71]
papd:  #3 /lib/x86_64-linux-gnu/libc.so.6(+0x38d60) [0x7ff2cafadd60]
papd:  #4 /usr/local/sbin/../lib/x86_64-linux-gnu/libatalk.so.18(+0x2583d) [0x7ff2cb20a83d]
papd:  #5 /usr/local/sbin/../lib/x86_64-linux-gnu/libatalk.so.18(convert_string_allocate+0x6b) [0x7ff2cb20afc0]
papd:  #6 papd(cups_autoadd_printers+0xee) [0x556fea558f75]
papd:  #7 papd(+0xae1f) [0x556fea556e1f]
papd:  #8 papd(main+0x3c0) [0x556fea555726]
papd:  #9 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea) [0x7ff2caf98d0a]
papd:  #10 papd(_start+0x2a) [0x556fea55187a]
Aborted

pretty sure /usr/local/sbin/../lib/x86_64-linux-gnu/ isn't a valid path!

rdmark commented 3 months ago

It's likely that this bug was always there ever since introducing Meson to 2.x.

> I'm also having issues on Debian with a2boot and timelord not loading, both with the following error message: a2boot: error while loading shared libraries: libatalk.so.18: cannot open shared object file: No such file or directory

That's strange. Those two start up fine on my Debian 12 system...

> pretty sure /usr/local/sbin/../lib/x86_64-linux-gnu/ isn't a valid path!

It is a bit messy but perfectly valid in *nix. :)

The sbin/../ section simply causes a traversal back to the starting point.

$ stat /usr/local/sbin/../lib/x86_64-linux-gnu/libatalk.so.18
  File: /usr/local/sbin/../lib/x86_64-linux-gnu/libatalk.so.18
  Size: 2036240     Blocks: 3984       IO Block: 4096   regular file
Device: 8,1 Inode: 1179683     Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2024-06-04 08:36:40.218631592 +0900
Modify: 2024-06-04 08:34:07.955302983 +0900
Change: 2024-06-04 08:34:16.535264384 +0900
 Birth: 2024-06-04 08:34:16.535264384 +0900
NJRoadfan commented 3 months ago

It certainly isn't good form to have /../ in a path though.

rdmark commented 3 months ago

> It certainly isn't good form to have /../ in a path though.

What does your meson setup command look like?

If you compare to my call stack from Debian, the library paths don’t have that traversal. So something environmental is different…

NJRoadfan commented 3 months ago

It's: meson setup build -Denable-pgp-uam=disabled -Denable-systemd=true

Debian Bullseye.

NJRoadfan commented 3 months ago

OK, noticing a divergence between how libatalk is compiled with autotools vs. meson.

Autotools builds and stores the following to /usr/local/lib:

libatalk.a
libatalk.la

UAMs are in netatalk subdirectory and have matching .la files.

Meson builds and stores the following to /usr/local/lib/x86_64-linux-gnu

libatalk.a
libatalk.so
libatalk.so.18

UAMs are in netatalk subdirectory without .la files.

ghost commented 3 months ago

Meson does not produce .la files as it does not use libtool. I don't have the traversal either... 🤔

ghost commented 3 months ago

> It's likely that this bug was always there ever since introducing Meson to 2.x.
>
> > I'm also having issues on Debian with a2boot and timelord not loading, both with the following error message: a2boot: error while loading shared libraries: libatalk.so.18: cannot open shared object file: No such file or directory
>
> That's strange. Those two start up fine on my Debian 12 system...

UPDATE: a2boot and timelord linking now fixed in the branch I've attached to this issue. The usual suspects (install and build rpaths missing)

ldd /usr/local/sbin/a2boot
        linux-vdso.so.1 (0x00007ffc797ec000)
        libatalk.so.18 => /usr/local/lib/x86_64-linux-gnu/libatalk.so.18 (0x00007f4174766000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f417456c000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f417448d000)
        libacl.so.1 => /lib/x86_64-linux-gnu/libacl.so.1 (0x00007f4174482000)
        libldap-2.5.so.0 => /lib/x86_64-linux-gnu/libldap-2.5.so.0 (0x00007f4174423000)
        libwrap.so.0 => /lib/x86_64-linux-gnu/libwrap.so.0 (0x00007f4174415000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f4174855000)
        liblber-2.5.so.0 => /lib/x86_64-linux-gnu/liblber-2.5.so.0 (0x00007f4174405000)
        libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f41743e8000)
        libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007f4174000000)
        libnsl.so.2 => /lib/x86_64-linux-gnu/libnsl.so.2 (0x00007f41743cd000)
        libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007f4174297000)
        libidn2.so.0 => /lib/x86_64-linux-gnu/libidn2.so.0 (0x00007f4174266000)
        libunistring.so.2 => /lib/x86_64-linux-gnu/libunistring.so.2 (0x00007f4173e4a000)
        libtasn1.so.6 => /lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007f4174251000)
        libnettle.so.8 => /lib/x86_64-linux-gnu/libnettle.so.8 (0x00007f4173dfc000)
        libhogweed.so.6 => /lib/x86_64-linux-gnu/libhogweed.so.6 (0x00007f4173db3000)
        libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f4173d32000)
        libtirpc.so.3 => /lib/x86_64-linux-gnu/libtirpc.so.3 (0x00007f4174221000)
        libffi.so.8 => /lib/x86_64-linux-gnu/libffi.so.8 (0x00007f4173d26000)
        libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f4173cd4000)
        libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f4173bfa000)
        libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f4173bcd000)
        libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f4173bc7000)
        libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f4173bb9000)
        libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f4173bb2000)
        libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f4173ba1000)

ghost commented 3 months ago

Running ldd /usr/local/sbin/papd installed from current HEAD on my Debian Bookworm VM gives:

        linux-vdso.so.1 (0x00007fff415c1000)
        libatalk.so.18 => /usr/local/lib/x86_64-linux-gnu/libatalk.so.18 (0x00007f26f7784000)
        libcups.so.2 => /lib/x86_64-linux-gnu/libcups.so.2 (0x00007f26f76ce000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f26f74ed000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f26f740e000)
        libacl.so.1 => /lib/x86_64-linux-gnu/libacl.so.1 (0x00007f26f7403000)
        libldap-2.5.so.0 => /lib/x86_64-linux-gnu/libldap-2.5.so.0 (0x00007f26f73a2000)
        libwrap.so.0 => /lib/x86_64-linux-gnu/libwrap.so.0 (0x00007f26f7396000)
        libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f26f7344000)
        libavahi-common.so.3 => /lib/x86_64-linux-gnu/libavahi-common.so.3 (0x00007f26f7336000)
        libavahi-client.so.3 => /lib/x86_64-linux-gnu/libavahi-client.so.3 (0x00007f26f7323000)
        libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007f26f7000000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f26f7302000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f26f7896000)
        liblber-2.5.so.0 => /lib/x86_64-linux-gnu/liblber-2.5.so.0 (0x00007f26f72f2000)
        libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f26f72d5000)
        libnsl.so.2 => /lib/x86_64-linux-gnu/libnsl.so.2 (0x00007f26f72ba000)
        libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f26f6f26000)
        libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f26f728b000)
        libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f26f7285000)
        libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f26f7277000)
        libdbus-1.so.3 => /lib/x86_64-linux-gnu/libdbus-1.so.3 (0x00007f26f7221000)
        libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007f26f6df2000)
        libidn2.so.0 => /lib/x86_64-linux-gnu/libidn2.so.0 (0x00007f26f6dc1000)
        libunistring.so.2 => /lib/x86_64-linux-gnu/libunistring.so.2 (0x00007f26f6c0b000)
        libtasn1.so.6 => /lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007f26f6bf6000)
        libnettle.so.8 => /lib/x86_64-linux-gnu/libnettle.so.8 (0x00007f26f6ba8000)
        libhogweed.so.6 => /lib/x86_64-linux-gnu/libhogweed.so.6 (0x00007f26f6b5f000)
        libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f26f6ade000)
        libtirpc.so.3 => /lib/x86_64-linux-gnu/libtirpc.so.3 (0x00007f26f6ab0000)
        libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f26f6aa9000)
        libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f26f6a98000)
        libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x00007f26f69c8000)
        libffi.so.8 => /lib/x86_64-linux-gnu/libffi.so.8 (0x00007f26f69bc000)
        libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x00007f26f69ae000)
        libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007f26f6867000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f26f6838000)
        libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x00007f26f677c000)
        liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x00007f26f6756000)
        libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007f26f672e000)

So all links present and correct, no traversal in libatalk path.

ghost commented 3 months ago

@rdmark and @NJRoadfan please feel free to commit to the dgsga-papd-test branch as we resolve this issue.

NJRoadfan commented 3 months ago

I found the problem! Somewhere along the way, Meson is defining the macro HAVE_UCS2INTERNAL = 1, while autotools is not. Clearly the systems in question don't support this. Commenting out the setting of that macro in config.h builds a working papd.

NJRoadfan commented 3 months ago

Note, the logic of the test in Meson isn't correct. It's merely checking whether the test program compiles successfully (which it always does if iconv is present), when it should be running the program and checking the returned result.

    if cc.compiles(
        '''
    #include <iconv.h>
    int main(void) {
        iconv_t cd = iconv_open("ASCII", "UCS-2-INTERNAL");
        if (cd == 0 || cd == (iconv_t)-1) return -1;
        return 0;
    }
    ''',
    )
        cdata.set('HAVE_UCS2INTERNAL', 1)
    endif
ghost commented 3 months ago

> I found the problem! Somewhere along the way, Meson is defining the macro HAVE_UCS2INTERNAL = 1, while autotools is not. Clearly the systems in question don't support this. Commenting out the setting of that macro in config.h builds a working papd.

Nice one! Will fix today...

Update: Done. PR #1073