OpenPrinting / cups

OpenPrinting CUPS Sources
https://openprinting.github.io/cups
Apache License 2.0
1.05k stars 187 forks source link

Fix CI for CUPS 2.5 #857

Closed zdohnal closed 9 months ago

zdohnal commented 9 months ago

I found time to look why CI fails on Linux and came up with two fixes - for deadlock and infinite loop.

zdohnal commented 9 months ago

@michaelrsweet linux tests probably passes on httpAddrGetList() because the Linux machine hostname is not .local.... so mDNS resolution might be broken there as well... but what is interesting, Alex's job which is not so old and Mac CI failed before on the issue, his job passes... looks like it depends on something inside CI...

zdohnal commented 9 months ago

@michaelrsweet If you mean the hangs in Linux pipelines, one was directly connected to code with Avahi support, so you would need to build CUPS with Avahi, ideally in some Linux VM. And the deadlock happens basically due race condition, when the detached thread just unlocked the mutex and waits in poll() and in this interval the main thread tries to destroy DNSSD struct and locks the mutex... The infinite loop happened because those two cups_elapsed(), which change the time baseline, were close to each other, thus the time difference was too small or none to get the loop finish in actual 1s as the default remaining time is. What I can think of is what could happen and could cause normal behavior is that your machine takes more time between those calls (so there is a bigger difference) or the time is more precise, so a time difference is generated as well. This one was a tough to realize for me, since I always first connect by gdb when debugging, so of course I was not as quick as machine, so I didn't realize this until I watched strace output and thought about it :D . But I could reproduce both issues locally on my Fedora 38 laptop.

I was thinking about the MacOS issue as well - we add .local. to machine's hostname in httpGetHostname() for MacOS if the hostname does not contain a dot. So as a workaround we can set the machine hostname to a new which will contain a dot, but this way we will lose the coverage of this part of code. Do you know what package/module/library does mDNS resolution in MacOS? I know Mac uses mDNSResponder, but I don't know if it contains plugin for resolving .local addresses in glibc functions like getaddrinfo() - nss-mdns does it in Linux by adding a new value into the line 'hosts' in name service switch configuration file (either /etc/nsswitch.conf or via authselect), which makes glibc to load a library provided by nss-mdns, which sends the resolving request to Avahi, processes response and gives answer to the glibc function. Does it work a similar way on Mac?

michaelrsweet commented 9 months ago

@zdohnal I've been mostly doing my local Linux testing on Ubuntu, either in a VM of some sort (multipass and Parallels) or on my dedicated Linux laptop. Like I said, I wasn't able to make any of them fail, but then I always have network printers that show up for cups_enum_dests...

WRT mDNS on macOS, it is indeed mDNSResponder and that software will always be able to resolve the local hostname + .local, even when all external network interfaces are down (I've tested that to try to isolate the cause of the CI hang...)