OpenPrinting / cups-browsed

CUPS get stuck after restart on active cups-browsed print jobs #27

Open Matze1224 opened 6 months ago

Matze1224 commented 6 months ago

Describe the bug Printer queues get stuck or are disabled, and sometimes show the status message "No suitable destination host found by cups-browsed, retrying later" or "No destination host name supplied by cups-browsed for printer , is cups-browsed running?". Printing to other queues managed by cups-browsed works, at least most of the time.

We first tried to solve the problem by clearing all print queues on the affected workstations (stopping both daemons and clearing printers.conf), but that didn't stop it. The patch https://github.com/OpenPrinting/cups-browsed/commit/57d9351ea45f47b3bd185f263b1e37d276cf17b8 from #23 didn't really solve it either.

At the moment, I suspect a line of shell script in our configuration management (fai) which restarts the cups-browsed daemon after a configuration change.

To Reproduce Steps to reproduce the behavior:

  1. Print a job to a printer queue managed by cups-browsed. It helps if the print job is longer than a few pages, so you have more time to react.
  2. Wait until the application has finished printing and the job is being processed by CUPS.
  3. Restart the cups or cups-browsed systemd unit. Either should result in one of the errors described above.
  4. Printing to the same queue again yields the same error. A single print job may pass through, but the same error is to be expected.
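
The steps above can be sketched as a small shell helper. This is only a sketch under assumptions: the queue name and file are placeholders, the sleep durations are guesses that may need tuning, and `DRY_RUN` is a hypothetical switch added here so the commands can be inspected without touching a real system.

```shell
# run: execute a command, or only print it when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# reproduce QUEUE FILE: walk through steps 1-4 of the report.
reproduce() {
    queue=$1
    file=$2
    run lp -d "$queue" "$file"           # step 1: submit a (preferably large) job
    run sleep 5                          # step 2: give CUPS time to start processing (guess)
    run systemctl restart cups-browsed   # step 3: restart while the job is active
    run sleep 10                         # let the daemon come back up (guess)
    run lpstat -p "$queue"               # show queue state and status message
    run lp -d "$queue" "$file"           # step 4: print to the same queue again
    run lpstat -p "$queue"
}
```

For a real run, something like `reproduce my-queue /path/to/large.pdf` would be executed as root; with `DRY_RUN=1` set, the helper only prints the commands.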

Expected behavior Printing keeps working across a restart of either of the responsible daemons, or at least the error does not persist afterwards.

System Information:

cups-browsed and cups-filters are backported from Ubuntu mantic (version 2.0.0-0ubuntu2) because earlier versions caused trouble in conjunction with our CUPS server on Debian 11. We also added the patch https://github.com/OpenPrinting/cups-browsed/commit/57d9351ea45f47b3bd185f263b1e37d276cf17b8

Additional context From my current understanding, the problem occurs when CUPS wants to print but cups-browsed hasn't detected the remote printer. This could happen in the following cases:

  1. cups-browsed is restarting and therefore printing isn't possible.
  2. CUPS has started and has unfinished jobs from the previous run. The daemon tries to print, but cups-browsed hasn't detected the remote printers yet, so there is no target to print to.

When CUPS is restarted, these problematic print queues persist, while the other queues only appear once cups-browsed has detected them. I don't know whether that is just CUPS behavior because a print job still exists for those queues.

On most workstations that reported the problem, we found log messages showing that systemd had killed the service at some point, and cups-browsed reported the following message for all print queues at its next start:

Timeout happened during creation of the queue <name>, turn on DebugLogging for more info.

As a temporary workaround, we are now trying the following systemd unit override for cups.service:

# /etc/systemd/system/cups.service.d/20-cups-jobs.conf
[Service]
ExecStartPre=/usr/bin/find /var/spool/cups -maxdepth 1 -type f -delete

It clears the print spool before CUPS starts, so a leftover job won't be sent to a printer that cups-browsed hasn't detected yet. I can give feedback on whether this solves it, at least after the next restart.
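
Before relying on the override, it may be worth previewing what the `find` expression matches. The sketch below runs the same expression with `-print` instead of `-delete`; the function name is made up for illustration. As far as I know, CUPS keeps job control and data files directly in the spool directory and uses a `tmp` subdirectory for scratch files, so `-maxdepth 1 -type f` should touch only the job files, but that layout is an assumption to verify on the affected system.

```shell
# list_spool_files [DIR]: show what the ExecStartPre line would delete.
# Same find expression as in the override, but with -print instead of -delete.
# DIR defaults to /var/spool/cups; the function name is hypothetical.
list_spool_files() {
    dir=${1:-/var/spool/cups}
    # -maxdepth 1: do not descend into subdirectories
    # -type f:     regular files only, so directories themselves are skipped
    find "$dir" -maxdepth 1 -type f -print
}
```

Reading the real spool directory normally requires root, for example `sudo sh -c '. ./preview.sh; list_spool_files'` if the function is saved in a file named `preview.sh` (also a placeholder). Note that the override discards all pending jobs, not only those for the affected queues.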

It would be nice if cups-browsed were more resilient in this situation.

Matze1224 commented 5 months ago

In addition to the reproducible error, we still had problems with this issue. It looks like 87 network printers and a VPN connection aren't the best use case. In the debug log, I discovered long waits in the update_netifs function, especially when the client is connected via VPN. Maybe our recent VPN performance problem made it exponentially worse.

On in-house workstations, rechecking all printer queues took around 3 s; for VPN clients it took around 1-2 min. Now it consistently takes 0 s with the following configuration option (which makes things a lot more stable):

FrequentNetifUpdate No
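
For configuration management, the option could be set idempotently with a helper along these lines. This is a minimal sketch, not a tested deployment script: `set_option` is a made-up name, and `/etc/cups/cups-browsed.conf` in the usage note is the usual location of the file but should be checked on the target system.

```shell
# set_option CONF KEY VALUE: set "KEY VALUE" in a cups-browsed.conf style file.
# Replaces an existing uncommented KEY line, otherwise appends a new one.
# Commented-out lines ("# KEY ...") are left untouched.
set_option() {
    conf=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    if grep -q "^[[:space:]]*${key}[[:space:]]" "$conf"; then
        sed "s/^[[:space:]]*${key}[[:space:]].*/${key} ${value}/" "$conf" > "$tmp"
    else
        cat "$conf" > "$tmp"
        printf '%s %s\n' "$key" "$value" >> "$tmp"
    fi
    # Write back through the original file to keep its owner and permissions.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

A call such as `set_option /etc/cups/cups-browsed.conf FrequentNetifUpdate No` would then be followed by a restart of cups-browsed, ideally while no jobs are active given the restart problem described above.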

The trouble caused by a long printer queue refresh is quite interesting: the daemon can't respond to implicitclass, which expects the daemon to return the URL of the printer. Because the daemon doesn't interrupt its queue refresh for this, implicitclass times out and CUPS reacts to this error according to its policy (disabling the printer etc.).

I'd better not ask why this function needs to be called for each printer queue individually, and would rather wish I had found this configuration option sooner ;-) I will check whether this one configuration option rules them all or whether the original errors are still reproducible with the better timings. I have mixed feelings about this, because in-house workstations were affected by the original problems as well.