OpenPrinting / cups

OpenPrinting CUPS Sources
https://openprinting.github.io/cups
Apache License 2.0
967 stars 177 forks source link

cusp_array_add debug loop causes delays #875

Closed vliaskov closed 5 months ago

vliaskov commented 5 months ago

Describe the bug

Customer faces job processing and monitoring delays that can go up to at least 10 seconds. Stacktraces during the delay points to:

Core was generated by `/usr/sbin/cupsd'.
#0  0x00007f0c799be784 in _cups_debug_printf (format=format@entry=0x7f0c79a074b0 "9cups_array_add: a->elements[%lld]=%p") at debug.c:164
164 }
(gdb) bt
#0  0x00007f0c799be784 in _cups_debug_printf (format=format@entry=0x7f0c79a074b0 "9cups_array_add: a->elements[%lld]=%p") at debug.c:164
#1  0x00007f0c799ba9e0 in cups_array_add (a=0x564dab5575c0, e=e@entry=0x564dc40a82b0, insert=insert@entry=0) at array.c:1182
#2  0x00007f0c799bacad in cupsArrayAdd (a=<optimized out>, e=e@entry=0x564dc40a82b0) at array.c:103
#3  0x00007f0c799ff101 in _cupsStrAlloc (s=s@entry=0x7ffdcd380220 "ipp://sapspoolprd.brose.net:631/printers/nbop0139") at string.c:134
#4  0x00007f0c799d8c1c in ippAddString (ipp=0x564dbadb6b10, group=group@entry=IPP_TAG_PRINTER, value_tag=<optimized out>, value_tag@entry=IPP_TAG_URI, name=name@entry=0x564da982896b "printer-uri-supported", language=language@entry=0x0, 
    value=value@entry=0x7ffdcd380220 "ipp://sapspoolprd.brose.net:631/printers/nbop0139") at ipp.c:965
#5  0x0000564da97e798c in copy_printer_attrs (con=con@entry=0x564dc0e8e940, printer=printer@entry=0x564dc085bfb0, ra=ra@entry=0x564dc4f0af10) at ipp.c:5080
#6  0x0000564da97ec5b1 in get_printers (con=0x564dc0e8e940, type=0) at ipp.c:7598
#7  0x0000564da980b303 in cupsdProcessIPPRequest (con=con@entry=0x564dc0e8e940) at ipp.c:542
#8  0x0000564da97fa565 in cupsdReadClient (con=0x564dc0e8e940) at client.c:2053
#9  0x0000564da9822b4f in cupsdDoSelect (timeout=timeout@entry=1) at select.c:484
#10 0x0000564da97f1b5e in main (argc=<optimized out>, argv=<optimized out>) at main.c:831

(gdb) up
#1  0x00007f0c799ba9e0 in cups_array_add (a=0x564dab5575c0, e=e@entry=0x564dc40a82b0, insert=insert@entry=0) at array.c:1182
1182        DEBUG_printf(("9cups_array_add: a->elements[" CUPS_LLFMT "]=%p", CUPS_LLCAST current, a->elements[current]));
(gdb) print *a
$1 = {num_elements = 977949, alloc_elements = 979968, current = -1, insert = 967673, unique = 0, num_saved = 0, saved = {0 <repeats 32 times>}, elements = 0x7f0c73a81010, compare = 0x7f0c799ff040 <compare_sp_items>, data = 0x0, 
  hashfunc = 0x0, hashsize = 0, hash = 0x0, copyfunc = 0x0, freefunc = 0x0}

The _cups_debug_printf calls are coming from this loop in cups/array.c:cups_array_add

#ifdef DEBUG
  for (current = 0; current < a->num_elements; current ++)
    DEBUG_printf(("9cups_array_add: a->elements[" CUPS_LLFMT "]=%p", CUPS_LLCAST current, a->elements[current]));
#endif /* DEBUG */

Customer uses LogLevel info (CUPS_DEBUG_LEVEL=7) or less; this is less than LogLevel debug2 (CUPS_DEBUG_LEVEL=9) which is used in the loop's printfs, so no debug output is not printed.

Even so, the large number of a->num_elements (~1M) means cups_debug_printf checks the debug log level ~1M times. I suspect this may be a cause of delay.

I wonder if this change to simply not enter the debug loop with debug level < 9 would be accepted:

 #ifdef DEBUG
+  if (_cups_debug_level >= 9)
   for (current = 0; current < a->num_elements; current ++)
     DEBUG_printf(("9cups_array_add: a->elements[" CUPS_LLFMT "]=%p", CUPS_LLCAST current, a->elements[current]));
 #endif /* DEBUG */

While code outside cups/debug.c shouldn't normally check _cupsd_debug_level I see cups/http.c: http_debug_hex also directly checking the debug level.

To Reproduce

Only reproducible in customer's setup. This stresses cups as it uses more than 3000 print queues.

Expected behavior

Debug code should not cause performance problem / delay when the debug level is less than that specified for the debug code to be executed.

System Information:

Additional context Follow the instructions from the REPORTING-ISSUES document and provide all information regarding the issue.

michaelrsweet commented 5 months ago

This debugging is not intended for production environments, but we can look into making it more efficient in a future release of CUPS.

vliaskov commented 5 months ago

This debugging is not intended for production environments

I assume by this you mean to recommend to completely disable debug logging capabilities in production i.e. not use --enable-debug-printfs at all for building cups in production. Is that correct?

we can look into making it more efficient in a future release of CUPS.

Thanks. This particular problem though seems to be that, in high load cases, this debug loop affects performance even with lower debug log levels than the one used for the actual loop.

michaelrsweet commented 5 months ago

This debugging is not intended for production environments

I assume by this you mean to recommend to completely disable debug logging capabilities in production i.e. not use --enable-debug-printfs at all for building cups in production. Is that correct?

Yes, correct.

we can look into making it more efficient in a future release of CUPS.

Thanks. This particular problem though seems to be that, in high load cases, this debug loop affects performance even with lower debug log levels than the one used for the actual loop.

Again, we can look into making it more efficient, but debug logging necessarily isn't concerned with performance as much as it is concerned with allowing the CUPS developers to debug problems with the software during development.

michaelrsweet commented 5 months ago

I ended up just removing this debug code since we don't have it in libcups v3.

jsmeix commented 5 months ago

@michaelrsweet thank you for your prompt reply! It helped a lot.

A general addedum FYI:

There is a dilemma of having even advanced debugging functionality ready to use for users so when a special issue hits a user all debugging is directly possible at the user's system without the need that someone must provide a user with an extraordinary problem a specially built CUPS to get advanced debugging functionality enabled versus performance loss because of debugging functionality.

In particular as in this case when a problem is

Only reproducible in customer's setup. 

Then ready to use advanced debugging functionality could normally help to understand the user's problem but here in this particular case the ready to use advanced debugging functionality was unfortunately the cause (or at least one reason) for the problem.

I think the actual root cause in this particular case is that the cupsd runs basically overloaded so that it cannot process the load "as usually expected" and then various things fail in various unexpected ways here and there. My reason why I think that this is the actual root cause is that we (i.e. SUSE) don't know about other issues like this which proves what I wrote in https://github.com/OpenPrinting/cups/discussions/873

... things do work well in practice
up to hundreds of print queues
on a single server under normal conditions

even with advanced debugging functionality built-in.