ImageMagick / ImageMagick

🧙‍♂️ ImageMagick 7
https://imagemagick.org
Other
12.37k stars 1.37k forks source link

CPU pegged at 100% and sched_yield() called in infinite loop when OpenMP enabled #7190

Closed caliguian closed 8 months ago

caliguian commented 8 months ago

ImageMagick version

7.1.1-26

Operating system

Other (enter below)

Operating system, version and so on

FreeBSD 13.2-RELEASE-p9

Description

We have been using OpenMP with Imagick for as long as I can remember (many years), but after our system was recently updated to PHP 8.1+ and ImageMagick 7.1.1-26, we started experiencing random instances of processes getting locked at 100% CPU usage with truss revealing that sched_yield(); is being called in an infinite loop on the locked processes. This appears to happen, on seemingly random occasions, when the Imagick and OpenMP modules initially load. Backtrace details for an example process that was stuck in a sched_yield() infinite loop can be seen below:

(gdb) thread apply all bt

Thread 1 (LWP 100499 of process 33248):
#0  sched_yield () at sched_yield.S:4
#1  0x00000008495aa491 in __kmp_wait_4_ptr (spinner=0x84960b51c <__kmp_initz_lock+28>, checker=1, pred=0x8495c4b50 <__kmp_bakery_check(void*, unsigned int)>, obj=0x84960b500 <__kmp_initz_lock>) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_dispatch.cpp:2695
#2  0x00000008495c37c8 in __kmp_acquire_ticket_lock_timed_template (lck=0x0, gtid=<optimized out>) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_lock.cpp:640
#3  __kmp_acquire_ticket_lock (lck=0x0, gtid=<optimized out>) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_lock.cpp:645
#4  0x00000008495c9564 in __kmp_acquire_bootstrap_lock (lck=0x84960b51c <__kmp_initz_lock+28>) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_lock.h:524
#5  __kmp_get_global_thread_id_reg () at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_runtime.cpp:253
#6  0x000000084958991e in __kmp_entry_thread () at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp.h:3401
#7  kmpc_malloc (size=35590812956) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_alloc.cpp:1098
#8  0x00000008496041b7 in __kmp_is_address_mapped (addr=0x847bad5c8) at /usr/src/contrib/llvm-project/openmp/runtime/src/z_Linux_util.cpp:2067
#9  0x00000008495d3925 in __kmp_register_library_startup () at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_runtime.cpp:6774
#10 0x00000008495c95c4 in __kmp_do_serial_initialize () at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_runtime.cpp:6966
#11 0x00000008495c9578 in __kmp_get_global_thread_id_reg () at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_runtime.cpp:255
#12 0x00000008495b40b2 in __kmp_api_omp_init_lock (user_lock=0x84960b51c <__kmp_initz_lock+28>) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_ftn_entry.h:1083
#13 0x000000084b24773c in MagickCoreGenesis () from /usr/local/lib/libMagickCore-7.so.10
#14 0x000000084a26bd8e in zm_startup_imagick () from /usr/local/lib/php/20230831-debug/imagick.so
#15 0x000000000083b010 in zend_startup_module_ex ()
#16 0x000000000083bb10 in ?? ()
#17 0x0000000000853131 in zend_hash_apply ()
#18 0x000000000083b90c in zend_startup_modules ()
#19 0x0000000000766b0f in php_module_startup ()
#20 0x00000000009f51e9 in ?? ()
#21 0x00000000009f37a4 in ?? ()
#22 0x0000000000423720 in _start ()
(gdb)

We have tried updating from PHP 8.1 to 8.3, and we have tried rebuilding Image Magick with a few different options disabled, but the problem persists. Unfortunately, we cannot simply disable OpenMP support, since doing so results in a significant slowdown of the convert processes.

Steps to Reproduce

Repeatedly run a convert operation (any operation will do) on an image in a loop, using PHP, until the process gets stuck. Then, view what's happening through the use of truss, which will reveal that it is stuck in a sched_yield() infinite loop.

Images

No response

dlemstra commented 8 months ago

We don't maintain the imagick extension so you will need to ask your question somewhere else.

caliguian commented 8 months ago

Thanks for the response! This same issue also happens on standard convert commands, outside of the Imagick extension. But I would be happy to ask the question in a more appropriate location; do you happen to know where that might be?

dlemstra commented 8 months ago

The stacktrace that you included contains php and imagick methods so I assumed this was only an issue when using it in php. How can we reproduce this with magick on the command line? And what is the stacktrace then? And did you also check if your openmp libraries have been updated? This also had bugs in the past.

caliguian commented 8 months ago

Thanks again for the response, dlemstra. I was able to recreate the issue without Imagick in roughly the same manner. I created two processes, with each process repeatedly adding a watermark to another image, and then I just waited for something to lock up.

I am adding watermarks to the top-left corner of another image using the following: convert 'originalfile.jpg' 'watermarkfile.png' -set colorspace sRGB -gravity NorthWest -geometry +3+3 -define compose:args=20,100 -compose dissolve -composite 'resultingfile.jpg'

Here are the traces for the two processes when the lock happened:

(gdb) thread apply all bt

Thread 1 (LWP 101740 of process 79048):
#0  sched_yield () at sched_yield.S:4
#1  0x0000000824f60491 in __kmp_wait_4_ptr (spinner=0x824fc151c <__kmp_initz_lock+28>, checker=1, pred=0x824f7ab50 <__kmp_bakery_check(void*, unsigned int)>, obj=0x824fc1500 <__kmp_initz_lock>) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_dispatch.cpp:2695
#2  0x0000000824f797c8 in __kmp_acquire_ticket_lock_timed_template (lck=0x0, gtid=<optimized out>) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_lock.cpp:640
#3  __kmp_acquire_ticket_lock (lck=0x0, gtid=<optimized out>) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_lock.cpp:645
#4  0x0000000824f7f564 in __kmp_acquire_bootstrap_lock (lck=0x824fc151c <__kmp_initz_lock+28>) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_lock.h:524
#5  __kmp_get_global_thread_id_reg () at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_runtime.cpp:253
#6  0x0000000824f3f91e in __kmp_entry_thread () at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp.h:3401
#7  kmpc_malloc (size=34980238620) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_alloc.cpp:1098
#8  0x0000000824fba1b7 in __kmp_is_address_mapped (addr=0x8242955c8) at /usr/src/contrib/llvm-project/openmp/runtime/src/z_Linux_util.cpp:2067
#9  0x0000000824f89925 in __kmp_register_library_startup () at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_runtime.cpp:6774
#10 0x0000000824f7f5c4 in __kmp_do_serial_initialize () at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_runtime.cpp:6966
#11 0x0000000824f7f578 in __kmp_get_global_thread_id_reg () at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_runtime.cpp:255
#12 0x0000000824f6a0b2 in __kmp_api_omp_init_lock (user_lock=0x824fc151c <__kmp_initz_lock+28>) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_ftn_entry.h:1083
#13 0x0000000821df77ac in MagickCoreGenesis () from /usr/local/lib/libMagickCore-7.so.10
#14 0x0000000000202512 in ?? ()
#15 0x00000000002022f0 in ?? ()
#16 0x00000000002021f0 in ?? ()
#17 0x0000000000000000 in ?? ()

And then the second one that locked up at the same time...

(gdb) thread apply all bt

Thread 1 (LWP 153345 of process 79049):
#0  sched_yield () at sched_yield.S:4
#1  0x0000000825da5491 in __kmp_wait_4_ptr (spinner=0x825e0651c <__kmp_initz_lock+28>, checker=1, pred=0x825dbfb50 <__kmp_bakery_check(void*, unsigned int)>, obj=0x825e06500 <__kmp_initz_lock>) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_dispatch.cpp:2695
#2  0x0000000825dbe7c8 in __kmp_acquire_ticket_lock_timed_template (lck=0x0, gtid=<optimized out>) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_lock.cpp:640
#3  __kmp_acquire_ticket_lock (lck=0x0, gtid=<optimized out>) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_lock.cpp:645
#4  0x0000000825dc4564 in __kmp_acquire_bootstrap_lock (lck=0x825e0651c <__kmp_initz_lock+28>) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_lock.h:524
#5  __kmp_get_global_thread_id_reg () at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_runtime.cpp:253
#6  0x0000000825d8491e in __kmp_entry_thread () at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp.h:3401
#7  kmpc_malloc (size=34995201308) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_alloc.cpp:1098
#8  0x0000000825dff1b7 in __kmp_is_address_mapped (addr=0x8246965c8) at /usr/src/contrib/llvm-project/openmp/runtime/src/z_Linux_util.cpp:2067
#9  0x0000000825dce925 in __kmp_register_library_startup () at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_runtime.cpp:6774
#10 0x0000000825dc45c4 in __kmp_do_serial_initialize () at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_runtime.cpp:6966
#11 0x0000000825dc4578 in __kmp_get_global_thread_id_reg () at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_runtime.cpp:255
#12 0x0000000825daf0b2 in __kmp_api_omp_init_lock (user_lock=0x825e0651c <__kmp_initz_lock+28>) at /usr/src/contrib/llvm-project/openmp/runtime/src/kmp_ftn_entry.h:1083
#13 0x00000008226c37ac in MagickCoreGenesis () from /usr/local/lib/libMagickCore-7.so.10
#14 0x0000000000202512 in ?? ()
#15 0x00000000002022f0 in ?? ()
#16 0x00000000002021f0 in ?? ()
#17 0x0000000000000000 in ?? ()
dlemstra commented 8 months ago

What do you mean by I created two processes? We still don't have enough information to reproduce this. And this really feels like an internal OpenMP issue. I suspect this is happening because you are calling our MagickCoreGenesis method from two threads at the same time and they are both sharing the same look. You can probably reproduce this with just:

static omp_lock_t your_lock;
omp_init_lock(&your_lock);
caliguian commented 8 months ago

Thank you for your response. Allow me to provide further clarification on the issue I'm experiencing, as it seems there might be a misunderstanding.

The issue I'm encountering involves a deadlock scenario during the execution of multiple concurrent ImageMagick processes. While the deadlock does involve the OpenMP library, the root cause appears to be related to the way ImageMagick interacts with OpenMP within a multi-threaded environment.

To illustrate the issue, I simplified the scenario by creating two separate processes, each performing a similar task of adding a watermark to an image (different images in each process) using the convert command. During the execution of these processes, they both become deadlocked, as evidenced by the repetitive calls to the sched_yield() function.

Here's a breakdown of the observed behavior:

It's important to note that this issue wasn't specific to ImageMagick's Imagick extension, as I was able to reproduce it by directly invoking the convert command.

I understand your suspicion regarding the shared lock contention. However, the deadlock occurs even when each process operates on separate images, indicating that the contention is not solely due to shared resources within ImageMagick.

Given these observations, I believe there might be an underlying issue with how ImageMagick interacts with OpenMP in a multi-threaded environment, potentially leading to race conditions or synchronization problems.

dlemstra commented 8 months ago

I should probably also add some more clarification the method MagickCoreGenesis is a method that should only be called once when your application start and it seems to be executed multiple times. The lock is static and this means it is shared between threads.

This all points to something inside the OpenMP library. We are using omp_init_lock as it should be used, there is no real choice on how we should use it. As I said before I suspect that you can probably reproduce this in an application that only does a call to omp_init_lock. And have you also checked if your OpenMP version has changed?

We cannot really help you further, it is up to you now to send us a patch if there is an issue inside ImageMagick. But I really think you are looking in the wrong direction. If this would be a general issue inside ImageMagick we would have heard about this from more of our users.

caliguian commented 7 months ago

Thank you for your help and suggestions on this. A fix for the issue was recently added to the OpenMP/LLVM project, here: https://github.com/llvm/llvm-project/pull/88539

dlemstra commented 7 months ago

Thanks for coming back and leaving a message for other users.