OSGeo / gdal

GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
https://gdal.org

CPLWorkerThreadPool: regression in master #10825

Closed rouault closed 1 month ago

rouault commented 1 month ago

@abellgithub I believe this is related to your recent changes

Running `pytest autotest/utilities/test_gdal_viewshed.py --capture=no -ra -vv` on a debug build randomly stalls or crashes for me, with traces like:

gdal_viewshed_path = '/home/even/gdal/gdal/build_cmake/apps/gdal_viewshed', tmp_path = PosixPath('/tmp/pytest-of-even/pytest-648/test_gdal_viewshed0')
viewshed_input = '/tmp/pytest-of-even/pytest-648/test_gdal_viewshed0/test_gdal_viewshed_in.tif'

    def test_gdal_viewshed(gdal_viewshed_path, tmp_path, viewshed_input):

        viewshed_out = str(tmp_path / "test_gdal_viewshed_out.tif")

        _, err = gdaltest.runexternal_out_and_err(
            gdal_viewshed_path
            + " -oz {} -ox {} -oy {} {} {}".format(
                oz[0], ox[0], oy[0], viewshed_input, viewshed_out
            )
        )
>       assert err is None or err == ""
E       assert ("ERROR 7: Assertion `psWorkerThread->bMarkedAsWaiting' failed in file `/home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp', line 218\n\nERROR ret code = -6" is None or "ERROR 7: Assertion `psWorkerThread->bMarkedAsWaiting' failed in file `/home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp', line 218\n\nERROR ret code = -6" == ''
E         + ERROR 7: Assertion `psWorkerThread->bMarkedAsWaiting' failed in file `/home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp', line 218
E         + 
E         + ERROR ret code = -6)
rouault commented 1 month ago

I also get a segmentation fault without assertion when running under gdb:

```
$ gdb --args /home/even/gdal/gdal/build_cmake/apps/gdal_viewshed -oz 100 -ox 621528 -oy 4817617 /tmp/pytest-of-even/pytest-660/test_gdal_viewshed0/test_gdal_viewshed_in.tif /tmp/pytest-of-even/pytest-660/test_gdal_viewshed0/test_gdal_viewshed_out.tif
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.2) 9.2
[...]
(gdb) r
Starting program: /home/even/gdal/gdal/build_cmake/apps/gdal_viewshed -oz 100 -ox 621528 -oy 4817617 /tmp/pytest-of-even/pytest-660/test_gdal_viewshed0/test_gdal_viewshed_in.tif /tmp/pytest-of-even/pytest-660/test_gdal_viewshed0/test_gdal_viewshed_out.tif
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe1fad700 (LWP 235296)]
[New Thread 0x7fffe17ac700 (LWP 235297)]
[New Thread 0x7fffe0fab700 (LWP 235299)]
[New Thread 0x7fffdbfff700 (LWP 235301)]
0...10...20...30...40...50[Thread 0x7fffe1fad700 (LWP 235296) exited]

Thread 3 "gdal_viewshed" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe17ac700 (LWP 235297)]
0x0000555555b44598 in ?? ()
(gdb) bt
#0 0x0000555555b44598 in ?? ()
#1 0x00007ffff6422b9a in std::function::operator()() const (this=0x7fffe17a9e00) at /usr/include/c++/9/bits/std_function.h:688
#2 0x00007ffff641fbe4 in CPLWorkerThreadPool::WorkerThreadFunction (user_data=0x555555707c20) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:122
#3 0x00007ffff638af44 in CPLStdCallThreadJacket (pData=0x555555b51670) at /home/even/gdal/gdal/port/cpl_multiproc.cpp:2014
#4 0x00007fffe6762609 in start_thread (arg=) at pthread_create.c:477
#5 0x00007ffff5690353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) thread apply all bt

Thread 5 (Thread 0x7fffdbfff700 (LWP 235301)):
#0 futex_wait_cancelable (private=, expected=0, futex_word=0x7fffc80019ec) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7fffc8001998, cond=0x7fffc80019c0) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x7fffc80019c0, mutex=0x7fffc8001998) at pthread_cond_wait.c:647
#3 0x00007ffff599de30 in std::condition_variable::wait(std::unique_lock&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff6421828 in std::condition_variable::wait >(std::unique_lock &, CPLJobQueue::) (this=0x7fffc80019c0, __lock=..., __p=...) at /usr/include/c++/9/condition_variable:101
#5 0x00007ffff64213c8 in CPLJobQueue::WaitCompletion (this=0x7fffc8001990, nMaxRemainingJobs=0) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:636
#6 0x00007ffff669f721 in gdal::viewshed::ViewshedExecutor::processLine (this=0x7fffffffc8c0, nLine=72, vLastLineVal=std::vector of length 103, capacity 103 = {...}) at /home/even/gdal/gdal/alg/viewshed/viewshed_executor.cpp:629
#7 0x00007ffff669fa38 in gdal::viewshed::ViewshedExecutor::::operator()(void) const (__closure=0x555555a726c0) at /home/even/gdal/gdal/alg/viewshed/viewshed_executor.cpp:680
#8 0x00007ffff66a088b in std::_Function_handler >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/9/bits/std_function.h:300
#9 0x00007ffff6422b9a in std::function::operator()() const (this=0x55555576ba48) at /usr/include/c++/9/bits/std_function.h:688
#10 0x00007ffff64211d5 in CPLJobQueue::::operator()(void) const (__closure=0x55555576ba40) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:618
#11 0x00007ffff6421d0c in std::_Function_handler):: >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/9/bits/std_function.h:300
#12 0x00007ffff6422b9a in std::function::operator()() const (this=0x7fffdbffce00) at /usr/include/c++/9/bits/std_function.h:688
#13 0x00007ffff641fbe4 in CPLWorkerThreadPool::WorkerThreadFunction (user_data=0x555555b3d330) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:122
#14 0x00007ffff638af44 in CPLStdCallThreadJacket (pData=0x555555703800) at /home/even/gdal/gdal/port/cpl_multiproc.cpp:2014
#15 0x00007fffe6762609 in start_thread (arg=) at pthread_create.c:477
#16 0x00007ffff5690353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7fffe0fab700 (LWP 235299)):
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff6422b9a in std::function::operator()() const (this=0x7fffe0fa8e00) at /usr/include/c++/9/bits/std_function.h:688
#2 0x00007ffff641fbe4 in CPLWorkerThreadPool::WorkerThreadFunction (user_data=0x555555754ae0) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:122
#3 0x00007ffff638af44 in CPLStdCallThreadJacket (pData=0x555555a72790) at /home/even/gdal/gdal/port/cpl_multiproc.cpp:2014
#4 0x00007fffe6762609 in start_thread (arg=) at pthread_create.c:477
#5 0x00007ffff5690353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fffe17ac700 (LWP 235297)):
#0 0x0000555555b44598 in ?? ()
#1 0x00007ffff6422b9a in std::function::operator()() const (this=0x7fffe17a9e00) at /usr/include/c++/9/bits/std_function.h:688
#2 0x00007ffff641fbe4 in CPLWorkerThreadPool::WorkerThreadFunction (user_data=0x555555707c20) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:122
#3 0x00007ffff638af44 in CPLStdCallThreadJacket (pData=0x555555b51670) at /home/even/gdal/gdal/port/cpl_multiproc.cpp:2014
#4 0x00007fffe6762609 in start_thread (arg=) at pthread_create.c:477
#5 0x00007ffff5690353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7fffe1fb37c0 (LWP 234501)):
#0 futex_wait_cancelable (private=, expected=0, futex_word=0x555555b8e238) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x555555b8e1e8, cond=0x555555b8e210) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x555555b8e210, mutex=0x555555b8e1e8) at pthread_cond_wait.c:647
#3 0x00007ffff599de30 in std::condition_variable::wait(std::unique_lock&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff6421828 in std::condition_variable::wait >(std::unique_lock &, CPLJobQueue::) (this=0x555555b8e210, __lock=..., __p=...) at /usr/include/c++/9/condition_variable:101
#5 0x00007ffff64213c8 in CPLJobQueue::WaitCompletion (this=0x555555b8e1e0, nMaxRemainingJobs=0) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:636
#6 0x00007ffff642103f in CPLJobQueue::~CPLJobQueue (this=0x555555b8e1e0, __in_chrg=) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:572
#7 0x00007ffff6666eae in std::default_delete::operator() (this=0x7fffffffc6e8, __ptr=0x555555b8e1e0) at /usr/include/c++/9/bits/unique_ptr.h:81
#8 0x00007ffff6665dea in std::unique_ptr >::~unique_ptr (this=0x7fffffffc6e8, __in_chrg=) at /usr/include/c++/9/bits/unique_ptr.h:292
#9 0x00007ffff669fd32 in gdal::viewshed::ViewshedExecutor::run (this=0x7fffffffc8c0) at /home/even/gdal/gdal/alg/viewshed/viewshed_executor.cpp:660
--Type for more, q to quit, c to continue without paging--
#10 0x00007ffff669c902 in gdal::viewshed::Viewshed::run (this=0x7fffffffcf40, band=0x5555556a4fc0, pfnProgress=0x7ffff641c1b4 , pProgressArg=0x0) at /home/even/gdal/gdal/alg/viewshed/viewshed.cpp:353
#11 0x000055555558d8b8 in main (argc=9, argv=0x555555707020) at /home/even/gdal/gdal/apps/gdal_viewshed.cpp:362
```
rouault commented 1 month ago

This is on a debug build configured with -DCMAKE_CXX_FLAGS_DEBUG=-DDEBUG, so that CPLAssert() is turned on.

This seems to be specific to gcc 9.4 on Ubuntu 20.04. I can't reproduce it with gcc 13.2 on Ubuntu 24.04.
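
For readers wondering why the -DDEBUG flag matters for seeing the assertion: the sketch below shows the general technique of an assertion macro that is only compiled in when DEBUG is defined. It is an illustration only, not GDAL's actual CPLAssert() definition. The names `SKETCH_ASSERT` and `checkedHalf` are hypothetical, and the assumption that the real macro is gated the same way comes only from the "so that CPLAssert() is turned on" remark above.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for a debug-only assertion macro.
// Assumption: it is active only when the build defines DEBUG (e.g. -DDEBUG).
#ifdef DEBUG
#define SKETCH_ASSERT(expr)                                               \
    do {                                                                  \
        if (!(expr)) {                                                    \
            std::fprintf(stderr,                                          \
                         "Assertion `%s' failed in file `%s', line %d\n", \
                         #expr, __FILE__, __LINE__);                      \
            std::abort(); /* raises SIGABRT */                            \
        }                                                                 \
    } while (0)
#else
// Without DEBUG the check compiles to nothing, so a violated invariant
// goes unnoticed here and may only surface later (e.g. as a crash).
#define SKETCH_ASSERT(expr) \
    do {                    \
    } while (0)
#endif

// Hypothetical helper: halves an even value, checking the precondition
// only in builds where DEBUG is defined.
int checkedHalf(int value)
{
    SKETCH_ASSERT(value % 2 == 0);
    return value / 2;
}
```

Under these assumptions, calling `checkedHalf(7)` aborts with an assertion message in a build compiled with -DDEBUG, and silently returns 3 otherwise. That would be consistent with the same bug showing up as an assertion failure in one run and as a plain segmentation fault in another.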