Open rouault opened 2 years ago
Possibly related, we're seeing a similar crash when opening different files: https://github.com/georust/gdal/issues/299. One difference is that there it seems to overflow the stack while trying to print a stack trace (but that's just a guess, I'm not familiar with the two libraries).
Do you know if you are using pthreads for your mutex?
> Do you know if you are using pthreads for your mutex?
Using std::thread requires linking the code with -lpthread. Apparently the std::mutex implementation on Linux / gcc also uses pthread_mutex_lock() / pthread_mutex_unlock() underneath. Cf.:
```
$ cat test2.cpp
#include <mutex>
std::mutex x;
int y = 0;
int foo()
{
std::lock_guard<std::mutex> locker(x);
return ++y;
}
$ g++ -O2 test2.cpp -c
$ LC_ALL=C objdump -sdwxC test2.o
0000000000000000 <foo()>:
0: f3 0f 1e fa endbr64
4: 41 54 push %r12
6: 53 push %rbx
7: 48 83 ec 08 sub $0x8,%rsp
b: 48 8b 1d 00 00 00 00 mov 0x0(%rip),%rbx # 12 <foo()+0x12> e: R_X86_64_REX_GOTPCRELX __pthread_key_create-0x4
12: 48 85 db test %rbx,%rbx
15: 74 10 je 27 <foo()+0x27>
17: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 1e <foo()+0x1e> 1a: R_X86_64_PC32 x-0x4
1e: e8 00 00 00 00 callq 23 <foo()+0x23> 1f: R_X86_64_PLT32 pthread_mutex_lock-0x4
23: 85 c0 test %eax,%eax
25: 75 2d jne 54 <foo()+0x54>
27: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 2d <foo()+0x2d> 29: R_X86_64_PC32 y-0x4
2d: 44 8d 60 01 lea 0x1(%rax),%r12d
31: 44 89 25 00 00 00 00 mov %r12d,0x0(%rip) # 38 <foo()+0x38> 34: R_X86_64_PC32 y-0x4
38: 48 85 db test %rbx,%rbx
3b: 74 0c je 49 <foo()+0x49>
3d: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 44 <foo()+0x44> 40: R_X86_64_PC32 x-0x4
44: e8 00 00 00 00 callq 49 <foo()+0x49> 45: R_X86_64_PLT32 pthread_mutex_unlock-0x4
49: 48 83 c4 08 add $0x8,%rsp
4d: 44 89 e0 mov %r12d,%eax
50: 5b pop %rbx
51: 41 5c pop %r12
53: c3 retq
54: 89 c7 mov %eax,%edi
56: e8 00 00 00 00 callq 5b <x+0x3b> 57: R_X86_64_PLT32 std::__throw_system_error(int)-0x4
```
Note that the original reproducer with GDAL uses a global lock that is a pthread_mutex_t.
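For reference, a global pthread_mutex_t lock of that kind looks roughly like the following. This is a generic sketch, not the GDAL source; `guarded_increment` and the counter are stand-ins for the real netCDF calls made inside the critical section:

```c
#include <pthread.h>

/* Generic sketch: a global, statically-initialized pthread mutex guarding
 * a critical section, similar in shape to the lock the GDAL reproducer
 * takes around its netCDF calls. */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static int g_counter = 0;

int guarded_increment(void) {
    pthread_mutex_lock(&g_lock);
    int v = ++g_counter;   /* stand-in for the real netCDF work */
    pthread_mutex_unlock(&g_lock);
    return v;
}
```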
I seem to have found a workaround on the GDAL side in https://github.com/OSGeo/gdal/pull/6311 by re-using the same netCDF handle when sequences like nc_open("sam_file.nc", ....); nc_open("sam_file.nc", ....); are issued.
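The caching idea behind that workaround can be sketched as follows. This is a minimal illustration, not the actual GDAL code; `open_cached` and `fake_nc_open` are hypothetical names, with the latter standing in for nc_open():

```cpp
#include <map>
#include <mutex>
#include <string>

// Sketch of the workaround: instead of calling nc_open() a second time on a
// path that is already open, keep a per-filename cache of handles and hand
// back the cached one, so the library never sees two opens of the same file.
static std::mutex g_cache_mutex;
static std::map<std::string, int> g_handle_cache;  // filename -> handle id
static int g_next_fake_id = 1;

// Stand-in for nc_open(); the real code would call into libnetcdf here.
static int fake_nc_open(const std::string& /*path*/) { return g_next_fake_id++; }

int open_cached(const std::string& path) {
    std::lock_guard<std::mutex> lock(g_cache_mutex);
    auto it = g_handle_cache.find(path);
    if (it != g_handle_cache.end())
        return it->second;            // re-use the existing handle
    int id = fake_nc_open(path);      // first open of this filename
    g_handle_cache[path] = id;
    return id;
}
```

With this in place, two back-to-back opens of the same filename return the same handle, which is what avoids the crashing code path.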
This is an attempt at providing a minimal reproducer for https://github.com/OSGeo/gdal/issues/6253
The attached docker.zip contains a simple Dockerfile building libhdf5 and libnetcdf, and a simple C++ program. The C++ program loops creating 2 threads, which open the same NC4 file; one of them calls nc_inq_varname(). Note that all calls to the netCDF API are protected by a common mutex, so there is no concurrent access to the netCDF API.
I've tried different versions of hdf5 and netcdf, and compiling hdf5 with or without --enable-unsupported --enable-threadsafe, but the crash always occurs.
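The structure of the reproducer is roughly the following. This is a simplified sketch, not the attached test.cpp: a counter stands in for the nc_open()/nc_inq_varname()/nc_close() sequence, since the point being illustrated is that every netCDF call goes through one common mutex:

```cpp
#include <mutex>
#include <thread>

// Simplified shape of the reproducer: two threads per iteration, and all
// "netCDF" work done under one shared mutex (here a counter stands in for
// the real nc_open()/nc_inq_varname()/nc_close() calls).
static std::mutex g_netcdf_mutex;  // common mutex guarding all netCDF calls
static int g_calls = 0;

static void worker() {
    // The calls are fully serialized by the lock; despite that, the real
    // program still crashes inside libnetcdf/libhdf5.
    std::lock_guard<std::mutex> lock(g_netcdf_mutex);
    ++g_calls;
}

int run_iterations(int n) {
    for (int i = 0; i < n; ++i) {
        std::thread t1(worker), t2(worker);  // two threads per iteration
        t1.join();
        t2.join();
    }
    return g_calls;
}
```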
How to reproduce:
results in
and under valgrind:
Dockerfile:
test.cpp: