Open courtarro opened 4 years ago
This is really weird. I expanded line 4051, which was triggering the segfault:
if ((uint16_t)block_id == _block_count - 1) {
to the following 4 lines:
uint16_t bc = _block_count;
uint16_t last_block = bc - 1;
uint16_t bid_u16 = (uint16_t)block_id;
if (bid_u16 == last_block) {
Now the segfault happens at the very first line, when attempting to read the value of _block_count. I don't understand why it would be unable to read that variable.
0x00007ffff5d69375 in wirehair::Codec::Encode (this=0x55b6ba50, block_id=1, block_out=0x555555bb2ef0, out_buffer_bytes=32)
at /home/(redacted)/software/external/wirehair/WirehairCodec.cpp:4051
4051 uint16_t bc = _block_count;
GDB is also unable to read it. Here is the attempt to read block_id, which works, and _block_count, which doesn't:
(gdb) print block_id
$1 = 1
(gdb) print _block_count
Cannot access memory at address 0x55b6ba54
Hi Courtarro! Going by your gdb output, first try using blockid = ctypes.c_uint16(0) and needed = ctypes.c_uint16(0) on lines 97 and 98.
Did that work? If so, please try changing line 116 to: ctypes.c_uint16(blockid.value), #ID of block to generate
Thanks for your patience! :-)
[]'s Dani.
I finally got around to trying this. I replaced the above-listed mentions of c_uint() with c_uint16(), as well as another place where c_int() was used (substituted c_int32() in that case). Still segfaults.
If it's segfaulting probably the best way to debug is to build in debug mode and attach a debugger to it. Probably some input is invalid to the C++ code.
I'm not an expert at ctypes. Python thinks the encoder variable is the default c_int, rather than a full WirehairCodec object. Any reason that might confuse the garbage collection process? The variable stays in scope, so I don't think that would be it. But gdb is unable to access any member variable of the WirehairCodec object, which leads me to believe there's some sort of memory corruption going on.
With Python 2.7 going away, I'm not that worried about whether it works with Python 2.7 in the long term. My original motivation was to use this with GnuRadio 3.7, which is P2.7-based, and GR has since moved to Python 3. However, I'd like to better understand the problem in case it's actually just revealing a more serious underlying issue and P3 happens not to trigger it, but could end up failing later.
I read some ctypes docs. I think what might be missing is this:
wirehair.wirehair_encoder_create.restype = ctypes.c_void_p
Maybe also need to wrap it like this: c_void_p(wirehair.wirehair_encoder_create(...))
What may be happening is the default type is a 32-bit integer, which truncated the 64-bit pointer from the library. Passing it back in would lead to invalid memory access as you described...
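To make the truncation idea concrete, here is a minimal sketch, not code from whirehair.py. It assumes c_int is 32 bits wide (true on mainstream platforms), and the full address 0x555555B6BA50 is a hypothetical value picked for illustration; the helper names are made up as well. Note that the backtrace above shows this=0x55b6ba50 next to block_out=0x555555bb2ef0, which would be consistent with a pointer that lost its upper bits.

```python
import ctypes

def as_default_restype(address):
    """Mimic ctypes' default c_int return type: keep only the low 32 bits (signed)."""
    return ctypes.c_int(address & 0xFFFFFFFF).value

def declare_pointer_return(lib):
    """Declare that wirehair_encoder_create returns a pointer-sized handle.

    `lib` is expected to be the object returned by ctypes.CDLL(...).
    """
    lib.wirehair_encoder_create.restype = ctypes.c_void_p
    return lib

# Hypothetical 64-bit heap address, for illustration only.
full_address = 0x555555B6BA50
truncated = as_default_restype(full_address)
print(hex(full_address), "->", hex(truncated))
```

After declaring restype, the returned handle is a plain Python int, so wrapping it in ctypes.c_void_p(...) before passing it back to the library (as suggested above) should keep it from being narrowed to a C int again on the way in.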
Running on 12-thread i7 in 64-bit Linux (Ubuntu Bionic). Compiled and installed libwirehair-shared.so and ran python2 whirehair.py.
GDB stack trace:
Works fine in Python 3. I am currently debugging.