catkira opened this issue 4 months ago
Wondering - can you check if iiod is still running when you get 111? Maybe this is more of a plutosdr-fw issue...
Hmm yes, good idea, maybe it's just an iiod crash. I will check when it happens next time.
@mhennerich I checked it, iiod was not running anymore. Does iiod store logs somewhere, or can I start it with command line parameters to enable logging?
When I restart iiod manually with -d and provoke the crash again, I can see this output:
```
New client connected from 192.168.137.1
New client connected from 192.168.137.1
New client connected from 192.168.137.1
DEBUG: Buffer 0 created.
New client connected from 192.168.137.1
Client exited
DEBUG: Buffer 0 created.
Client exited
Client exited
Client exited
New client connected from 192.168.137.1
New client connected from 192.168.137.1
New client connected from 192.168.137.1
DEBUG: Buffer 0 created.
New client connected from 192.168.137.1
Client exited
DEBUG: Buffer 0 created.
New client connected from 192.168.137.1
Client exited
DEBUG: Buffer 0 created.
New client connected from 192.168.137.1
DEBUG: Buffer 0 created.
Client exited
DEBUG: Block 0 freed.
Segmentation fault
#
```
Can you run it (and make it crash) in gdb, so you can get a backtrace?
I think gdb is not included in the pluto buildroot, so I would have to cross-compile it first, right?
There was already a gdb package available in buildroot. I added it to my board config and ran it again. I could not provoke the crash anymore. The weird thing is that the crash also no longer happens when I run iiod without gdb. It might have been a faulty buildroot build state; buildroot is quite sensitive when one does not do clean rebuilds. I'll close this issue for now and reopen it if the crash pops up again.
The crash happened again. I managed to reproduce it while running iiod in gdb. This is the output:
(You can ignore the "-- axi_dmac_terminate --"; that's a debug printf from the kernel.)
@rgetz The stack backtrace does not show much, because the stack is corrupted.
@pcercuei do you have any idea?
I am using 7ae483696affb64ab6e1766f55e47f00456593d0 (and I don't see any relevant commits after this that could plausibly fix this crash).
I think the crash is caused by a call to iio_buffer_cancel()
I used to call the following functions, in this order:

```c
iio_buffer_cancel(buffer);
iio_channels_mask_destroy(mask);
iio_buffer_destroy(buffer);
```
The crash does not happen anymore when I change the order to this:

```c
iio_buffer_cancel(buffer);
iio_buffer_destroy(buffer);
iio_channels_mask_destroy(mask);
```
@pcercuei is this behaviour correct?
The crash also happens if I just call iio_buffer_cancel(buffer); twice in a row. I know it does not make sense to do that, but it still should not crash.
No, it does not help; it still sometimes crashes when I call iio_buffer_cancel().
@pcercuei I mean this issue <3
I rebuilt libiio with LOG_LEVEL=Debug. This is the output that I get before a crash:
"-- axi_dmac_terminate_all --" is a debug printf from the kernel driver that appears every time I call iio_buffer_cancel(). It is expected behaviour that this message appears.
From the debug output it looks like I make two calls to iio_buffer_cancel() within a short period. When the 2nd call happens, iiod is still freeing resources, and it seems that's when it crashes.
An easy scenario to reproduce this (or a similar issue) is to just make two calls to iio_buffer_cancel() right after each other. iiod will crash on the 2nd call.
I think the problem is that there are some iio_block_dequeue commands queued that get executed even after blocks are already freed.
The iio_block_dequeue() calls are from the buffer-dequeue-thd thread, but the iio_block_destroy() calls come from the iiod-responder-reader-thd thread. Is it possible that iio_block_destroy() should be blocked until the buffer-dequeue-thd is finished?
@mhennerich @pcercuei do you have a little hint for me? :)
I think I have some new information. The crash seems to happen because a call to iio_block_destroy() after iio_buffer_cancel() sometimes crashes iiod: it looks like iio_buffer_cancel() already causes the blocks to be destroyed automatically. It's very weird. The libiio code is a bit unintuitive to me, so it's not something I can fix quickly. In the long run this issue should be fixed; for now I will live with my workaround.
Hi @catkira, we are looking into this. Thank you for providing these details; they help us debug this issue.
Sometimes when I do weird things, i.e. my remote libiio app crashes without closing the network context, I cannot create another context to my plutosdr connected via network.
The error message I get when I run iio_info -u ip:192.168.137.2 is:

```
ERROR: Unable to create IIO context ip:192.168.137.2: Connection refused (111)
```
I have to reboot plutosdr to recover from this.