analogdevicesinc / libiio

A cross platform library for interfacing with local and remote Linux IIO devices
http://analogdevicesinc.github.io/libiio/
GNU Lesser General Public License v2.1

cannot create context, error code 111 #1145

Open catkira opened 4 months ago

catkira commented 4 months ago

Sometimes when I do weird things, e.g. my remote libiio app crashes without closing the network context, I can no longer create another context to my PlutoSDR connected via network.

The error message I get when I run iio_info -u ip:192.168.137.2 is: ERROR: Unable to create IIO context ip:192.168.137.2: Connection refused (111)

I have to reboot plutosdr to recover from this.
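Error 111 is ECONNREFUSED on Linux: the host answered, but nothing was listening on the port. A minimal sketch of a plain TCP probe that separates "iiod is gone" from "the device is unreachable" could look like the following. It assumes iiod listens on its default TCP port 30431; `probe_tcp()` and `explain_probe()` are hypothetical helper names, not part of libiio.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define IIOD_DEFAULT_PORT 30431 /* assumption: iiod runs on its default port */

/* Hypothetical helper: try a plain TCP connect to an IPv4 address.
 * Returns 0 on success, or the errno value describing the failure. */
static int probe_tcp(const char *ipv4, uint16_t port)
{
    struct sockaddr_in addr;
    int fd, err = 0;

    assert(ipv4 != NULL);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1)
        return EINVAL; /* not a valid dotted-quad address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno; /* save before close() can overwrite it */

    close(fd);
    return err;
}

/* Hypothetical helper: turn the probe result into a troubleshooting hint. */
static const char *explain_probe(int err)
{
    switch (err) {
    case 0:
        return "port is open: something is listening";
    case ECONNREFUSED:
        return "host is up but nothing is listening: iiod has probably exited";
    case EHOSTUNREACH:
    case ENETUNREACH:
    case ETIMEDOUT:
        return "host not reachable: check the network link";
    default:
        return strerror(err);
    }
}
```

Calling `explain_probe(probe_tcp("192.168.137.2", IIOD_DEFAULT_PORT))` from the host would then give a first hint before logging in to the device to check whether the iiod process is still alive.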

mhennerich commented 4 months ago

Wondering - can you check if iiod is still running when you get 111? Maybe this is more a plutosdr-fw issue...

catkira commented 4 months ago

Hmm yes, good idea, maybe it's just an iiod crash. I will check when it happens next time.

catkira commented 4 months ago

@mhennerich I checked it, iiod was not running anymore. Does iiod store logs somewhere, or can I start it with command line parameters to enable logging?

catkira commented 4 months ago

When I restart iiod manually with -d and provoke the crash again, I see this output:

New client connected from 192.168.137.1
New client connected from 192.168.137.1
New client connected from 192.168.137.1
DEBUG: Buffer 0 created.
New client connected from 192.168.137.1
Client exited
DEBUG: Buffer 0 created.

Client exited
Client exited
Client exited
New client connected from 192.168.137.1

New client connected from 192.168.137.1
New client connected from 192.168.137.1
DEBUG: Buffer 0 created.
New client connected from 192.168.137.1
Client exited
DEBUG: Buffer 0 created.
New client connected from 192.168.137.1
Client exited
DEBUG: Buffer 0 created.
New client connected from 192.168.137.1
DEBUG: Buffer 0 created.
Client exited
DEBUG: Block 0 freed.
Segmentation fault
#

rgetz commented 3 months ago

Can you run it (and make it crash) in gdb, so you can get a backtrace?

catkira commented 3 months ago

I think gdb is not included in the pluto buildroot, so I would have to cross-compile it first, right?

catkira commented 3 months ago

There was already a gdb package available in buildroot. I added it to my board config and ran iiod again, but I could not provoke the crash anymore. The weird thing is that the crash also no longer happens when I run iiod without gdb. It might have been a stale buildroot build state; buildroot is quite sensitive if one does not do clean rebuilds all the time. I'll close this issue for now and reopen it if the crash pops up again.

catkira commented 1 month ago

The crash happened again. I managed to reproduce it while running iiod in gdb. This is the output:

image

(You can ignore the "-- axi_dmac_terminate --"; that's a debug printf from the kernel.)

@rgetz The stack backtrace does not show much, because the stack is corrupted:

image

@pcercuei do you have any idea?

I am using 7ae483696affb64ab6e1766f55e47f00456593d0 (and I don't see any relevant commits after this one that look like they could fix this crash).

catkira commented 1 month ago

I think the crash is caused by a call to iio_buffer_cancel()

catkira commented 1 month ago

I used to call the following functions in this order:

iio_buffer_cancel(buffer);
iio_channels_mask_destroy(mask);
iio_buffer_destroy(buffer);

The crash does not happen anymore when I change the order to this:

iio_buffer_cancel(buffer);
iio_buffer_destroy(buffer);
iio_channels_mask_destroy(mask);

@pcercuei is this behaviour correct?
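The reordered teardown above can be captured in one helper so that every code path releases resources in the same order and at most once. This is only a client-side sketch under stated assumptions: `struct stream`, `struct stream_ops` and `stream_teardown()` are hypothetical names, and the callbacks stand in for iio_buffer_cancel(), iio_buffer_destroy() and iio_channels_mask_destroy(). It does not address the underlying iiod crash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks; in a real client they would wrap
 * iio_buffer_cancel(), iio_buffer_destroy() and
 * iio_channels_mask_destroy(). */
struct stream_ops {
    void (*buffer_cancel)(void *buffer);
    void (*buffer_destroy)(void *buffer);
    void (*mask_destroy)(void *mask);
};

/* Hypothetical container for the handles that belong together. */
struct stream {
    void *buffer;
    void *mask;
};

/* Release everything in the order reported to work in this thread:
 * cancel, destroy the buffer, then destroy the mask. Handles are set
 * to NULL afterwards, so a second call does nothing. */
static void stream_teardown(struct stream *s, const struct stream_ops *ops)
{
    assert(s != NULL && ops != NULL);

    if (s->buffer) {
        ops->buffer_cancel(s->buffer);
        ops->buffer_destroy(s->buffer);
        s->buffer = NULL;
    }
    if (s->mask) {
        ops->mask_destroy(s->mask);
        s->mask = NULL;
    }
}

/* Demo stubs that only record the call order: C, B, M. */
static char trace[16];
static size_t trace_len;

static void trace_push(char c)
{
    if (trace_len + 1 < sizeof(trace)) {
        trace[trace_len++] = c;
        trace[trace_len] = '\0';
    }
}

static void demo_cancel(void *p)         { (void)p; trace_push('C'); }
static void demo_buffer_destroy(void *p) { (void)p; trace_push('B'); }
static void demo_mask_destroy(void *p)   { (void)p; trace_push('M'); }

static const struct stream_ops demo_ops = {
    demo_cancel, demo_buffer_destroy, demo_mask_destroy
};
```

Calling `stream_teardown(&s, &demo_ops)` twice on the same stream runs each stub exactly once, which also avoids an accidental second cancel from a cleanup path.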

catkira commented 1 month ago

The crash also happens if I just call iio_buffer_cancel(buffer); twice in a row. I know it does not make sense to do that, but it still should not crash.
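Until the server tolerates a repeated cancel, a client can make sure the call is issued only once, even when several threads or cleanup paths race to it. The sketch below uses a C11 atomic flag for that. `struct cancel_once` and its functions are hypothetical names, and the callback stands in for iio_buffer_cancel(); this is a workaround pattern, not libiio's own behaviour.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical guard: remembers whether cancel was already issued. */
struct cancel_once {
    atomic_flag fired;
    void (*cancel)(void *handle); /* would wrap iio_buffer_cancel() */
    void *handle;
};

static void cancel_once_init(struct cancel_once *c,
                             void (*cancel)(void *), void *handle)
{
    assert(c != NULL && cancel != NULL);
    atomic_flag_clear(&c->fired);
    c->cancel = cancel;
    c->handle = handle;
}

/* Returns true if this call actually issued the cancel, false if
 * another caller got there first. Note that a caller that loses the
 * race may return while the winner's cancel is still in progress. */
static bool cancel_once_fire(struct cancel_once *c)
{
    assert(c != NULL);
    if (atomic_flag_test_and_set(&c->fired))
        return false;
    c->cancel(c->handle);
    return true;
}

/* Demo stub: counts how often the cancel callback really ran. */
static void demo_count_cancel(void *handle)
{
    ++*(int *)handle;
}
```

With the guard in place, two back-to-back calls to `cancel_once_fire()` reach the callback only once; the second call simply returns false.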

catkira commented 1 month ago

No, it does not help; it still sometimes crashes when I call iio_buffer_cancel().

image

catkira commented 1 month ago

@pcercuei I mean this issue <3

catkira commented 1 month ago

I rebuilt libiio with LOG_LEVEL=Debug. This is the output that I get before a crash:

image

"-- axi_dmac_terminate_all --" is a debug printf from the kernel driver that appears every time I call iio_buffer_cancel(), so this message is expected. From the debug output it looks like I make two calls to iio_buffer_cancel() within a short period. When the second call happens, iiod is still freeing resources, and that seems to be when it crashes. An easy way to reproduce this (or a similar issue) is to call iio_buffer_cancel() twice in a row: iiod will crash on the second call.

catkira commented 4 weeks ago

I think the problem is that some queued iio_block_dequeue commands still get executed after the blocks have already been freed. The iio_block_dequeue() calls come from the buffer-dequeue-thd thread, but the iio_block_destroy() calls come from the iiod-responder-reader-thd thread. Should iio_block_destroy() perhaps block until the buffer-dequeue-thd is finished? @mhennerich @pcercuei do you have a little hint for me? :)
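One common way to keep a destroy from pulling memory out from under an in-flight dequeue is reference counting: the dequeue path takes a reference before touching the block, and the memory is freed only when the last reference is dropped. The sketch below illustrates that general pattern only. It is not how iiod is implemented, and `struct block` and all `block_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical block: one reference for the owner, plus one per
 * dequeue that is currently in flight. */
struct block {
    atomic_int refs;
    atomic_bool dead; /* set once the owner has asked for destruction */
    void *data;
};

static struct block *block_create(size_t size)
{
    struct block *b = malloc(sizeof(*b));

    if (!b)
        return NULL;
    b->data = malloc(size);
    if (!b->data) {
        free(b);
        return NULL;
    }
    atomic_init(&b->refs, 1);
    atomic_init(&b->dead, false);
    return b;
}

/* Drop one reference; returns true if this call freed the block. */
static bool block_put(struct block *b)
{
    assert(b != NULL);
    if (atomic_fetch_sub(&b->refs, 1) != 1)
        return false;
    free(b->data);
    free(b);
    return true;
}

/* Dequeue side. Must be called while the block is still reachable
 * (for example under the queue lock). Returns false if the block is
 * already being destroyed and must not be used. */
static bool block_dequeue_begin(struct block *b)
{
    atomic_fetch_add(&b->refs, 1);
    if (atomic_load(&b->dead)) {
        block_put(b);
        return false;
    }
    return true;
}

/* Dequeue finished; returns true if this released the last reference. */
static bool block_dequeue_end(struct block *b)
{
    return block_put(b);
}

/* Owner side: mark the block dead and drop the owner's reference.
 * Returns true if the memory was freed immediately, false if the
 * free is deferred until the in-flight dequeue ends. */
static bool block_destroy(struct block *b)
{
    atomic_store(&b->dead, true);
    return block_put(b);
}
```

In this model a destroy that arrives during a dequeue returns right away, and the thread that finishes the dequeue performs the actual free. The alternative raised above, making the destroy wait for the dequeue thread, gives the same safety at the cost of blocking the caller.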

catkira commented 4 weeks ago

I think I have some new information. It seems that a call to iio_block_destroy() after iio_buffer_cancel() sometimes causes iiod to crash, because iio_buffer_cancel() apparently already causes the blocks to be destroyed automatically. It's very weird. The libiio code is a bit unintuitive for me, so this is not something I can fix quickly. I think this issue should be fixed in the long run. For now I will live with my workaround.

dNechita commented 3 weeks ago

Hi @catkira, we are looking into this. Thank you for providing these details; they help us debug this issue.