Closed CJCombrink closed 4 months ago
Is there a sensible way to detect this, and then recover? I have tested with the following and it seems to work, but is it correct or is there a better option?
if(nr_send == 0)
{
    tx_stream.reset();
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    tx_stream = usrp->get_tx_stream(stream_args);
}
You have a valid issue that I'd love to see get fixed. In my experience, this issue more broadly means there's something going on with the networking between the host computer and the X310. That said, if UHD could reset the USRP's networking as you note -- and I don't know if that's good code or not -- then streaming might be able to resume. -That- said, check the networking to make sure it is robust: try a direct connection if you're using a switch between the host computer and the USRP; try different cables -- ENET or DAC or fiber; try different adapters if ENET or fiber. Try a different NIC on the host computer, or a different computer with a similar NIC. It's likely that one of these checks will turn up something that is not working correctly.
@michael-west @wordimont what do you think of this code change? Is there another way to reset the streaming to allow data to flow again when this issue happens?
I don't know if there's a better way to detect and recover, but I'm not super familiar with what options the API provides. I'm curious if we can reproduce this or if it really is just an unreliable connection like you suggested.
@CJCombrink how quickly does this occur when running tx_waveforms with iperf?
@wordimont It happens immediately after I run iperf.
Any update on this perhaps?
More findings:
If we call get_tx_stream immediately after send() returns zero, we get the following exception:
Error: EnvironmentError: IOError: Timed out getting recv buff for management transaction
(as per the code in my previous comment)
For it to actually work I need a delay between the time that send() returns zero and the time I call the restart code:
if(nr_send == 0)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(1000));
    tx_stream.reset();
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    tx_stream = usrp->get_tx_stream(stream_args);
}
(Almost anything less than the 1 second sleep above results in the exception.) Edit: It appears that either of the two delays shown can be the 1 second one, and the reset will then work.
Running iperf in the way you are describing will most likely crash the ZPU (I think). That will shut down your device, and the "x300 fw poke32 - reply timed out" error is then the expected result.
Now I realize that you are obviously not running iperf in normal operation, but I wonder if you have a network configuration that causes a lot of spurious traffic to slam into the X310. I'm not certain this is what's happening, or what such a network setup would look like, but there may be a difference between your setup and most other people's setup.
I'm closing this, as I don't think there's much we can do here. To go back to the original error:
SSSSU[ERROR] [X300] 192.168.40.2: x300 fw communication failure #1
EnvironmentError: IOError: x300 fw poke32 - reply timed out
This indicates packet loss on the Ethernet interface (SSS). If a claimer packet (communication between X310 firmware and UHD) gets lost, the session is killed and no more streaming is possible. Attempting to fix the session loss would be futile given the connection itself seems compromised.
This is a problem with the UHD version. The error disappears with UHD 4.7.
Issue Description
During runtime we sometimes get the following reported in the console:
Afterwards all calls to tx_stream->send() time out and no data gets transmitted (the send function returns 0 after 100 ms).
Setup Details
Expected Behavior
X310 should not stop sending data, or should recover and start sending data again.
Actual Behaviour
The error is reported and sending data stops completely.
Steps to reproduce the problem
The issue can be reproduced using the "tx_waveforms" example and iperf sending data to the device.
--nsamps is never reached since tx_stream->send() returns 0).
Additional Information
Using iperf is just a convenient way to reproduce an issue that we see sporadically during "normal" operation.
Edit: After testing it became clear that the send() function times out after the timeout period has expired.
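Since send() returning 0 is the only symptom the application sees, one way to detect the stall is to count consecutive zero-sample results before triggering any recovery. The sketch below is an assumption, not something from UHD: StallDetector and its threshold are hypothetical, and the issue itself treats a single zero return as the trigger (threshold of 1).

```cpp
#include <cstddef>

// Hypothetical helper (not part of UHD): tracks consecutive send() calls
// that returned 0 samples, i.e. calls that timed out.
class StallDetector {
public:
    explicit StallDetector(int threshold) : threshold_(threshold) {}

    // Feed the return value of each send() call. Returns true once the
    // number of consecutive zero-sample results reaches the threshold.
    bool update(std::size_t samples_sent)
    {
        if (samples_sent > 0) {
            zero_count_ = 0; // data is flowing again, start counting afresh
            return false;
        }
        return ++zero_count_ >= threshold_;
    }

private:
    int threshold_;
    int zero_count_ = 0;
};
```

In the transmit loop this would be called as detector.update(tx_stream->send(...)) and, when it returns true, the stream would be torn down and re-created.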