Closed fmauch closed 3 years ago
This implements an event-based tcp server for easier use. This could be a solution for #45
In general I think this is a really good solution and I think the event-based tcp is a good approach. I have tested that it works with the ScriptSender, it works very well and it solves #45.
In this commit I only implemented it for the ScriptSender, but it could also be used (probably with modifications) inside the ReverseInterface.
As you mention I also think that it should be used inside the ReverseInterface. I could look into implementing it in the reverse interface if needed?
As you mention I also think that it should be used inside the ReverseInterface. I could look into implementing it in the reverse interface if needed?
I went ahead and implemented it, but you could test it a bit more, if you'd like.
Currently, stopping the ROS driver with CTRL-C results in a RTDE pipeline producer overflow that I still need to hunt down
I went ahead and implemented it, but you could test it a bit more, if you'd like.
I will test that it works with the reverse interface as well.
Currently, stopping the ROS driver with CTRL-C results in a RTDE pipeline producer overflow that I still need to hunt down
It is related to the pipeline running in the RTDEClient stopping the pipeline, when the object is destroyed, should fix that issue. Below example could be a solution:
RTDEClient::~RTDEClient()
{
pipeline_.stop();
}
It is related to the pipeline running in the RTDEClient stopping the pipeline, when the object is destroyed, should fix that issue. Below example could be a solution:
RTDEClient::~RTDEClient() { pipeline_.stop(); }
Yep, that does the trick :-)
@urmahp I'm done with implementing this ATM, please go ahead and test it as much as you can :-)
Before merging, I'd like to also refactor the tcp_client a bit more so it is more understandable and introduce unit tests for the server and client. Let me know, if you'd like to participate in one of these.
@urmahp I'm done with implementing this ATM, please go ahead and test it as much as you can :-)
I will be happy to test it.
Before merging, I'd like to also refactor the tcp_client a bit more so it is more understandable and introduce unit tests for the server and client. Let me know, if you'd like to participate in one of these.
I can work on some unit test for the server and client, while testing it.
I have tested that it works with both the ROS driver and the Isaac driver and overall it is working very well.
However I see one issue once in a while, it is not occuring every time, only once in a while. When stopping the urcap program running on the robot, it seems like the client is not correctly disconnected from the server, because we keep sending data, but there is no client connected. Also the handle_program_state function isn't called. When playing the program again I get the error that the client is already connected, so that also points towards the client not being disconnected correctly. I have inserted the log output below:
INFO /home/mahp/git/github/forks/Universal_Robots_Client_Library/include/ur_client_library/comm/script_sender.h 73: New client connected at FD 12.
INFO /home/mahp/git/github/forks/Universal_Robots_Client_Library/include/ur_client_library/comm/script_sender.h 85: Robot requested program
INFO /home/mahp/git/github/forks/Universal_Robots_Client_Library/include/ur_client_library/comm/script_sender.h 123: Sent program to robot
INFO /home/mahp/git/github/forks/Universal_Robots_Client_Library/include/ur_client_library/comm/script_sender.h 78: Client at FD 12 disconnected.
INFO /home/mahp/git/github/forks/Universal_Robots_Client_Library/include/ur_client_library/comm/reverse_interface.h 155: Robot connected to reverse interface. Ready to receive control commands.
Program running: true
ERROR /home/mahp/git/github/forks/Universal_Robots_Client_Library/src/comm/tcp_server.cpp 305: Sending data through socket failed.
INFO /home/mahp/git/github/forks/Universal_Robots_Client_Library/include/ur_client_library/comm/reverse_interface.h 168: Connection to reverse interface dropped.
Program running: false
INFO /home/mahp/git/github/forks/Universal_Robots_Client_Library/include/ur_client_library/comm/script_sender.h 73: New client connected at FD 12.
INFO /home/mahp/git/github/forks/Universal_Robots_Client_Library/include/ur_client_library/comm/script_sender.h 85: Robot requested program
INFO /home/mahp/git/github/forks/Universal_Robots_Client_Library/include/ur_client_library/comm/script_sender.h 123: Sent program to robot
INFO /home/mahp/git/github/forks/Universal_Robots_Client_Library/include/ur_client_library/comm/script_sender.h 78: Client at FD 12 disconnected.
INFO /home/mahp/git/github/forks/Universal_Robots_Client_Library/include/ur_client_library/comm/reverse_interface.h 155: Robot connected to reverse interface. Ready to receive control commands.
Program running: true
ERROR /home/mahp/git/github/forks/Universal_Robots_Client_Library/src/comm/tcp_server.cpp 267: recv() failed
ERROR /home/mahp/git/github/forks/Universal_Robots_Client_Library/src/comm/tcp_server.cpp 305: Sending data through socket failed.
ERROR /home/mahp/git/github/forks/Universal_Robots_Client_Library/src/comm/tcp_server.cpp 305: Sending data through socket failed.
ERROR /home/mahp/git/github/forks/Universal_Robots_Client_Library/src/comm/tcp_server.cpp 305: Sending data through socket failed.
Unfortunately I haven't been able to track down the issue completely, but I believe it might have something to do with that the disconnect callback functions isn't called, because the server maybe is in a specific state at the point of disconnect. The error message ERROR /home/mahp/git/github/forks/Universal_Robots_Client_Library/src/comm/tcp_server.cpp 267: recv() failed
is only printed once the error occurs.
I'll try to look into it tonight.
Hm, I once hit
[ INFO] [1615925511.466963882]: Client at FD 23 disconnected.
[ INFO] [1615925511.491965322]: Robot connected to reverse interface. Ready to receive control commands.
[ WARN] [1615925514.072904746]: Connection attempt on port 50001 while maximum number of clients (1) is already connected. Closing connection.
[ERROR] [1615925514.225187393]: recv() failed
[ERROR] [1615925514.225742741]: Sending data through socket failed.
[ERROR] [1615925514.233970460]: Sending data through socket failed.
[... more of those]
I finally found a way to reproduce this:
Open questions as for now:
UrServer.readLine()
fails due to ::read
fails here. All this will end [this loop] reconstructing the complete ReverseInterface
. I was quite happy to get rid of this loop, destruct, reset structure... Anyway, in my new implementation we do close the socket in case of a read failure, but we don't call the disconnectCallback here. Therefore, the ReverseInterface
doesn't know it's not connected anymore.ECONNRESET
we can suppress the output in that case.Thank you very much @fmauch for looking into the issue and fixing it.
I apologize that I haven't time to look into the unit test yet, but I can look into it today if needed. What did you have in mind with the unit test? Was it something similar to the test_rtde_client test, testing that a UrDriver object can be created and testing that connection can be established between a robot and the client library or was it to test the tcp_server class you have implemented?
I apologize that I haven't time to look into the unit test yet, but I can look into it today if needed. What did you have in mind with the unit test?
Nothing to apologize. I'd like to test the complete server structure. Do connects / disconnects get handled correctly? Do the correct callbacks get called? Are messages correctly transmitted? Are all error cases handled properly?
Additionally, the current keepalive implementation is not really satisfying. If we connect to the reverse interface, it will be happy as long as we don't disconnect. We should check whether we actually receive a keepalive signal. We should get a keepalive signal (basically any message currently) from the robot before we send the next message. I'd like to implement this using a unit test that should trigger a disconnect event when the keepalive signal is missing. @urmahp would you like to implement this?
I will start developing some unit test to the server now, I will create a pr to your branch once I have some test. I will look into the keepalive implementation as well.
@urmahp I'll have some time to work on this today and tomorrow. If you didn't start with the keepalive implementation, I'll start doing that.
@urmahp I'll have some time to work on this today and tomorrow. If you didn't start with the keepalive implementation, I'll start doing that.
I haven't started the keepalive implementation.
I have added a PR to your branch on some unit test for the tcp server.
I've created a draft implementation of the keepalive handling here. Let's keep discussions on this there.
@urmahp If you could take another review over the PR that would be really nice.
I will test how it is working and I will also look into the sigpipe error which made one of the tcp server test fail on this build.
I will test how it is working and I will also look into the sigpipe error which made one of the tcp server test fail on this build.
I think, I found the error:
When running the test standalone with output in rare cases (I needed to run it up to 2500 times to trigger) I get this failure:
[ RUN ] TCPServerTest.client_connections
/home/mauch/robot_folders/checkout/ur_robot_driver_split/catkin_ws/src/Universal_Robots_Client_Library/tests/test_tcp_server.cpp:319: Failure
Value of: server.write(client1_fd, data, len, written)
Actual: true
Expected: false
I think, that this is the problematic code block:
void TCPServer::handleDisconnect(const int fd)
{
URCL_LOG_DEBUG("%d disconnected.", fd);
close(fd);
if (disconnect_callback_)
{
disconnect_callback_(fd);
}
FD_CLR(fd, &masterfds_);
[...]
}
Calling the disconnect_callback
before closing the socket might lead to a situation where an application thinks that the socket is closed when it is actually not. Therefore, this check returns true, while it should return false. The actual send
command seems to be happily feeding things in and returning success, but this actually results in a SIGPIPE event.
I'll leave the test looping over night and it should not have failed tomorrow morning.
95000 successful tests later I would say that this was indeed the problem.
95000 successful tests later I would say that this was indeed the problem.
Nice thanks for solving the issue.
I have tested that it works, and it is working very well :smiley:
I believe these three variable declaration could be deleted as they aren't used anymore in the UrDriver.
I think that when merging this line should be removed from the ROS driver, else it would log a warning every control cycle, because the ROS driver is still using the script from the resource folder in the ROS driver.
95000 successful tests later I would say that this was indeed the problem.
Nice thanks for solving the issue.
It was me introducing the issue, anyway ;-) Thanks for writing the tests that showed that :-)
I believe these three variable declaration could be deleted as they aren't used anymore in the UrDriver.
Addressed in 587a584
I think that when merging this line should be removed from the ROS driver, else it would log a warning every control cycle, because the ROS driver is still using the script from the resource folder in the ROS driver.
Addressed in https://github.com/UniversalRobots/Universal_Robots_ROS_Driver/pull/346
I cleaned up the branch, from my side this is ready to merge.
I cleaned up the branch, from my side this is ready to merge.
I agree.
This implements an event-based tcp server for easier use. This could be a solution for #45
In this commit I only implemented it for the ScriptSender, but it could also be used (probably with modifications) inside the ReverseInterface.