1r0b1n0 / rosserial_qt

Qt implementation for a rosserial client
5 stars 2 forks source link

Rosserial-server gets out of sync #1

Closed julled closed 6 years ago

julled commented 6 years ago

Hey, thank you for your code ! I tried to make a simple test by sending a string message from a windows machine (like you do in your test code), but i am running into an issue at the rosserial-server side (https://github.com/1r0b1n0/rosserial/). I started the server on the linuxpc via the 32bit launchfile with 32bit msg-length and tcp_nodelay = false. The server seems to run out of sync with the client ("Sync with device lost."). Below are the first debug messages the server produces after connecting to the rosserial_qt client:

[DEBUG] [1511961229.775179496]: Starting session. [DEBUG] [1511961229.775294637]: Sending request topics message for VER2 protocol. [DEBUG] [1511961229.775368040]: Sending buffer of 10 bytes to client. [DEBUG] [1511961229.775623194]: Buffer read of 1 bytes, wi: 0, ri: 0 [DEBUG] [1511961229.775683364]: Requesting transfer of at least 1 byte(s). [DEBUG] [1511961232.775572261]: Sync with device lost. [DEBUG] [1511961232.775758171]: Read operation failed with: system:125 [DEBUG] [1511961232.802206761]: Starting session. [DEBUG] [1511961232.802324010]: Sending request topics message for VER2 protocol. [DEBUG] [1511961232.802372657]: Sending buffer of 10 bytes to client. [DEBUG] [1511961232.802451587]: Buffer read of 1 bytes, wi: 0, ri: 0 [DEBUG] [1511961232.802498709]: Requesting transfer of at least 1 byte(s). [DEBUG] [1511961232.805563627]: Successfully read 100 byte(s), now 100 available. [DEBUG] [1511961232.805623780]: Invoking success callback with buffer of requested size 1 byte(s). [DEBUG] [1511961232.805671109]: Buffer read of 1 bytes, wi: 100, ri: 1 [DEBUG] [1511961232.805698096]: Invoking success callback with buffer of requested size 1 byte(s). [DEBUG] [1511961232.805725293]: Buffer read of 7 bytes, wi: 100, ri: 2 [DEBUG] [1511961232.805748633]: Invoking success callback with buffer of requested size 7 byte(s). [DEBUG] [1511961232.805776107]: Received message header with length 8 and topic_id=10 [DEBUG] [1511961232.805798532]: Buffer read of 9 bytes, wi: 100, ri: 9 [DEBUG] [1511961232.805840928]: Invoking success callback with buffer of requested size 9 byte(s). [DEBUG] [1511961232.805873039]: Received body of length 9 for message on topic 10. [DEBUG] [1511961232.805910873]: Sending buffer of 18 bytes to client. [DEBUG] [1511961232.805960935]: Buffer read of 1 bytes, wi: 100, ri: 18 [DEBUG] [1511961232.805996750]: Invoking success callback with buffer of requested size 1 byte(s). [DEBUG] [1511961232.806028195]: Buffer read of 1 bytes, wi: 100, ri: 19 [DEBUG] [1511961232.806048770]: Invoking success callback with buffer of requested size 1 byte(s). [DEBUG] [1511961232.806071688]: Buffer read of 7 bytes, wi: 100, ri: 20 [DEBUG] [1511961232.806245961]: Invoking success callback with buffer of requested size 7 byte(s). [DEBUG] [1511961232.806280251]: Received message header with length 72 and topic_id=0 [DEBUG] [1511961232.806314237]: Buffer read of 73 bytes, wi: 100, ri: 27 [DEBUG] [1511961232.806343832]: Invoking success callback with buffer of requested size 73 byte(s). [DEBUG] [1511961232.806378143]: Buffer is empty, resetting indexes to the beginning. [DEBUG] [1511961232.806409607]: Received body of length 73 for message on topic 0. [DEBUG] [1511961232.808568573]: connect() succeeded to [192.168.167.80:39185] on socket [15] [DEBUG] [1511961232.809156387]: TCP socket [15] closed [DEBUG] [1511961232.810388395]: Adding tcp socket [15] to pollset [DEBUG] [1511961232.810596518]: Async connect() in progress to [192.168.167.80:39185] on socket [15]

[DEBUG] [1511961232.812201599]: Service client from [callerid=[/rosserial_message_info] address=[TCPROS connection on port 52708 to [192.168.167.80:39185 on socket 15]]] for [/message_info] dropped [DEBUG] [1511961232.812273476]: TCP socket [15] closed

[DEBUG] [1511961232.813590819]: Buffer read of 1 bytes, wi: 0, ri: 0 [DEBUG] [1511961232.813701457]: Requesting transfer of at least 1 byte(s). [DEBUG] [1511961232.870545567]: Accepted connection on socket [10], new socket [15] [DEBUG] [1511961232.870828126]: Adding tcp socket [15] to pollset [DEBUG] [1511961232.870881637]: TCPROS received a connection from [192.168.167.80:33634] [DEBUG] [1511961232.870983856]: Connection: Creating TransportSubscriberLink for topic [/chatter] connected to [callerid=[/rostopic_19422_1511879097846] address=[TCPROS connection on port 38125 to [192.168.167.80:33634 on socket 15]]] [DEBUG] [1511961237.813673151]: Sync with device lost. [DEBUG] [1511961237.813784589]: Publisher on '/chatter' deregistering callbacks.

[DEBUG] [1511961237.815930645]: Connection to subscriber [callerid=[/rostopic_19422_1511879097846] address=[TCPROS connection on port 38125 to [192.168.167.80:33634 on socket 15]]] to topic [/chatter] dropped [DEBUG] [1511961237.816009299]: TCP socket [15] closed

Do you have any idea how to resolve this issue?

1r0b1n0 commented 6 years ago

Hi, On the rosserial_server side everything seems ok, but it drops the connection after a 5sec timeout. It seems the node_handle timer on the client side isn't trigged (so spinOnce() isn't called in time) ; that could happen if the Qt event loop's thread is blocked (for example by a call to sleep() or another blocking method). Could you add some debug logs to int ros::NodeHandle::spinOnce() in node_handle.cpp and check it is called at least 10 times a second ?

julled commented 6 years ago

it looks okey so far

timer elaped:108milliseks timer elaped:112milliseks timer elaped:109milliseks timer elaped:107milliseks timer elaped:112milliseks timer elaped:106milliseks timer elaped:109milliseks timer elaped:111milliseks timer elaped:109milliseks timer elaped:108milliseks timer elaped:110milliseks timer elaped:108milliseks timer elaped:112milliseks timer elaped:114milliseks timer elaped:102milliseks timer elaped:110milliseks timer elaped:112milliseks timer elaped:106milliseks timer elaped:109milliseks timer elaped:109milliseks timer elaped:110milliseks timer elaped:111milliseks timer elaped:109milliseks timer elaped:108milliseks timer elaped:113milliseks

measured with qelapsedtimer at the entry of spinOnce(). or does it need to be below 100ms?

1r0b1n0 commented 6 years ago

Indeed the timer is called correctly (it just need to be below the timeout of the server) Could you maybe show your test code ?

julled commented 6 years ago

As i want to send a gridmap, the functionnames are related to that... header

#include "ros/ros.h"
#include "ros/publisher.h"
#include <std_msgs/String.h>
class Rosconvertgrid{
private:
    void callbackRaw();
    ros::Publisher< std_msgs::String > chatter;
    ros::NodeHandle nh;
    bool    m_ROSInit;

public:
    Rosconvertgrid();

};

cpp

#include "pch.h"
#include "Rosconvertgrid.h"

Rosconvertgrid::Rosconvertgrid() :
    chatter("chatter"),
    m_ROSInit(false){}

void Rosconvertgrid::callbackRaw() {
    if (!m_ROSInit) 
    {
        std::string ros_master = "192.168.167.80";
        nh.open(ros_master.c_str());
        nh.advertise(chatter);
        m_ROSInit = true; 
    }

    static int i = 1;
    i++;
    std_msgs::String str_msg;
    str_msg.data = QString("Hello %1").arg(i).toStdString();
    chatter.publish(str_msg);
}

Rosconvertgrid::callbackRaw() gets called periodically (10 times a sec) by external function. If i look what happens on wireshark on the windows-pc(rosserial_qt), it looks like rosserial_qt doesnt send data, even there is a established tcp-connection. The 3 sek instead of 1 sek delay to disconnect (FIN-Pakets) is due i changed the attempt time in session.h ... wireshark

1r0b1n0 commented 6 years ago

I tried your example and it is working fine on my computers (i have no disconnections and receive all the messages with rostopic echo /chatter), I added the code below so you can try on your system.

Is your callback called from the same thread as Qt ? The publish() method is not thread-safe. (if it is not you could use QMetaObject::invokeMethod with a Qt::QueuedConnection)

Windows side :

#include <QCoreApplication>
#include <Qtimer>
#include "ros/ros.h"
#include "ros/publisher.h"
#include "std_msgs/String.h"

using std::string;

class Rosconvertgrid{
public:
    Rosconvertgrid():
        chatter("chatter"),
        m_ROSInit(false){}

    void callbackRaw(){
        if (!m_ROSInit)
        {
            std::string ros_master = "192.168.0.56";
            nh.open(ros_master.c_str());
            nh.advertise(chatter);
            m_ROSInit = true;
        }

        static int i = 1;
        i++;
        std_msgs::String str_msg;
        str_msg.data = QString("Hello %1").arg(i).toStdString();
        chatter.publish(str_msg);
    }

private:
    ros::Publisher< std_msgs::String > chatter;
    ros::NodeHandle nh;
    bool    m_ROSInit;
};

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);
    Rosconvertgrid *rosconvertgrid = new Rosconvertgrid();

    QTimer *timer = new QTimer();
    timer->connect(
        timer, &QTimer::timeout,
        [rosconvertgrid]() { rosconvertgrid->callbackRaw(); }
    );
    timer->start(1000);

    return app.exec();
}

Server launch :

<launch>
  <node pkg="rosserial_server" type="socket_node" name="rosserial_server">
    <param name="msg_length_bytes" value="4"/>
    <param name="buffer_size" value="1000000"/>
    <param name="tcp_nodelay" value="false"/>
  </node>
  <node pkg="rosserial_python" type="message_info_service.py" name="rosserial_message_info" />
</launch>
julled commented 6 years ago

Thanks for the basic example. It looks like the tcp pakets gets blocked after sending via the qt tcpssocket. I am running out of ideas on what might be the problem, as i observed this problem with different errors on two different windows machines. I was switching over to a udpsocket and it worked like a charm. Maybe its better for my usecase as its faster anyway.

julled commented 6 years ago

It turned out UDP wasnt a solution for us, as the message length of a UDP packet is 16 bit maximum. Thus we needed to get back to TCP, as it automatically splits the data in multiple packages. To get it working at our project, we needed to proceed as you proposed. We needed to connect node_handle with RosQTSocket by using the SignalSlot System to get it running. We emit a Signal with the constructed message in node_handle for a slot at the write function in RosQTSocket. Thank you for all your help!