eProsima / Micro-XRCE-DDS-Agent

Micro XRCE-DDS Agent repository. Looking for commercial support? Contact info@eprosima.com
Apache License 2.0

Readers/Writers match, but samples not delivered when using ConnextDDS #307

Open spiderkeys opened 2 years ago

spiderkeys commented 2 years ago

When RTI Connext DDS 6.1.0 based applications subscribe to topics published by the XRCE Agent, discovery appears to complete and readers and writers match on both QoS and the topic's data type, but samples are never delivered to the Connext DataReader over any available transport.

Samples are successfully delivered when subscribing from FastDDS or CycloneDDS.

Steps to reproduce the issue

  1. Build the XRCE Client PublishHelloWorld example: https://github.com/eProsima/Micro-XRCE-DDS-Client/tree/master/examples/PublishHelloWorld

  2. Build the XRCE Agent

  3. Run the agent in UDP mode:

    ./MicroXRCEAgent udp4 --port 2048
  4. Run the client in UDP mode:

    ./PublishHelloWorldClient "127.0.0.1" 2048
  5. Confirm that the agent discovers the client and the topic and datawriter entities are created on the client's behalf

    spiderkeys@spiderdesk:~/z/workspace/Micro-XRCE-DDS-Agent/build$ ./MicroXRCEAgent udp4 --port 2048
    [1647303594.428805] info     | UDPv4AgentLinux.cpp | init                     | running...             | port: 2048
    [1647303594.428952] info     | Root.cpp           | set_verbose_level        | logger setup           | verbose_level: 4
    [1647303595.980317] info     | Root.cpp           | create_client            | create                 | client_key: 0xAAAABBBB, session_id: 0x81
    [1647303595.980355] info     | SessionManager.hpp | establish_session        | session established    | client_key: 0xAAAABBBB, address: 127.0.0.1:48034
    [1647303595.982796] info     | ProxyClient.cpp    | create_participant       | participant created    | client_key: 0xAAAABBBB, participant_id: 0x001(1)
    [1647303595.982986] info     | ProxyClient.cpp    | create_topic             | topic created          | client_key: 0xAAAABBBB, topic_id: 0x001(2), participant_id: 0x001(1)
    [1647303595.983066] info     | ProxyClient.cpp    | create_publisher         | publisher created      | client_key: 0xAAAABBBB, publisher_id: 0x001(3), participant_id: 0x001(1)
    [1647303595.984776] info     | ProxyClient.cpp    | create_datawriter        | datawriter created     | client_key: 0xAAAABBBB, datawriter_id: 0x001(5), publisher_id: 0x001(3)
  6. Use rtiddsgen to generate typesupport source files from the same HelloWorld.idl file in the XRCE Client project

  7. Create a simple application using Connext DDS that subscribes to the HelloWorld topic published by the XRCE client:

    // Best-effort reader QoS taken from the default QoS provider
    auto dr_qos = ::dds::core::QosProvider::Default()->datareader_qos( dds::qos::BEST_EFFORT );
    // _node is an application-side wrapper, not part of the Connext API
    _dr_helloworld = _node->create_reader<HelloWorld>( "HelloWorldTopic", dr_qos );
    // ReadCondition that triggers for any sample state and takes everything available
    _rc_helloworld = { _dr_helloworld, ::dds::sub::status::DataState::any(), [&]()
    {
        auto samples = _dr_helloworld.take();
        for(auto& sample : samples) {
            spdlog::info( "Got sample!" );
            if (sample.info().valid()){
                spdlog::info( "Got valid sample!" );
            }
        }
    }};

    // Attach the condition and dispatch its handler, waiting up to 5 ms per call
    _waitset += _rc_helloworld;
    while( true ) {
        _waitset.dispatch( dds::core::Duration::from_millisecs( 5 ) );
    }

    You can also just subscribe directly in RTI AdminConsole - the same observations apply.

  8. Confirm in RTI AdminConsole that the datawriter and datareader have matched on both type consistency and QoS (screenshot from 2022-03-14 17-35-57).

Additional information

Both applications appear to agree on the data representation (XCDR rather than XCDR2).

The only difference I can see right now is that Connext DDS 6.1.0 uses RTPS 2.3, whereas the XRCE Agent (Fast DDS) uses RTPS 2.2 and Cyclone uses RTPS 2.1.
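For reference, the protocol version being compared here travels in the fixed 20-byte header at the start of every RTPS message, next to the vendor ID, so it can be read straight from a capture. The sketch below parses that header; the struct and function names are illustrative, and only three entries of the OMG vendor ID registry are listed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

// Fields of the fixed 20-byte header that starts every RTPS message:
// "RTPS" magic (4), protocol version (2), vendor ID (2), GUID prefix (12).
struct RtpsHeader {
    uint8_t version_major;
    uint8_t version_minor;
    uint16_t vendor_id;                  // two octets combined as 0xMMmm
    std::array<uint8_t, 12> guid_prefix;
};

// Returns nullopt if the buffer is shorter than a header or lacks the "RTPS" magic.
std::optional<RtpsHeader> parse_rtps_header(const uint8_t* data, std::size_t len) {
    if (len < 20 || std::memcmp(data, "RTPS", 4) != 0) return std::nullopt;
    RtpsHeader h{};
    h.version_major = data[4];
    h.version_minor = data[5];
    h.vendor_id = static_cast<uint16_t>((data[6] << 8) | data[7]);
    std::memcpy(h.guid_prefix.data(), data + 8, 12);
    return h;
}

// Only a few entries of the OMG vendor ID registry.
std::string vendor_name(uint16_t id) {
    switch (id) {
        case 0x0101: return "RTI Connext DDS";
        case 0x010F: return "eProsima Fast DDS";
        case 0x0110: return "Eclipse Cyclone DDS";
        default:     return "unknown";
    }
}
```

For example, a datagram starting with `'R','T','P','S', 2, 3, 0x01, 0x01` parses as RTPS 2.3 from vendor 0x0101 (RTI Connext DDS).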

Additionally, I enabled the least strict TypeConsistency checks on the Connext side, in case those were a problem:

        <qos_profile name="RelaxedTypeConsistency" base_name="TestLibrary::Default">
            <datareader_qos>
                <type_consistency>
                    <kind>ALLOW_TYPE_COERCION</kind>
                    <ignore_member_names>true</ignore_member_names>
                    <force_type_validation>false</force_type_validation>
                </type_consistency>
            </datareader_qos>
        </qos_profile>

pablogs9 commented 2 years ago

Hello @spiderkeys

I guess this is a mistake, but PublishHelloWorldClient won't match the code you shared, since it does not use the same topic name, type name, or reliability QoS.

I assume you have prepared a separate Connext DDS application, the one that produces the output shown in your RTI AdminConsole screenshot, and that this application does match PublishHelloWorldClient.

Regarding the RTPS version, I have talked with @MiguelCompany (who is responsible for Fast DDS at eProsima), and the version difference should not be a problem here.

We have been analyzing the situation, and it seems to be a serialization issue. Could you share the type support generated by Micro XRCE-DDS Gen and the one generated by RTI Connext, so we can check that the serialization is compatible?

spiderkeys commented 2 years ago

Sorry, I had copied the wrong test code w.r.t. the HelloWorld (the previous code was from earlier testing I was doing on the PX4 bridged messages). I've updated that code block, which is what the screenshot from AdminConsole is showing the match on.

Attached are the typesupport files for Connext, FastRTPSGen, and Cyclone.

Worth noting: the fastrtpsgen version is 1.0.4 (the older version specified by the existing microRTPS bridge that PX4 uses). Let me know if this needs to be bumped to the latest fastddsgen to work properly.

typesupport.tar.gz

pablogs9 commented 2 years ago

Could you also add the Micro XRCE-DDS type support?

spiderkeys commented 2 years ago

Can you clarify this? I wasn't aware that separate type support was needed when using the XRCE client/agent; is that the case? The HelloWorld client example contains the ucdr serialization code and uses XML entity creation, and the Agent then picks up the serialized messages and republishes them to applications that have full type descriptions (which works in the case of Fast DDS and Cyclone).
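As a rough illustration of what "the serialized message" looks like, here is a hand-written plain CDR (XCDR1) little-endian encoder for the HelloWorld struct, prefixed with the 4-byte CDR_LE encapsulation header used in RTPS serialized payloads. It assumes the IDL is `struct HelloWorld { unsigned long index; string message; };`; it is not the generated HelloWorld.c code, and whether the client or the Agent adds the encapsulation header is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed IDL: struct HelloWorld { unsigned long index; string message; };
struct HelloWorld {
    uint32_t index;
    std::string message;
};

// Append a little-endian uint32, padding first so it is 4-byte aligned
// relative to `origin` (the first byte after the encapsulation header).
static void put_u32(std::vector<uint8_t>& out, std::size_t origin, uint32_t v) {
    while ((out.size() - origin) % 4 != 0) out.push_back(0);
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Plain CDR (XCDR1), little-endian, with the CDR_LE encapsulation header.
std::vector<uint8_t> serialize_xcdr1_le(const HelloWorld& hw) {
    std::vector<uint8_t> out = {0x00, 0x01, 0x00, 0x00};  // CDR_LE id + options
    const std::size_t origin = out.size();                // alignment origin
    put_u32(out, origin, hw.index);
    // CDR strings: uint32 length that counts the NUL terminator, then the bytes.
    put_u32(out, origin, static_cast<uint32_t>(hw.message.size() + 1));
    out.insert(out.end(), hw.message.begin(), hw.message.end());
    out.push_back(0);
    return out;
}
```

With `HelloWorld{1, "Hi"}` this yields 15 bytes: the header, `01 00 00 00` for the index, `03 00 00 00` for the string length, then `'H' 'i' 00`. A reader decoding with a different representation or member layout would misinterpret these bytes, which is why comparing both sides' type support is a sensible check.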

XRCE Client code, just for reference: https://github.com/eProsima/Micro-XRCE-DDS-Client/tree/develop/examples/PublishHelloWorld

I believe the HelloWorld.h/c are what would be considered the typesupport.

pablogs9 commented 2 years ago

Because of https://github.com/PX4/PX4-Autopilot/pull/19326, I thought you were generating a different type support for the XRCE-DDS side.

Can you check that the RTI Connext side is using XCDR and not XCDR2, to rule out serialization and alignment issues?
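One concrete way the two representations diverge is alignment: plain XCDR1 aligns 8-byte primitives to 8 bytes, while XCDR2 (DDS-XTypes 1.3) caps alignment at 4 bytes. The sketch below computes member offsets for a plain struct made only of primitives under each rule; it deliberately ignores encapsulation headers, XCDR2 delimiter headers (DHEADER), and non-primitive members. For a type made only of 32-bit members and strings, both rules give the same member offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Xcdr { V1, V2 };

// Alignment of a primitive of `size` bytes: XCDR1 aligns to the full size
// (up to 8), XCDR2 caps alignment at 4 bytes.
std::size_t alignment(std::size_t size, Xcdr version) {
    const std::size_t cap = (version == Xcdr::V1) ? 8 : 4;
    return size < cap ? size : cap;
}

// Offsets of consecutive primitive members (given by their sizes in bytes),
// relative to the start of the CDR stream.
std::vector<std::size_t> member_offsets(const std::vector<std::size_t>& sizes, Xcdr version) {
    std::vector<std::size_t> offsets;
    std::size_t pos = 0;
    for (std::size_t s : sizes) {
        const std::size_t a = alignment(s, version);
        pos = (pos + a - 1) / a * a;  // round up to the member's alignment
        offsets.push_back(pos);
        pos += s;
    }
    return offsets;
}
```

For a struct holding a `uint32` followed by a `uint64`, the second member sits at offset 8 under XCDR1 but at offset 4 under XCDR2, so a reader and writer that disagree on the representation would read different bytes.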

spiderkeys commented 2 years ago

I can confirm that the Connext side is using XCDR. The default QOS value in 6.x for DataRepresentationQosPolicy is AUTO_DATA_REPRESENTATION, which translates to XCDR by default (unless some other specific opt-in criteria are met, which are not being used). This is also reflected in the screenshot from Admin Console above, where requested and offered DataRepresentation both reflect XCDR.

It is worth noting that in Wireshark, only metatraffic is ever sent to the Connext applications; no samples are ever transmitted by the XRCE Agent.

Upon subscribing to the HelloWorld topic, I can see that the XRCE Agent sends a few HEARTBEATs, to which Connext replies with ACKNACKs. Upon unsubscribing, Connext sends DATA submessages marking the given key Unregistered/Disposed.
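To make the HEARTBEAT/ACKNACK exchange easier to read in the capture, here is a simplified model of the reader side: a HEARTBEAT announces the sequence-number range the writer has available, and the reader's ACKNACK carries the lowest number it still lacks (the bitmap base) plus the missing numbers from there on. If no DATA ever follows, the reader keeps requesting the same numbers. This is an illustrative sketch (it ignores fragments, GAP submessages, and the exact bitmap encoding), not Connext's or Fast DDS's implementation; `AckNack` and `on_heartbeat` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Simplified ACKNACK reader state: a base sequence number plus the list of
// sequence numbers (>= base) the reader is still missing.
struct AckNack {
    int64_t bitmap_base;
    std::vector<int64_t> missing;
};

// Build the reader's reply to a HEARTBEAT announcing [first_sn, last_sn],
// given the sequence numbers already received. The missing set is capped at
// 256 entries, mirroring the RTPS bitmap size limit.
AckNack on_heartbeat(int64_t first_sn, int64_t last_sn, const std::set<int64_t>& received) {
    AckNack reply{last_sn + 1, {}};  // everything received: ack all of it
    for (int64_t sn = first_sn; sn <= last_sn; ++sn) {
        if (received.count(sn) == 0) { reply.bitmap_base = sn; break; }
    }
    for (int64_t sn = reply.bitmap_base; sn <= last_sn && sn < reply.bitmap_base + 256; ++sn) {
        if (received.count(sn) == 0) reply.missing.push_back(sn);
    }
    return reply;
}
```

For a HEARTBEAT covering 1..5 when nothing has arrived, the reply requests all five samples; after receiving 1, 2 and 4, it requests only 3 and 5.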

I can share a pcap of the RTPS traffic, if it helps.

pablogs9 commented 2 years ago

Yes please, share a PCAP of the discovery process.

spiderkeys commented 2 years ago

Attached as a zip, due to GitHub file restrictions.

The sequence of events is:

  1. Before capture, XRCE Agent is running with HelloWorld client already connected and publishing to it. Admin Console is closed.
  2. Capture is started.
  3. Admin Console is opened, discovery of participants and resources occurs.
  4. Admin Console subscribes to the HelloWorldTopic topic. Its view of the type information is in the attached screenshot (screenshot from 2022-03-16 01-11-57).
  5. A few seconds later, I unsubscribe from the topic (Unregister/Dispose seen in wireshark)
  6. A few seconds later, I subscribe again (Heartbeats from Agent, AckNack from Admin Console)
  7. A few seconds later, I unsubscribe again (Unregister/Dispose again)
  8. Admin console is closed
  9. Capture is ended.

xrce_connext_discovery.zip

spiderkeys commented 2 years ago

Just to make absolutely sure about the XCDR vs XCDR2 situation, I explicitly set the DataRepresentation QOS to XCDR_DATA_REPRESENTATION, as opposed to leaving it default. No change.

pablogs9 commented 2 years ago

We found something. Could you please try creating the subscription in RTI AdminConsole without the partitions?

I see that there is a text field containing {empty},*; please try the same scenario with this field left empty.

pablogs9 commented 2 years ago

Possible fix here: https://github.com/eProsima/Fast-DDS/pull/2580

spiderkeys commented 2 years ago

I can confirm that removing the { "", "*" } entries results in receiving samples in Admin Console. I'll test the fix on the Fast-DDS side later today. Thanks!
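The workaround lines up with the PARTITION QoS rules in the DDS specification: an entity with no partitions belongs to the default partition, equivalent to the single name "", and a writer and reader communicate when at least one pair of their partition names matches, with fnmatch-style wildcards allowed on one side (two wildcard expressions never match each other). Under those rules, and assuming the Agent's writer is in the default partition, a reader with { "", "*" } should match it through the literal "" entry. The sketch below encodes those rules using POSIX `fnmatch`; it is not Fast DDS's matching code, nor the change in the linked pull request.

```cpp
#include <cassert>
#include <fnmatch.h>
#include <string>
#include <vector>

static bool has_wildcard(const std::string& s) {
    return s.find_first_of("*?[") != std::string::npos;
}

// Two partition names match if they are equal, or if exactly one contains
// wildcards and matches the other as an fnmatch pattern.
static bool names_match(const std::string& a, const std::string& b) {
    const bool wa = has_wildcard(a), wb = has_wildcard(b);
    if (!wa && !wb) return a == b;
    if (wa && wb) return false;  // two wildcard expressions never match
    const std::string& pattern = wa ? a : b;
    const std::string& name = wa ? b : a;
    return fnmatch(pattern.c_str(), name.c_str(), 0) == 0;
}

// A writer and reader communicate if any pair of their partition names match.
// An empty PARTITION list means the default partition, i.e. {""}.
bool partitions_match(std::vector<std::string> writer, std::vector<std::string> reader) {
    if (writer.empty()) writer = {""};
    if (reader.empty()) reader = {""};
    for (const auto& w : writer)
        for (const auto& r : reader)
            if (names_match(w, r)) return true;
    return false;
}
```

With these rules, `partitions_match({}, {"", "*"})` is true, which is the behavior the Admin Console subscription expected from the Agent's writer.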

pablogs9 commented 2 years ago

We'll update Fast DDS version in the Micro XRCE-DDS Agent as soon as the patch is merged.