eProsima / Fast-DDS

https://eprosima.com
Apache License 2.0

Stuck in mp_subscriber->create_datareader() when DynamicType discover with TCP #3138

Closed ulongcha closed 1 year ago

ulongcha commented 1 year ago

Is there an already existing issue for this?

Expected behavior

The DataReader is created successfully when create_datareader() is called from on_type_discovery() during DynamicType discovery over TCP.

Current behavior

It seems to get stuck in reader->enable() inside create_datareader() when it is called from on_type_discovery(), without throwing an error or returning nullptr. Maybe there is something I need to set; please help.

Steps to reproduce

My subscriber's setting is:

DomainParticipantFactoryQos factory_qos;
factory_qos.entity_factory().autoenable_created_entities = false;
DomainParticipantFactory::get_instance()->set_qos(factory_qos);

DomainParticipantQos Subpqos;
Subpqos.transport().use_builtin_transports = false;
std::shared_ptr<TCPv4TransportDescriptor> descriptor = std::make_shared<TCPv4TransportDescriptor>();
descriptor->add_listener_port(5100);
Subpqos.transport().user_transports.push_back(descriptor);

My publisher's setting is:

DomainParticipantQos Agentpqos;
Agentpqos.name("AgentParticipant_pub");
Agentpqos.transport().use_builtin_transports = false;

int32_t kind = LOCATOR_KIND_TCPv4;
Locator initial_peer_locator;
initial_peer_locator.kind = kind;
std::shared_ptr<TCPv4TransportDescriptor> descriptor = std::make_shared<TCPv4TransportDescriptor>();
IPLocator::setIPv4(initial_peer_locator, "172.16.10.133"); // subscriber's virtual machine's IP
initial_peer_locator.port = 5100;
Agentpqos.wire_protocol().builtin.initialPeersList.push_back(initial_peer_locator);
Agentpqos.transport().user_transports.push_back(descriptor);

mp_participant = DomainParticipantFactory::get_instance()->create_participant(0, Agentpqos);

Fast DDS version/commit

2.8.1&2.9.0

Platform/Architecture

Ubuntu Focal 20.04 amd64

Transport layer

TCPv4

Additional context

No response

XML configuration file

No response

Relevant log output

No response

Network traffic capture

No response

ulongcha commented 1 year ago

I printed the logs and found that the subscriber never receives the RESPONSE from the publisher, even though the publisher has sent it.

Log of subscriber:

Discovered type: TYf2fbdba1320063b from topic Tof2fbdba1320063b
2022-12-09 14:04:54.295 [PARTICIPANT Info] Type TYf2fbdba1320063b registered. -> Function register_type
Register_type
2022-12-09 14:04:54.296 [SUBSCRIBER Info] CREATING SUBSCRIBER IN TOPIC: Tof2fbdba1320063b -> Function create_datareader
2022-12-09 14:04:54.299 [RTPS_READER Info] RTPSReader created correctly -> Function init
2022-12-09 14:04:54.999 [RTPS_WRITER Info] Sending relevant changes as DATA/DATA_FRAG messages -> Function add_data
2022-12-09 14:04:54.999 [RTPS_WRITER Info] Sending INFO_TS message -> Function add_info_ts_in_buffer
2022-12-09 14:04:54.999 [RTCP_MSG Info] Send [OPEN_LOGICAL_PORT_REQUEST] LogicalPort: 7410 -> Function sendOpenLogicalPortRequest
2022-12-09 14:04:54.999 [RTCP_MSG Info] Send [OPEN_LOGICAL_PORT_REQUEST] LogicalPort: 7414 -> Function sendOpenLogicalPortRequest
2022-12-09 14:04:54.999 [RTCP_MSG Info] Send [OPEN_LOGICAL_PORT_REQUEST] LogicalPort: 7416 -> Function sendOpenLogicalPortRequest
2022-12-09 14:04:57.999 [RTPS_WRITER Info] Sending relevant changes as DATA/DATA_FRAG messages -> Function add_data
2022-12-09 14:04:57.999 [RTPS_WRITER Info] Sending INFO_TS message -> Function add_info_ts_in_buffer

Log of publisher:

2022-12-09 14:04:54.975 [RTCP_MSG_IN Info] Received RTCP MSG. Logical Port 0 -> Function Receive
2022-12-09 14:04:54.975 [RTCP_MSG Info] Receive [OPEN_LOGICAL_PORT_REQUEST] LogicalPort: 7410 -> Function processRTCPMessage
2022-12-09 14:04:54.975 [RTCP_MSG Info] Send [OPEN_LOGICAL_PORT_RESPONSE] Not found: 7410 -> Function processOpenLogicalPortRequest
2022-12-09 14:04:54.975 [RTCP_MSG_IN Info] Received RTCP MSG. Logical Port 7412 -> Function Receive
2022-12-09 14:04:54.975 [RTCP_MSG_IN Info] [RECEIVE] From: UDPv4:[0.0.0.0]:485752832 - 252 bytes. -> Function Receive
2022-12-09 14:04:54.975 [RTPS_MSG_IN Info] (ID:140644592899840) InfoTS Submsg received, processing... -> Function processCDRMsg
2022-12-09 14:04:54.975 [RTPS_MSG_IN Info] (ID:140644592899840) Data Submsg received, processing. -> Function processCDRMsg
2022-12-09 14:04:54.975 [RTPS_MSG_IN Info] (ID:140644592899840) from Writer 01.0f.71.10.a2.be.d0.81.01.00.00.00|0.1.0.c2; possible RTPSReader entities: 4 -> Function proc_Submsg_Data
2022-12-09 14:04:54.975 [RTPS_MSG_IN Info] (ID:140644592899840) Trying to add change 1 TO reader: 01.0f.99.c9.bd.93.35.d1.01.00.00.00|0.1.0.c7 -> Function processDataMsg
2022-12-09 14:04:54.976 [RTPS_MSG_IN Info] (ID:140644592899840) Sub Message DATA processed -> Function proc_Submsg_Data
2022-12-09 14:04:54.976 [RTCP_MSG_IN Info] Received RTCP MSG. Logical Port 0 -> Function Receive
2022-12-09 14:04:54.976 [RTCP_MSG Info] Receive [OPEN_LOGICAL_PORT_REQUEST] LogicalPort: 7414 -> Function processRTCPMessage
2022-12-09 14:04:54.976 [RTCP_MSG Info] Send [OPEN_LOGICAL_PORT_RESPONSE] Not found: 7414 -> Function processOpenLogicalPortRequest
2022-12-09 14:04:54.976 [RTCP_MSG_IN Info] Received RTCP MSG. Logical Port 0 -> Function Receive
2022-12-09 14:04:54.976 [RTCP_MSG Info] Receive [OPEN_LOGICAL_PORT_REQUEST] LogicalPort: 7416 -> Function processRTCPMessage
2022-12-09 14:04:54.976 [RTCP_MSG Info] Send [OPEN_LOGICAL_PORT_RESPONSE] Not found: 7416 -> Function processOpenLogicalPortRequest

Mario-DL commented 1 year ago

Hi @ulongcha,

Thank you for the detailed description. Could you please provide a minimal reproducer for the issue?

ulongcha commented 1 year ago

@Mario-DL Sorry for the delayed reply. Here is a minimal reproducer for my TCP + DynamicType setup. Thank you! tcp_minium.zip

ulongcha commented 1 year ago

Hi @Mario-DL, I have tried several approaches but still cannot resolve this issue. First, I created a participant for discovery and saved the DynamicType, then used that type to create a new participant for matching; this still failed. Then I configured user traffic and metatraffic in the participant QoS, which still did not work. Could you give me some suggestions?

ulongcha commented 1 year ago

I debugged and found that when I create the reader in the on_type_discovery() function, execution reaches MessageReceiver::associateEndpoint and then jumps out to my main() function. I don't know why.

void MessageReceiver::associateEndpoint(
        Endpoint* to_add)
{
    std::lock_guard<eprosima::shared_mutex> guard(mtx_); // Here jump out to main()
    if (to_add->getAttributes().endpointKind == WRITER)
    {
        ...

Mario-DL commented 1 year ago

Hi @ulongcha,

Sorry for the delayed response and thanks for the detailed information provided.

The issue has to do with the place where create_datareader() is called. We do not recommend creating this kind of entity within callbacks (on_type_discovery() in this case), as it may result in an internal deadlock.
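
To illustrate one common way such a deadlock can arise, here is a minimal sketch in plain C++. It is not Fast DDS code: `Receiver`, `deliver()` and `associate_endpoint()` are hypothetical names, and the thread does not confirm that this is the exact mechanism inside the library. The sketch uses an atomic flag that reports a conflict instead of blocking, so it can run to completion; with a real non-recursive mutex, the nested call would wait forever.

```cpp
#include <atomic>
#include <functional>

// Hypothetical stand-in for a component that guards its state with one lock.
class Receiver
{
public:
    // Needs exclusive access. Returns false where a real mutex would block.
    bool associate_endpoint()
    {
        bool expected = false;
        if (!busy_.compare_exchange_strong(expected, true))
        {
            return false; // lock already held: a real lock_guard would hang here
        }
        busy_.store(false);
        return true;
    }

    // Runs a user callback while still holding the lock.
    bool deliver(const std::function<bool()>& callback)
    {
        bool expected = false;
        if (!busy_.compare_exchange_strong(expected, true))
        {
            return false;
        }
        const bool result = callback(); // user code runs under the lock
        busy_.store(false);
        return result;
    }

private:
    std::atomic<bool> busy_{false};
};
```

Calling `associate_endpoint()` directly succeeds, while calling it from inside a callback passed to `deliver()` fails because the lock is still held by the caller of the callback.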

Please find attached a modified version of your reproducer code with a quick fix applied: the DataReader is created in the main thread once the type has been discovered. I tried it on the master branch (where the issue also reproduces), but it should also work for 2.8.1 and 2.9.0.
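
The general shape of this kind of fix can be sketched as follows. This is a standard-library-only illustration, not the contents of the attached archive: `TypeDiscoveryHandoff` and `run_handoff_demo()` are hypothetical names, and the listener is simulated with a plain thread. The callback only records the discovered type name and signals; the main thread waits and then does the entity creation itself.

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical hand-off object shared by the listener and the main thread.
class TypeDiscoveryHandoff
{
public:
    // Called from the discovery callback: store the name and return at once.
    void notify_discovered(const std::string& type_name)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            type_name_ = type_name;
            discovered_ = true;
        }
        cv_.notify_one();
    }

    // Called from the main thread: block until a type has been discovered.
    std::string wait_for_type()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return discovered_; });
        return type_name_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::string type_name_;
    bool discovered_ = false;
};

// Simulates the flow: a listener thread reports a type, the caller picks it up.
std::string run_handoff_demo()
{
    TypeDiscoveryHandoff handoff;
    std::thread listener([&handoff] {
        handoff.notify_discovered("TYf2fbdba1320063b"); // name taken from the log above
    });
    const std::string type_name = handoff.wait_for_type();
    listener.join();
    // In a real application, the main thread would now register the type,
    // create the topic and call create_datareader(), outside any callback.
    return type_name;
}
```

In an actual subscriber, on_type_discovery() would call something like `notify_discovered()` and return immediately, and main() would call `wait_for_type()` before creating the DataReader.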

We will also modify a couple of examples that may lead to confusion.

tcp_minimum_fix.zip

ulongcha commented 1 year ago

Thank you very much!