eclipse-cyclonedds / cyclonedds

Eclipse Cyclone DDS project
https://projects.eclipse.org/projects/iot.cyclonedds

tcp transport coredump #2010

Open yanxiaochun1005 opened 1 month ago

yanxiaochun1005 commented 1 month ago

When multiple topics publish data in a domain over TCP, a core dump occurs if the TCP connection is disconnected.

2024-05-24 10:31:45.893 [0] recv: thread_cputime 0.000000000
2024-05-24 10:31:45.893 [0] dq.builtins: data(builtin, vendor 1.16): 0:0:0:0 #1: ST0 /ParticipantBuiltinTopicData:{property_list={1:"__ProcessName":"dds_test",1:"Pid":"16809",1:"Hostname":"user-HP"}:{},protocol_version=2:1,vendorid=1:16,participant_lease_duration=10000000000,participant_guid={110e2b8:b421f0d4:6efb72b8:1c1},builtin_endpoint_set=64575,domain_id=0,default_unicast_locator={tcp/192.168.30.75:4000},metatraffic_unicast_locator={tcp/192.168.30.75:4000},adlink_participant_version_info=0:44:0:0:0:"user-HP/0.9.1/Linux/Linux"}
2024-05-24 10:31:45.893 [0] dq.builtins: SPDP ST0 110e2b8:b421f0d4:6efb72b8:1c1 (known) L(:1c1 276.236983)
2024-05-24 10:31:46.070 [0] dds_test: tcp abandoning write on blocking socket 8 after 0 bytes
2024-05-24 10:31:46.070 [0] dds_test: tcp cache removed socket 8 to tcp/192.168.30.75:4000
2024-05-24 10:31:46.070 [0] dds_test: tcp close client connection on socket 8 to tcp/192.168.30.75:4000
2024-05-24 10:31:46.070 [0] dds_test: nn_xpack_send 64104: 0xffff78002d8c:20 0xffff78002da0:8 0xffff78be3b98:48 0xffff780061d0:64000 0xffff78be3d18:28 [ tcp/192.168.30.75:4000 ]
2024-05-24 10:31:46.070 [0] dds_test: nn_xpack_send 64104: 0xffff6c002acc:20 0xffff6c002ae0:8 0xffff6c0028d8:48 0xffff6c006650:64000 0xffff6c003608:28 [ tcp/192.168.30.75:4000tcp blocked write: sock 8
2024-05-24 10:31:46.070 [0] recv: tcp free client connection on socket 8 to tcp/192.168.30.75:4000
2024-05-24 10:31:46.070 [0] dds_test: ]
2024-05-24 10:31:46.070 [0] recv: tcp connection free socket 8
2024-05-24 10:31:46.070 [0] dds_test: tcp write: sock 8 error -12
2024-05-24 10:31:46.070 [0] dds_test: traffic-xmit (1) 64104
2024-05-24 10:31:46.070 [0] dds_test: ]
2024-05-24 10:31:46.070 [0] dds_test: traffic-xmit (1) 64104
2024-05-24 10:31:46.070 [0] dds_test: traffic-xmit (1) 64104
2024-05-24 10:31:46.070 [0] tev: xpack_addmsg 0xffff5c000b60 0xffff4c0062e0 0(control): niov 3 sz 60 => now niov 4 sz 92

This happens when multiple topics publish data at the same time from multiple threads.

yanxiaochun1005 commented 1 month ago

@eboasson

eboasson commented 1 month ago

Hmm ...

I knew the TCP support wasn't great, but this is bad.

With a bit of luck it is a really stupid mistake and a quick look at a stack trace suffices. Would you be able to make it crash, then do `thread apply all bt` in gdb? It might save me from having to try to reproduce it first. Thanks!
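
For anyone following along, one way to capture the requested backtraces as text (easier to paste into an issue than a screenshot) is sketched below. It is a minimal sketch, not a project-provided tool: the helper name `collect_backtraces` is hypothetical, and it assumes gdb is installed and that the crash left a core file behind. `-batch` and `-ex` are standard gdb options.

```shell
#!/bin/sh
# Hypothetical helper: print a backtrace for every thread in a core file,
# without entering an interactive gdb session.
# Usage: collect_backtraces <program> <core-file>
collect_backtraces() {
    if [ "$#" -ne 2 ]; then
        echo "usage: collect_backtraces <program> <core-file>" >&2
        return 2
    fi
    # -batch: exit once the given commands have run.
    # -ex:    execute a single gdb command, here the one asked for above.
    gdb -batch -ex "thread apply all bt" "$1" "$2"
}
```

A possible session, assuming the binary is `./dds_test` (the process name in the log) and the core file is written as `core` in the working directory (both are assumptions; the actual core location depends on the system's core pattern): run `ulimit -c unlimited`, reproduce the crash with `./dds_test`, then run `collect_backtraces ./dds_test core > backtrace.txt` and attach `backtrace.txt`.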

yanxiaochun1005 commented 2 weeks ago

@eboasson ![Uploading img_v3_02b5_b50c8bde-7277-4f4a-8146-9f8400e827fg.jpg…]()