eclipse-ecal / ecal

📦 eCAL - enhanced Communication Abstraction Layer. A high performance publish-subscribe, client-server cross-platform middleware.
https://ecal.io
Apache License 2.0

Data not updated when proto msg is all zero in 5.11.5, but 5.11.4 is OK. #1200

Closed by learnonroad 1 year ago

learnonroad commented 1 year ago

Problem Description

I ran into a problem publishing and subscribing with protobuf after updating eCAL from 5.11.4 to 5.11.5. When my protobuf message carries non-zero data, everything works. But when I set every field of the message to 0, the subscriber callback still fires, yet the data it receives is the previous data rather than zeros. I checked the changelog and found nothing related. Is it because, since 5.11.5, data is not updated when it is all 0? How can I fix it?

How to reproduce

Here is the simple source code:

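#include <chrono>
#include <functional>
#include <iostream>
#include <thread>

#include <ecal/ecal.h>
#include <ecal/msg/protobuf/publisher.h>
#include <ecal/msg/protobuf/subscriber.h>
// Also include the protoc-generated header that defines SmileSMower::MoveCmd
// (its file name is not shown in this report).

using std::chrono::steady_clock;

// Print the receive time and the received linear value, then simulate a slow (100 ms) callback.
// Note: streaming a std::chrono duration with operator<< requires C++20.
void cmdCallback(const SmileSMower::MoveCmd &c) {
  std::cout << "receive: " << steady_clock::now().time_since_epoch() << " " << c.linear() << std::endl;
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
}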

int main(int argc, char **argv) {

  eCAL::Initialize(argc, argv, "monitor ecal_sub test");
  eCAL::Process::SetState(proc_sev_healthy, proc_sev_level1, "I feel good !");
  eCAL::Util::EnableLoopback(true);

  SmileSMower::MoveCmd command;
  eCAL::protobuf::CPublisher<SmileSMower::MoveCmd> cmd_publisher("/cmd/test");
  eCAL::protobuf::CSubscriber<SmileSMower::MoveCmd> cmd_subscriber("/cmd/test");
  cmd_subscriber.AddReceiveCallback(std::bind(cmdCallback, std::placeholders::_2));

  int cnt = 0;
  while (cnt < 20) {
    cnt++;
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    command.set_linear(cnt);
    if (cnt >= 10) {
      // From the 10th iteration on (about 500 ms in), publish all-zero data.
      command.set_linear(0);
      std::cout << "pub zero" << std::endl;
    }
    command.set_angular(0);
    cmd_publisher.Send(command);
  }

  eCAL::Finalize();
}

The output may be as follows:

receive: 12748322828390ns 1
receive: 12748423048315ns 3
receive: 12748523263763ns 5
receive: 12748623409378ns 6
receive: 12748723617526ns 8
pub zero
receive: 12748823754726ns 8   # error: I expected 0, but this is still the last received value
pub zero
pub zero
receive: 12748924030631ns 8
pub zero
pub zero
receive: 12749024184296ns 8
pub zero
pub zero
receive: 12749124411089ns 8 
pub zero
pub zero
receive: 12749224568752ns 8
pub zero
pub zero

How did you get eCAL?

Ubuntu PPA (apt-get)

Environment

eCAL: 5.11.5
OS: Ubuntu 20.04.1

eCAL System Information

------------------------- SYSTEM ---------------------------------
Version                  : v5.11.5 (2023-07-24 12:19:00 +0200)
Platform                 : linux

------------------------- CONFIGURATION --------------------------
Default INI              : /etc/ecal/ecal.ini

------------------------- NETWORK --------------------------------
Host name                : zk-Lenovo-Legion
Network mode             : cloud
Network ttl              : 2
Network sndbuf           : 5 MByte
Network rcvbuf           : 5 MByte
Multicast group          : 239.0.0.1
Multicast mask           : 0.0.0.15
Multicast ports          : 14000 - 14010
Multicast join all IFs   : on
Bandwidth limit (udp)    : not limited

------------------------- TIME -----------------------------------
Synchronization realtime : "ecaltime-localtime"
Synchronization replay   : 
State                    :  synchronized 
Master / Slave           :  Master 
Status (Code)            : "everything is fine." (0)

------------------------- PUBLISHER LAYER DEFAULTS ---------------
Layer Mode INPROC        : off
Layer Mode SHM           : auto
Layer Mode TCP           : off
Layer Mode UDP MC        : auto

------------------------- SUBSCRIPTION LAYER DEFAULTS ------------
Layer Mode INPROC        : on
Layer Mode SHM           : on
Layer Mode TCP           : on
Layer Mode UDP MC        : on
Npcap UDP Reciever       : off

KerstinKeller commented 1 year ago

Thanks for reporting. I don't see anything in the changelog either, but I will try to reproduce.

KerstinKeller commented 1 year ago

I can reproduce on master, ~~however the diff between 5.11.4 and 5.11.5 is not extremely large. You are positive that it used to work on 5.11.4?~~

Not clearing the receive buffer (for performance reasons) was the culprit: https://github.com/eclipse-ecal/ecal/compare/v5.11.4...v5.11.5#diff-617c203026e9b95cd2fb17a2877488753f831d1d4e2a0b0be12e4cd0324903c4

At least we now have a test case to reproduce it, but we need to take a further look at how to optimize without re-introducing the bug.
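
To illustrate the failure mode described above, here is a minimal sketch of how skipping the buffer clear can hand stale bytes to a callback when a zero-length payload arrives. The `Receiver` class and its `clear_on_empty` flag are hypothetical and are not taken from eCAL's source; they only model the idea of a reused receive buffer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical receiver that reuses one buffer across samples.
// This is an illustration only, not eCAL's actual implementation.
class Receiver {
public:
  explicit Receiver(bool clear_on_empty) : clear_on_empty_(clear_on_empty) {}

  // Called for every incoming sample; returns the bytes that would be
  // passed on to the deserializer and the user callback.
  const std::string& OnSample(const char* data, std::size_t len) {
    if (len > 0) {
      buffer_.assign(data, len);  // normal case: overwrite with the new payload
    } else if (clear_on_empty_) {
      buffer_.clear();            // an empty payload must also empty the buffer
    }
    // If len == 0 and the buffer was not cleared, the previous payload
    // is still in buffer_ and gets parsed again.
    return buffer_;
  }

private:
  bool clear_on_empty_;
  std::string buffer_;
};
```

With `Receiver(false)`, a two-byte sample followed by an empty sample still yields the two old bytes, which matches the repeated `8` in the reported output. With `Receiver(true)`, the second call yields an empty buffer, which a protobuf parser would turn into an all-default message.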

KerstinKeller commented 1 year ago

We have fixed this issue, and the fix will be released with the upcoming 5.11.6 within a week or so (depending on what other changes we need to put into 5.11.6). By the way, the timestamps would also have been incorrect with 5.11.4 when sending 0-length payloads (e.g. an empty protobuf message).
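
For readers wondering why an all-zero message ends up as a 0-length payload: assuming `MoveCmd` is a proto3 message without explicit field presence (its `.proto` file is not shown in this thread), fields holding their default value are not written to the wire. The sketch below hand-rolls that rule for a hypothetical two-field stand-in; the struct name, field numbers, and integer field types are assumptions made for illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for SmileSMower::MoveCmd.
// Field numbers and integer types are assumed, not taken from the real .proto.
struct MoveCmdSketch {
  std::uint32_t linear = 0;   // assumed field number 1
  std::uint32_t angular = 0;  // assumed field number 2
};

// Append v to out as a base-128 varint (protobuf wire format).
inline void AppendVarint(std::string& out, std::uint64_t v) {
  while (v >= 0x80) {
    out.push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<char>(v));
}

// proto3-style encoding without explicit presence:
// a field equal to its default value (0) is skipped entirely.
inline std::string Encode(const MoveCmdSketch& m) {
  std::string out;
  if (m.linear != 0) {
    AppendVarint(out, (1u << 3) | 0u);  // tag: field 1, wire type 0 (varint)
    AppendVarint(out, m.linear);
  }
  if (m.angular != 0) {
    AppendVarint(out, (2u << 3) | 0u);  // tag: field 2, wire type 0 (varint)
    AppendVarint(out, m.angular);
  }
  return out;
}
```

Under these assumptions, a message with `linear = 8` encodes to two bytes, while a message with every field at 0 encodes to an empty string, which is the 0-length payload case mentioned above.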