ceptontech / cepton_sdk_redist

Cepton SDK redistribution channel
BSD 3-Clause "New" or "Revised" License

Callback does not always fire #7

Closed cho3 closed 5 years ago

cho3 commented 5 years ago

Using version 1.11 of the SDK.

When the SDK is initialized in the following way:

CEPTON_NODE_LOCAL inline void cepton_init(sensor_msgs::msg::PointCloud2 & user_data)
{
  // initialize cepton sdk
  cepton_sdk::Options options = cepton_sdk::create_options();
  options.frame.mode = static_cast<uint32_t>(CEPTON_SDK_FRAME_COVER);
  options.control_flags |= static_cast<uint32_t>(CEPTON_SDK_CONTROL_DISABLE_NETWORK);
  cepton_init_internal(options);
  // register callback
  void * const user_data_ptr = static_cast<void *>(&user_data);
  const cepton_sdk::SensorError ret =
    cepton_sdk::listen_image_frames(&cloud_callback, user_data_ptr);
  if (static_cast<int32_t>(CEPTON_SUCCESS) != static_cast<cepton_sdk::SensorErrorCode>(ret)) {
    throw ret;
  }
}

We then manually pass packets to the SDK as follows:

  const cepton_sdk::SensorError ret =
    cepton_sdk::mock_network_receive(dummy_handle, timestamp_us, &pkt.data[0U], CEPTON_PACKET_SIZE);

When replaying sample data, the SDK occasionally never fires the registered callback for the entire run, even though packets are received and passed to the SDK.

This happens more frequently on aarch64 machines, and slightly more frequently if the platform is under load (e.g. stress -m 12 -c 12).

cho3 commented 5 years ago

I have also reproduced this failure with the latest version of the SDK (fbe2f6c37613eb273cdb10c8fc453a22106d47ed)

spectralflight commented 5 years ago
cho3 commented 5 years ago

@spectralflight

  1. I can give this a try
  2. Yes we are using the Vista
  3. The data sample is 23 seconds long. I believe this sample was provided by Cepton (under the name lidar.pcap).

spectralflight commented 5 years ago

A few more debugging questions:

cho3 commented 5 years ago

@spectralflight

How are you getting the packets from the PCAP file? Are you using our Capture class, or your own code? Some of the packets in the PCAP file are fragmented and need to be reassembled.

For the test replayer, we take packets using the provided Capture class in the SDK. We then send the packet over UDP.

Are all your SDK calls happening in a single thread?

Yes

After you pass all the data to the SDK, if you call cepton_sdk_get_n_sensors, does it report that sensors are connected?

I'll check this when I give the default streaming mode a try.

cho3 commented 5 years ago

@spectralflight Regarding the following open points:

Does the issue happen if you don't change the frame mode (defaults to streaming)?

Yes, the issue appears to occur.

After you pass all the data to the SDK, if you call cepton_sdk_get_n_sensors, does it report that sensors are connected?

In the normal case, it reports 1. Typically the SDK will receive 200-300 packets and report 0 sensors connected before reporting 1. When this occurs, the callback typically starts firing in the COVER case.

In the failure case, in both the COVER and STREAMING setups, the SDK appears to report 0 sensors connected for the entire duration of the test.

spectralflight commented 5 years ago

Hi. Sorry for the delay, I was on vacation.

If the SDK reports that 0 sensors are connected, then the SDK isn't receiving or parsing the sensor calibration packets. Parsing the sensor calibration packet should be deterministic, so the only plausible explanation for your issue is that the calibration packets aren't being received by the SDK.

The first 32 bits of the packet data are the packet type id. Would you be able to print out the different packet type ids that are received? Ideally, the SDK should be receiving 2 different packet types (sensor calibration packet and sensor points packet).

const uint32_t packet_signature = *(uint32_t const *)pkt.data;
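A minimal sketch of how the signatures could be collected during a replay run, assuming each packet is available as a byte buffer with a known size. The names `read_packet_signature` and `SignatureLog` are hypothetical and not part of the SDK. It uses `std::memcpy` instead of a pointer cast so the read does not depend on buffer alignment.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>

// Reads the first 4 bytes of a packet as a host-endian uint32_t.
// std::memcpy avoids the alignment and strict-aliasing pitfalls of a pointer cast.
// Returns false if the buffer is too short to contain a signature.
inline bool read_packet_signature(
  const std::uint8_t * data, std::size_t size, std::uint32_t & out)
{
  if (data == nullptr || size < sizeof(std::uint32_t)) {
    return false;
  }
  std::memcpy(&out, data, sizeof(std::uint32_t));
  return true;
}

// Hypothetical helper (not an SDK type): accumulates the distinct
// signatures observed over a run.
class SignatureLog
{
public:
  // Call once per received packet, before handing it to the SDK.
  void observe(const std::uint8_t * data, std::size_t size)
  {
    std::uint32_t signature = 0U;
    if (read_packet_signature(data, size, signature)) {
      seen_.insert(signature);
    }
  }

  const std::set<std::uint32_t> & seen() const { return seen_; }

private:
  std::set<std::uint32_t> seen_;
};
```

Calling `observe(&pkt.data[0U], CEPTON_PACKET_SIZE)` next to the `mock_network_receive` call and printing `seen()` at the end of the run would give the list of packet types that actually reached the SDK.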

cho3 commented 5 years ago

In the good case we see (using a std::set to aggregate all signatures):

1330007625
1346655315

In the failure case we see:

1346655315

So does this imply that the "calibration" packet is being dropped?
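As a side note, the two values become easier to tell apart when decoded as ASCII. The sketch below assumes the signature was read on a little-endian host, so the least significant byte is the first byte of the packet. The helper name `signature_to_ascii` is made up for illustration. Under that assumption 1330007625 decodes to "INFO" and 1346655315 to "STDP". Which of the two is the calibration packet is inferred only from 1330007625 being the one missing in the failure case, not from SDK documentation.

```cpp
#include <cstdint>
#include <string>

// Interprets a 32-bit signature as four ASCII characters, least significant
// byte first (the order in which a little-endian host read them from the packet).
inline std::string signature_to_ascii(std::uint32_t signature)
{
  std::string text;
  for (int i = 0; i < 4; ++i) {
    const char c = static_cast<char>((signature >> (8 * i)) & 0xFFU);
    // Replace non-printable bytes so the result is always safe to log.
    text.push_back((c >= 0x20 && c <= 0x7E) ? c : '.');
  }
  return text;
}
```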

spectralflight commented 5 years ago

Yes, that means the calibration packets are getting dropped, which would explain the behavior you're seeing. The calibration packets are fairly large, so they get split across 2 UDP packets.

cho3 commented 5 years ago

@spectralflight Thanks a lot for the help.

Can you provide any numbers for how big the packets of the two types are? It would help us a lot with our debugging.

spectralflight commented 5 years ago

No problem. Sorry it took me so long to diagnose the issue.

It can vary slightly depending on the sensor firmware. For the current sensor I have on my desk, the packet sizes are: data=1360, calibration=1762.
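To see why only the calibration packet is split, here is a rough sketch of the IPv4 fragmentation arithmetic. It assumes a standard 1500-byte Ethernet MTU and IPv4/UDP headers without options. None of these constants come from the SDK, and the function name is hypothetical. Whether the sizes quoted above are frame lengths or UDP payload sizes is not stated, but under either reading the data packet fits in one fragment and the calibration packet needs two.

```cpp
#include <cstddef>

// Assumed values (not from the SDK): standard Ethernet MTU and
// IPv4/UDP header sizes without options.
constexpr std::size_t kMtu = 1500U;
constexpr std::size_t kEthernetHeaderSize = 14U;
constexpr std::size_t kIpv4HeaderSize = 20U;
constexpr std::size_t kUdpHeaderSize = 8U;

// Number of IPv4 fragments needed to carry a UDP payload of the given size.
inline std::size_t ipv4_fragment_count(
  std::size_t udp_payload_size, std::size_t mtu = kMtu)
{
  // Every fragment except the last must carry a multiple of 8 bytes.
  const std::size_t per_fragment = ((mtu - kIpv4HeaderSize) / 8U) * 8U;
  // The UDP header travels inside the IP payload and counts toward the total.
  const std::size_t datagram = kUdpHeaderSize + udp_payload_size;
  return (datagram + per_fragment - 1U) / per_fragment;
}

// Converts an unfragmented Ethernet frame length to a UDP payload size.
inline std::size_t frame_to_udp_payload(std::size_t frame_len)
{
  return frame_len - kEthernetHeaderSize - kIpv4HeaderSize - kUdpHeaderSize;
}
```

Any replayer that forwards packets over UDP has to deliver both fragments for the calibration packet to be reassembled, so losing either one loses the whole packet.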

Here is a command to print out the packet sizes in a PCAP file:

tshark -nr <file.pcap> -T fields -e frame.len | sort -n | uniq -c

spectralflight commented 5 years ago

@cho3 Were you able to track down the issue? If so, can I close this ticket?

cho3 commented 5 years ago

@spectralflight Sorry, I was also on vacation =).

Would you happen to have any statistics or SDK compiler definitions that can tell us how frequently we should expect the calibration packet (e.g. once every 1000 packets)?

This would help us identify if we're dropping the calibration packet.

Other than this, I think you can call this ticket closed, since there are no actionable things that need to change on the SDK level.

spectralflight commented 5 years ago

No worries. Hope you had fun!

The calibration packet is sent at 1Hz.
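Given that rate, one way to detect dropped calibration packets on the receiving side is a small watchdog keyed on the packet signature. This is only a sketch: the class name, the timeout, and the choice of signature value are assumptions, not SDK features.

```cpp
#include <cstdint>

// Hypothetical watchdog (not an SDK type): reports a problem when no
// calibration packet has been seen for longer than a timeout.
class CalibrationWatchdog
{
public:
  CalibrationWatchdog(std::uint32_t calibration_signature, std::int64_t timeout_us)
  : calibration_signature_(calibration_signature), timeout_us_(timeout_us) {}

  // Call once per received packet with its signature and timestamp.
  // Returns true while calibration packets are arriving within the timeout.
  bool observe(std::uint32_t signature, std::int64_t timestamp_us)
  {
    if (!started_) {
      started_ = true;
      last_calibration_us_ = timestamp_us;  // start the clock at the first packet
    }
    if (signature == calibration_signature_) {
      last_calibration_us_ = timestamp_us;
      ++calibration_count_;
    }
    return (timestamp_us - last_calibration_us_) <= timeout_us_;
  }

  std::uint64_t calibration_count() const { return calibration_count_; }

private:
  std::uint32_t calibration_signature_;
  std::int64_t timeout_us_;
  bool started_ = false;
  std::int64_t last_calibration_us_ = 0;
  std::uint64_t calibration_count_ = 0U;
};
```

With a timeout of a few seconds, `observe` returning false points to lost calibration packets rather than an SDK problem. For the 23-second sample mentioned earlier, `calibration_count()` should end up at roughly 23 if nothing is dropped.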

I'll close the ticket, and we can switch to email (jon.allen@cepton.com).