Strange behaviour regarding publishing

SomaGallai commented 3 years ago

Hey, we have started seeing strange behaviour: the /scan topic is not being published. We have tried different things to find out what is causing the issue, but none of the results were consistent.

We have tried restarting the PC and it did not work.
Rebooting the SICK Nanoscan3 Scanner sometimes made the /scan topic publish.
Changing the network connection has no effect on the results.
The issue usually arises when we step in front of the scanner and trigger a critical safety stop, but after repeated testing this was not consistent either.

We tried to find the root cause of the /scan topic not being published, but as far as we can see it is too random. The main issue seems to occur when the critical safety zone is triggered.

lenpuc commented 3 years ago

Okay, if you notice any pattern this would greatly increase the chances of finding a fix.

I will have a look into it and try to see if I can reproduce the issue, especially with regard to the safety zones.

If I grasp it correctly, the sensor suddenly stops sending new messages? Is the lack of published scans persistent then? Or does the sensor start publishing again once you leave the safety field?

SomaGallai commented 3 years ago

It does not stop publishing as long as we have started ros2 topic echo /scan before stepping into the safety zone, but as soon as we kill the process we are not able to reconnect using ros2 topic echo /scan.

lenpuc commented 3 years ago

That is odd behaviour. So once you kill the ros2 topic echo /scan, you will not get messages when restarting the echo while in the safety field? Did I understand that correctly?

I tried that a few times now here, never had that problem. I can always reconnect the echo and will get messages.

Do you maybe have different nodes running which might interfere? Or are any special QoS Profiles active or do you have modifications to the underlying DDS communication?

SomaGallai commented 3 years ago

After we kill ros2 topic echo /scan, we do not get messages when restarting the echo, whether or not there is an object in the safety field. We have tried running it as a standalone program as well, with the same results. No QoS is used; we have tried both without any DDS modification and while running the Fast-DDS Discovery Server. All of these attempts show the same issue for us.

lenpuc commented 3 years ago

Okay, that is behavior I haven't encountered or heard of before. Did you try it with the same computer and the same scanner each time? If yes, maybe try a different computer, and if possible another sensor. Maybe the error lies there.

Or maybe your firewall blocks incoming UDP messages, though then I would have expected that you will never get any messages. There has been an issue like that in the ROS1 driver: https://github.com/SICKAG/sick_safetyscanners/issues/15#issuecomment-498231731
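One way to check the firewall theory is to listen for raw UDP datagrams on the host, outside of ROS. The following is only a minimal sketch, not part of the driver: the function name is hypothetical, the port must match whatever UDP port the sensor is streaming to, and the driver has to be stopped first so the port is free. It is only meaningful while the sensor is actually streaming to this host.

```python
import socket


def wait_for_udp_packet(port: int, timeout_s: float = 5.0, host: str = "0.0.0.0") -> bool:
    """Return True if at least one UDP datagram arrives on (host, port) within timeout_s.

    Hypothetical helper for troubleshooting; the port is whatever the
    sensor is configured to stream to (an assumption of this sketch).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        sock.bind((host, port))
        try:
            data, sender = sock.recvfrom(65535)
        except socket.timeout:
            # Nothing arrived: sensor not streaming, wrong port, or packets blocked.
            return False
        print(f"received {len(data)} bytes from {sender[0]}:{sender[1]}")
        return True
```

If this returns False while the sensor should be streaming, a firewall or routing problem becomes more likely; if it returns True, the packets reach the host and the problem is probably further up.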

eliasdc commented 3 years ago

Hey, just wanted to give my input too. We're seeing similar behavior and it is very hard to pinpoint what the problem is. The difference is that I know how to make it work again: by restarting the entire system, or by power cycling the lidar and restarting the ros2 nodes. It mostly happens when I'm developing and constantly restarting the ros2 nodes (including sick_safetyscanners2); then the lidar gets into a state where it is not accepting commands anymore. Once it is in this state, this package reports the acknowledge message for the first command but not for the second one. It also happens during an active session, when the lidar stops sending data.

Our sensor is configured to start sending data on request and is not configured to send data continuously to an IP.

One extra difference between ROS1 driver and ROS2 driver is that I see a lot more output related to setting parameters. It used to be only once but now it is repeated for every subscriber I believe.

If I find a good way of replicating it I'll post it here.

lenpuc commented 3 years ago

> If I find a good way of replicating it I'll post it here.

Please do. As of now I cannot reproduce the error on the MicroScan3, therefore I can't track down where the issue might arise from.

> One extra difference between ROS1 driver and ROS2 driver is that I see a lot more output related to setting parameters. It used to be only once but now it is repeated for every subscriber I believe.

Yes, this is currently invoked on each parameter change. There should be some more sanity checks if the correct parameters are changed, this is on my agenda but I did not manage to implement these yet. This is due to the integration of dynamic reconfigure into the parameters. So yes there are currently param changes if a new parameter is detected or the value of a parameter changes.

eliasdc commented 3 years ago

I still have not found a good way to reproduce it. I reverted back to our own port from the ROS1 version (https://github.com/Tractonomy/sick_safetyscanners). With that package I never see it happen once we are connected, but it can happen during initial start-up/configuration. Trying to launch the sensor multiple times fixes the problem after a while.

I did some connection troubleshooting and found out that the Cola configuration connection is always able to connect on port 2112, while the secondary random port for streaming is not created/connected in the cases where the connection does not work.
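For anyone who wants to repeat the first half of that check, a TCP connect attempt against port 2112 is enough to see whether the configuration port is reachable. This is a rough sketch only: the function name is made up, and the sensor address in the usage note is a placeholder, not a value from this thread.

```python
import socket


def tcp_port_open(host: str, port: int = 2112, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established.

    Hypothetical helper; 2112 is the configuration port mentioned above.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Usage would be something like `tcp_port_open("192.168.1.2")`, with the address replaced by the sensor's real IP. It says nothing about the UDP streaming port, which has to be checked separately.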

So my assumption is that the change in how parameter changes trigger a configuration change breaks the current connection. With the old package the configuration happens at startup and is not tried again later, so the connection keeps working. With the new package I can see that multiple attempts are made to configure the device even though I'm not changing parameters, and I believe this sometimes breaks the active connection and does not set up a new one.

lenpuc commented 3 years ago

Thanks for the insights, I'll try to set up a sanity check to only invoke a param update on the relevant parameters. That hopefully resolves some of the issues. Why it breaks the connection in the first place I still don't know.

Can you set a fixed UDP port for the streaming, instead of a port from the ephemeral range? Maybe this helps for stable connections?

lenpuc commented 3 years ago

I just updated the parameter callbacks to the correct ROS2 interface for callbacks. Now it's not triggered on each event callback anymore. This means the updates to the sensor will only be triggered if a relevant parameter was updated. Hope this helps.
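To illustrate the idea (this is not the driver's actual C++ code, just a minimal Python sketch), the filtering boils down to: reconfigure the sensor only when a parameter from a known set actually changes its value. The parameter names and the helper below are hypothetical. In rclpy such a check could be called from a callback registered with `Node.add_on_set_parameters_callback`.

```python
# Hypothetical set of parameter names that require reconfiguring the sensor.
SENSOR_RELEVANT = frozenset({"sensor_ip", "host_ip", "host_udp_port", "channel_enabled"})

# Sentinel so that a parameter missing from `current` always counts as changed.
_MISSING = object()


def needs_sensor_update(changed: dict, current: dict, relevant=SENSOR_RELEVANT) -> bool:
    """Return True only if a relevant parameter in `changed` differs from `current`."""
    return any(
        name in relevant and current.get(name, _MISSING) != value
        for name, value in changed.items()
    )
```

With this kind of check, a change to an unrelated parameter, or setting a relevant parameter to the value it already has, would not trigger a new configuration request to the sensor.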

However, I am not too happy with the ROS2 parameters as of now, so I will probably do some refactoring there. It seems like a lot of code to load them initially from the launch file and then dynamically set them again on a change. If you know of any best practices for handling dynamically changeable ROS2 parameters, let me know.

SomaGallai commented 3 years ago

Hi @puck-fzi ,

We have pulled the new release and it seems to have fixed the issue where we are not able to reconnect. Thank you very much!

lenpuc commented 3 years ago

Happy to hear that

SICKAG / sick_safetyscanners2

Strange behaviour regarding publishing #3