Closed RuPingCen closed 2 months ago
For the latency, could you try removing the sleep here: https://github.com/dora-rs/autoware.universe/blob/db29af5314dbe4c3c9c5c79ab78e33f2a8bbd327/localization/ekf_localizer/src/gnss_ekf.cc#L65
The _“EKF_fusionpthread“” thread is the sleep of the thread that sends data and has nothing to do with data reception. Data packet loss has occurred when receiving data from other nodes in the function *“int run(void dora_context) ”**.
Dora uses a default queue size of 10 for each input. If the receiver cannot keep up with the sender, dora deliberately drops some inputs (oldest first). If you set the RUST_LOG
environment variable to debug
you will see a "dropped X inputs because event queue was too full" in the command-line output.
You can set a custom queue size in your YAML file, e.g.: https://github.com/dora-rs/dora/blob/ea47a556a77f6c64f29ab07939d747edc8573671/examples/python-dataflow/dataflow.yml#L5-L8
As you said I set the parameter queue_size: 100, but data packet loss still exists.
I check the logs for the python node "gnss" , which is normal pub the data with 4643.
The second node "gnss_poser" with c/c++ finally received only 4229 data_seq.
This phenomenon started to occur when receiving the 33rd data seq.
here is the xxx.yaml file:
nodes:
- id: gnss
operator:
python: ../../dora-hardware/vendors/gnss/CGI_610/CGI_610_driver_dora_with_file.py #读取文件用于测试
#python: ../../dora-hardware/vendors/gnss/CGI_610/CGI_610_driver_dora.py
inputs:
tick: dora/timer/millis/20
outputs:
- DoraSentence
- DoraNavSatFix
- DoraQuaternionStamped
- DoraTwistStamped
env:
DEVICE_INDEX: 0
- id: gnss_poser #坐标系转化
custom:
source: ../../sensing/gnss_poser/build/gnss_poser_core
inputs:
DoraNavSatFix: gnss/DoraNavSatFix
queue_size: 100
outputs:
- DoraNavSatFix
- id: gnss_rviz2 #在RVIZ2中显示转换后的轨迹 (E-N-U)
operator:
python: ../../sensing/gnss_poser/src/gps_to_ros2.py
inputs:
DoraNavSatFix: gnss_poser/DoraNavSatFix
queue_size: 100
- id: gnss_ekf # c/c++节点
custom:
source: build/gnss_ekf
inputs:
DoraNavSatFix: gnss_poser/DoraNavSatFix
DoraQuaternionStamped: gnss/DoraQuaternionStamped
outputs:
- DoraGnssPose
queue_size: 100
- id: ekf_odom_rviz2 #在rviz中显示数据
operator:
python: src/ekf_odom_to_ros2.py
inputs:
DoraGnssPose: gnss_ekf/DoraGnssPose
queue_size: 100
Hu has also tested with 100K queue size today and can reproduce this issue.
Thanks for the details!
Could you please also share the full output of the terminal on the top left? I see that there are some error in the output, so the full content might help debugging this. Also, setting RUST_LOG=debug
before dora up
should expose more details in the logs, including information when events are dropped.
The error in the terminal on the upper left corner is that we did not use a real GPS sensor during debugging, but used the data in the txt file to simulate GPS data. This error is generated after the data in the txt file has been read and the node exits. Therefore, this error should not affect data loss.
The details can be found in line 26 of file CGI_610_driver_dora_with_file.py( https://github.com/dora-rs/autoware.universe/blob/feature/autoware_dora/dora-hardware/vendors/gnss/CGI_610/CGI_610_driver_dora_with_file.py)
I just noticed that you added the queue_size
parameter as a child of the the custom:
object. It has no effect there. Instead, you need to add it as a child field to an input. For that, add another level of nesting underneath the input, specify the source mapping as a source
field, and then add the queue_size
as a second field. I opened a small PR for that: https://github.com/dora-rs/autoware.universe/pull/6. Please retry again with these changes.
Hello! I tested this change and the packet loss problem still exists. here is the yaml file:
nodes:
- id: gnss
operator:
python: ../../dora-hardware/vendors/gnss/CGI_610/CGI_610_driver_dora_with_file.py #读取文件用于测试
#python: ../../dora-hardware/vendors/gnss/CGI_610/CGI_610_driver_dora.py
inputs:
# tick: dora/timer/millis/20
tick:
source: dora/timer/millis/20
queue_size: 1000
# queue_size: 100
outputs:
- DoraSentence
- DoraNavSatFix
- DoraQuaternionStamped
- DoraTwistStamped
env:
DEVICE_INDEX: 0
# - id: gnss_sub #数据格式转换 json
# operator:
# python: ../../dora-hardware/vendors/gnss/CGI_610/nmea_subscribeTopic.py
# inputs:
# DoraSentence: gnss/DoraSentence
# DoraNavSatFix: gnss/DoraNavSatFix
# DoraQuaternionStamped: gnss/DoraQuaternionStamped
# DoraTwistStamped: gnss/DoraTwistStamped
# outputs:
# - DoraNavSatFix
# - DoraQuaternionStamped
- id: gnss_poser #坐标系转化
custom:
source: ../../sensing/gnss_poser/build/gnss_poser_core
inputs:
DoraNavSatFix:
source: gnss/DoraNavSatFix
queue_size: 1000
outputs:
- DoraNavSatFix
- id: gnss_rviz2 #在RVIZ2中显示转换后的轨迹 (E-N-U)
operator:
python: ../../sensing/gnss_poser/src/gps_to_ros2.py
inputs:
DoraNavSatFix:
source: gnss_poser/DoraNavSatFix
queue_size: 1000
##############################################################################################
- id: gnss_ekf # python 实现
# custom:
# source: build/gnss_ekf
operator:
python: ../../localization/ekf_localizer/src/gnss_ekf.py
inputs:
DoraNavSatFix:
source: gnss_poser/DoraNavSatFix
queue_size: 1000
DoraQuaternionStamped:
source: gnss/DoraQuaternionStamped
queue_size: 1000
tick:
source: dora/timer/millis/20
queue_size: 1000
outputs:
- DoraGnssPose
# - id: gnss_ekf # c/c++节点
# # custom:
# # source: build/gnss_ekf
# inputs:
# DoraNavSatFix: gnss_poser/DoraNavSatFix
# DoraQuaternionStamped: gnss/DoraQuaternionStamped
# outputs:
# - DoraGnssPose
- id: ekf_odom_rviz2 #在rviz中显示数据
operator:
python: src/ekf_odom_to_ros2.py
inputs:
DoraGnssPose:
source: gnss_ekf/DoraGnssPose
queue_size: 1000
For debugging, it would be useful to have a minimal example that still triggers this issue. Could you try the following:
After these steps, we ideally have a very small example that we can try easily on our machines without working into the autoware build details.
Hello,This is the complete reproduction process of the above example: https://github.com/dora-rs/autoware.universe/blob/feature/autoware_dora/localization/ekf_localizer/Packet_loss_test.md For minimal example,I write demo to reproduce this proble,the code is here: https://github.com/dora-rs/autoware.universe/tree/feature/autoware_dora/tools/C_Python_test. And in this process, you may also encounter issue516. Because the process involves c node, it is inevitable to encounter some build processes.
The second link is the minimal example. The link for all examples is
https://github.com/dora-rs/autoware.universe/tree/feature/autoware_dora/tools
Here are some minimal test tools helping reproduce issues.
Folder C_Python_test
It is a minimal example the one for reproduce #515 and #516. For instructions please go to folder and you will see the steps.
Folder C_node_test
It is a minimal example the one for reproduce #502. For instructions please go to folder and you will see the steps.
Here are some new results for #515 after making changes to philip's comments - "I just noticed that you added the queue_size parameter as a child of the the custom: object. It has no effect there. Instead, you need to add it as a child field to an input.". You can see that change does not solve the issue.
queue_size: 1000
First restart:
First time: package loss 1
Second time: package loss 0
Third time: package loss 3
Fourth time: package loss 0
Fifth time: package loss 4
Second restart:
First time: package loss 1
Second time: package loss 43
Third time: package loss 4
Fourth time: package loss 4
Fifth time: package loss 5
queue_size: 100
First restart:
First time: package loss 2
Second time: package loss 6
Third time: package loss 3
Fourth time: package loss 637
Fifth time: package loss 3
Second restart:
First time: package loss 193
Second time: package loss 259
Third time: package loss 1
Fourth time: package loss 1
Fifth time: package loss 4
The easiest way to reproduce issue is this minimal test example - https://github.com/dora-rs/autoware.universe/tree/feature/autoware_dora/tools/C_Python_test.
Ok, so I tried your example and I can reproduce the dropped messages. If you spawn the coordinator and daemon manually and set RUST_LOG=debug
, you see a lot of "dropped X inputs because event queue was too full" log messages. This occurs for both the noda_A/tick
input (a lot) and for the node_B/input-1
.
When I increase the queue_size
of node_B/input-1
to 10000, all 10k messages reach the Python node:
Input Data length: 3
Received messages: 10000/10000
=======================
The reason why this happens mostly when combining a sending C node with a Python receiver is probably that Python is slower, so it cannot keep up with the inputs.
I think a change such as #533 would make it more obvious what's happening.
After further investigation, there were no packet loss and it was a problem in the C++ String with no null termination.
Describe the bug The dora is V0.3.2.
I use two nodes. Node A is written in Python. Node B is written in c/c++.
Node A reads GPS data at a 20ms cycle and sends the data via a json string. Node B performs coordinate conversion by receiving data from node A, but I found that node B will lose data packets when receiving data from node A.
Here I print the seq number of the data pub by node A, the message count received by node B, and the time when the message was received. There was no packet loss at the beginning, but after running for a while, the message count received by node B was not equal to the seq value of the message.
And, the period of node B to receive data should normally be 20ms, which depends on the period of node A sending data. However, the following picture shows that the time difference between the two events is 40ms. (lower left corner)
here is the test code: https://github.com/dora-rs/autoware.universe/blob/feature/autoware_dora/localization/ekf_localizer/test_ekf_dataflow.yml