autowarefoundation / autoware_ai

Apache License 2.0

vision darknet yolo3 process died with exit code -11 after running for some time #748

Closed (tekrajchhetri closed this issue 3 years ago)

tekrajchhetri commented 4 years ago

This is an issue with Autoware 1.12.0 and Yolov3. The Yolo node starts fine, but after some time it dies on its own. The error message says a log was recorded at location XXX, but no log is actually written there, so I cannot see what is going on.

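System Information:

* Autoware 1.12.0

* Yolov3

* Nvidia Driver Version: 440.100

* CUDA Version: 10.2

* ROS melodic

* Ubuntu 18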

While the node was running I also monitored memory and GPU usage, and both seemed normal. I have been unable to find what is actually causing this issue. It would be really helpful if someone could point me in the right direction.

How am I running it? I have a video that I publish on a topic, and I run the Yolo detection on that topic.

Thank you!

pedroexenberger commented 4 years ago

The same happens to me with Autoware 1.12, Ubuntu 16, ROS Kinetic and CUDA 9.0. The failure message and the reported log do not clearly indicate what the problem is. The workaround I found was to re-launch all nodes from scratch. Sometimes I have to repeat the process multiple times.
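For readers trying to interpret the failure message: roslaunch starts nodes through Python's process handling, where a negative exit code means the process was terminated by the signal with that number. On Linux, signal 11 is SIGSEGV, so "exit code -11" points to a segmentation fault rather than a normal exit. The snippet below is only a minimal sketch of that reading; the helper names are hypothetical and are not part of ROS or Autoware.

```c
#include <signal.h>

/* Hypothetical helper (not a ROS/Autoware API): a negative exit code
 * reported by the launcher is read as "terminated by signal -code".
 * Returns the signal number, or 0 if the process exited normally. */
static int signal_from_exit_code(int exit_code)
{
    return (exit_code < 0) ? -exit_code : 0;
}

/* Hypothetical helper: true if the exit code corresponds to SIGSEGV,
 * i.e. an invalid memory access inside the node. */
static int is_segfault(int exit_code)
{
    return signal_from_exit_code(exit_code) == SIGSEGV;
}
```

With this reading, `is_segfault(-11)` is true on Linux, which suggests running the node under a debugger or enabling core dumps to find the faulting line.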

yantaixu0120 commented 3 years ago

I get the same errors using Autoware 1.14.0, Ubuntu 18.04, ROS Melodic and CUDA 10.2. Has anyone solved this problem?

SchDevel commented 3 years ago

Hey there,

I am having exactly the same problem with Ubuntu 20.04, ROS melodic and CUDA 11.0. I tracked the error down to the following line (336) in https://github.com/Autoware-AI/core_perception/blob/master/vision_darknet_detect/darknet/src/yolo_layer.c

dets[count].prob[j] = (prob > thresh) ? prob : 0;

The problem occurs when prob is NaN.
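One way to narrow this down could be to guard the assignment and count NaN values in the raw output before it is decoded. The following is a minimal sketch, assuming prob and thresh are plain floats as in the line above; `sanitize_prob` and `count_nan` are hypothetical helpers, not functions from darknet. Note that in C any comparison with NaN evaluates to false, so `(prob > thresh) ? prob : 0` already stores 0 for a NaN prob. The NaN may therefore be a symptom of corrupted values further upstream rather than the direct cause of the crash.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not part of darknet): returns prob only if it is
 * a finite value above thresh; NaN and infinities are mapped to 0. */
static float sanitize_prob(float prob, float thresh)
{
    if (!isfinite(prob)) {
        return 0.0f;
    }
    return (prob > thresh) ? prob : 0.0f;
}

/* Hypothetical helper: counts NaN entries in a float buffer, e.g. to log
 * how many values of a layer output are already invalid before decoding. */
static size_t count_nan(const float *values, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (isnan(values[i])) {
            ++count;
        }
    }
    return count;
}
```

If `count_nan` reports NaN values in the layer output on the frame where the node dies, the fault probably starts earlier in the forward pass, for example in the weights or the input image, rather than in the thresholding line itself.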

I already checked the allocation of dets[count].prob in https://github.com/Autoware-AI/core_perception/blob/master/vision_darknet_detect/darknet/src/network.c, which does not return a null pointer.

Additionally, I observed that this only occurs with Yolov3-spp and the "normal" Yolov3. Yolov3-tiny and Yolov2 run fine and reliably.

I am really looking forward to solving this problem, as it is important for a student project.

Does anyone already have a solution or an idea?

Best regards

SchDevel

yantaixu0120 commented 3 years ago

Quoting tekrajchhetri:

This is an issue with Autoware 1.12.0 and Yolov3. The Yolo node starts fine, but after some time it dies on its own. The error message says a log was recorded at location XXX, but no log is actually written there, so I cannot see what is going on.

System Information:

* Autoware 1.12.0

* Yolov3

* Nvidia Driver Version: 440.100

* CUDA Version: 10.2

* ROS melodic

* Ubuntu 18

While the node was running I also monitored memory and GPU usage, and both seemed normal. I have been unable to find what is actually causing this issue. It would be really helpful if someone could point me in the right direction.

How am I running it? I have a video that I publish on a topic, and I run the Yolo detection on that topic.

Thank you!

Hello, have you solved this problem? I have encountered the same errors.

JWhitleyWork commented 3 years ago

Per our support guidelines, please submit support questions to ROS Answers.

xiewenjing1170 commented 2 years ago

I have run into the same problem: the Yolov3 node dies all the time. Has anyone solved it? I would really appreciate any help. My error is described here: https://github.com/Autoware-AI/autoware.ai/issues/2394#issue-1059870505