Closed — alexiskovi closed this issue 5 years ago.
@alexiskovi, Thank you for using Apollo! We added the 3D bbox and classification information at the deconvolution layer. We recommend you use the model in docker, since there are multiple dependencies. There is unit-test code (apollo/modules/perception/camera/test/camera_lib_obstacle_detector_yolo_region_output_test.cc) you can use for a standalone test in docker.
@alexiskovi , or you can use this offline standalone test (modules/perception/camera/tools/offline/offline_obstacle_detector.cc).
@techoe, @KaWaiTsoiBaidu, thanks a lot! I hope it will help. Could you explain what these layers mean? Are there any docs about the structure of the Apollo networks?
Please take a look at the prototxt. There are comments for each layer. The 3D bounding box properties, the left/right turn signals, and the brake signals are the outputs of the network.
Prototxt here
Closing this issue as it appears to be fixed. Feel free to reopen if you have additional questions. Thanks!
So, I've found the comment about the bbox. But, unfortunately, I didn't understand how to interpret the deconvolutional layer's output. Maybe you meant another comment or another layer? The plain output vector doesn't look like a YOLO output vector. So, the general question is: is the bbox data stored in one of the mentioned layers, or do we need to do some extra processing?
@alexiskovi the outputs for the 3D bbox are dim_pred and ori_pred, one for the 3D bbox dimensions (H, W, L) and one for the 3D bbox orientation. It does not output the location (X, Y, Z) of the 3D bbox, though. loc_pred stores the information of the 2D bbox.
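(As a side note: many 3D detectors regress the orientation as a (cos, sin) pair per anchor and recover the yaw angle with atan2. Whether ori_pred uses this exact encoding should be checked against the prototxt comments; this is a hypothetical sketch, not Apollo's confirmed implementation.)

```python
import math

def decode_orientation(cos_pred, sin_pred):
    # Recover a heading angle (radians) from a regressed (cos, sin) pair.
    # This encoding is a common convention, assumed here for illustration.
    return math.atan2(sin_pred, cos_pred)
```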
Thanks for clarifying the layer meanings. However, I still don't see how to extract the info from the layers. Here is the output shape of loc_pred: (1, 50, 90, 64). That means this tensor has 64 values at each cell of the (50, 90) feature map, but is it possible to get the bbox information in its final form: coordinates, class, and probability?
@alexiskovi The layer has 4*16=64 channels: 4 --> (x, y, h, w) of the bbox, 16 --> 16 anchor boxes for each cell of the (50, 90) feature map. For converting the raw output to (x, y, h, w), please see this kernel code.
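For readers who can't open the kernel code, the conversion usually follows the standard YOLO parameterization: a sigmoid on the center offsets relative to the cell, and an exponential on the sizes relative to the anchor priors. The sketch below assumes that convention; Apollo's kernel may differ in detail (e.g. normalization or channel order).

```python
import math

def decode_bbox(tx, ty, th, tw, col, row, anchor_w, anchor_h,
                grid_w=90, grid_h=50):
    """Decode one raw (x, y, h, w) prediction at feature-map cell
    (row, col) using a typical YOLO-style transform (assumed, not
    Apollo's exact kernel)."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    cx = (col + sigmoid(tx)) / grid_w   # bbox center x, normalized to [0, 1]
    cy = (row + sigmoid(ty)) / grid_h   # bbox center y, normalized to [0, 1]
    h = anchor_h * math.exp(th)         # height scaled by the anchor prior
    w = anchor_w * math.exp(tw)         # width scaled by the anchor prior
    return cx, cy, h, w
```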
@KaWaiTsoiBaidu, thank you a lot for making this clear!
In each cell of the feature map we have 16 bboxes. So, how do we decide which of the 50*90*16 bboxes should be drawn? Could you possibly point at the line that extracts the bbox significance for each cell? Maybe there is another layer with shape (50, 90, 16) that tells which boxes correspond to objects and which to noise.
Thanks in advance.
@alexiskovi This output specifies the objectness (shape (50,90,16)) of each bounding box.
> @alexiskovi, Thank you for using Apollo! We added the 3D bbox and classification information at the deconvolution layer. We recommend you use the model in docker, since there are multiple dependencies. There is unit-test code (apollo/modules/perception/camera/test/camera_lib_obstacle_detector_yolo_region_output_test.cc) you can use for a standalone test in docker.
How can I run this test?
I use Apollo 5.0 on Ubuntu 16.04, and I built it.
Actually, my question is a general one. I know there are a lot of test files for testing parts of Apollo, but I do not know how to run them. I tried the command below, and it is not working:
./apollo.sh /modules/perception/camera/test:camera_lib_obstacle_detector_yolo_region_output_test.cc
Here are the output layers of the yolo_obstacle_detector neural network:
As we understand it, every cell contains multidimensional object information. But how do we extract the object position, class, and bounding box size? Can yolo_obstacle_detector give this information by itself?
If not, can you say which neural networks take the data from the yolo_obstacle_detector layers and produce the final result?
Thanks in advance
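As a general note while waiting for an answer: in common YOLO post-processing the class is the argmax of the per-anchor class probabilities, and the final confidence is the objectness times that probability; Apollo's exact scoring lives in its region-output code, so this is only a hypothetical sketch.

```python
def pick_class(class_probs, objectness):
    """Pick the most likely class for one anchor and compute its final
    confidence as objectness * best class probability (a common YOLO
    convention, assumed here for illustration)."""
    best = max(range(len(class_probs)), key=lambda i: class_probs[i])
    return best, objectness * class_probs[best]
```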