
[Discussion] Detection Architecture #288

Closed kfunaoka closed 4 years ago

kfunaoka commented 6 years ago

New Design

(image: proposed detection architecture diagram)

The design above does not include vision_tracker. vision_tracker cannot belong to object_tracker because vision_tracker tracks only the results of vision_detector. This issue is for discussing how the fusion should work.

| Component | Example node |
| --- | --- |
| vision_detector | YOLOv3 |
| vision_tracker | Beyond Pixels |
| lidar_detector | Euclidean Cluster |
kfunaoka commented 6 years ago

Does this connection work?

kfunaoka commented 6 years ago

I've discussed the design with @amc-nu. If you have any questions or suggestions, please let me know.

The conclusion now is:

(photo of the agreed design, 2018-09-07)
dejanpan commented 5 years ago

General Guidelines for Architecture Design

Definitions

Deterministic: Same output assuming the same sensor data input.

Guidelines

  1. The software is organized into modules called nodes, which are represented as ROS nodes in code.
  2. Every node must handle one specific task (separation of concerns).
  3. The coupling between nodes should be as low as possible.
  4. Nodes are only allowed to communicate with each other over ROS.
  5. MSG files are the interface specification of the nodes. Therefore their design and documentation must be done with great care and foresight.
  6. Nodes are not allowed to share threads, to reduce coupling at the resource level.
  7. Nodes must not depend on implementation details of other nodes.
  8. Every node has one main thread which runs a while loop (see the sketch after this list).
  9. Sequential (pipeline) parallelization inside a node is not allowed; this is done at the ROS level.
  10. Data parallelization inside a node must be done using e.g. TBB or OpenMP.
  11. Before processing, all data must be transformed into the same target frame and target time. Every node must have exactly one triggering topic, which triggers a new iteration of the while loop when data is received. See link.
  12. Use the timestamp of the triggering data.
  13. Deterministically calculate the target time from the timestamp of the triggering data.
  14. Access to the system clock (std::chrono::system_clock, etc.) is not allowed. In other words, the system clock must not influence the behavior of the system.
  15. Sleeping based on a duration or a point in time is not allowed (only waiting for external triggers).
  16. Processes are not used to modularize the software; nodes are used for this purpose.
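A minimal sketch of guidelines 8 and 11-15, assuming a hypothetical lidar_detector node with made-up topic names (not taken from the Autoware code base): one main thread runs a while loop that is driven solely by a single triggering topic, reuses the triggering timestamp, and never reads the system clock or sleeps on a duration.

```cpp
// Hypothetical lidar_detector skeleton; topic names are made up.
#include <ros/ros.h>
#include <sensor_msgs/PointCloud2.h>

int main(int argc, char** argv)
{
  ros::init(argc, argv, "lidar_detector");
  ros::NodeHandle nh;
  ros::Publisher pub = nh.advertise<sensor_msgs::PointCloud2>("points_cluster", 1);

  // Guideline 8: one main thread running a while loop.
  while (ros::ok())
  {
    // Guidelines 11, 14, 15: exactly one triggering topic; the loop blocks on the
    // external trigger instead of sleeping or reading the system clock.
    sensor_msgs::PointCloud2ConstPtr cloud =
        ros::topic::waitForMessage<sensor_msgs::PointCloud2>("points_raw", nh);
    if (!cloud)
    {
      break;  // node is shutting down
    }

    sensor_msgs::PointCloud2 objects;  // stand-in for the real detection output
    objects.header = cloud->header;    // guidelines 12, 13: target time comes from the triggering data
    pub.publish(objects);
  }
  return 0;
}
```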

Guidelines C++

  1. OOP and RAII paradigms are to be used strictly throughout the code.
  2. Exceptions must be used for error handling. Error return values are not allowed.
  3. Non-const reference members are not allowed.
  4. The use of pointers must be technically justified, and this justification must be documented.
  5. Prefer references over pointers.
  6. Use raw pointers for non-owning pointers and smart pointers for owning pointers.
  7. Prefer unique pointers over shared pointers (see the sketch after this list).
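A minimal sketch of guidelines 2 and 4-7, with a hypothetical ExampleDetector class and Model type (not Autoware code):

```cpp
// Hypothetical ExampleDetector and Model types; not Autoware code.
#include <memory>
#include <stdexcept>
#include <string>

struct Model
{
  // network definition, weights, ...
};

class ExampleDetector
{
public:
  // RAII: the model is acquired in the constructor and released automatically.
  // Guideline 2: errors are reported via exceptions, not return codes.
  explicit ExampleDetector(const std::string& model_path)
    : model_(std::make_unique<Model>())
  {
    if (model_path.empty())
    {
      throw std::invalid_argument("model path must not be empty");
    }
  }

  // Guideline 6: a raw pointer (or a reference) signals non-ownership to the caller.
  const Model* model() const { return model_.get(); }

private:
  // Guidelines 6 and 7: owning pointer, unique_ptr preferred over shared_ptr.
  std::unique_ptr<Model> model_;
};
```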
dejanpan commented 5 years ago

@kfunaoka @amc-nu couple of comments on this: https://github.com/CPFL/Autoware/issues/1501#issuecomment-419325252:

  1. Does the vision_detector assume that the incoming image is already undistorted?
  2. What does vision_detector do? I assume it loads a certain CNN network configuration and model? Is this configurable?
  3. Where is autoware_detection_msgs::ObjectArray defined? I would like to see what is inside, to know what the input and output of vision_tracker are.
  4. I see that this tracker here requires a different input: https://github.com/CPFL/Autoware/tree/develop/ros/src/computing/perception/detection/vision_tracker/packages/vision_beyond_track
  5. What does the range_vision_fusion do? Does it fuse bounding boxes from the lidar detector and vision_tracker, or does it fuse lidar and image points? In any case this node also needs the TF, and we should ask Brian how to have non-blocking TF calls (see the sketch after this list).
  6. For the fusion we should transform the data into the same target frame and target time. This node must have exactly one triggering topic, which triggers a new iteration of the while loop when data is received while also waiting on the data from the non-triggering topic.
  7. At this point I am not sure what the role of a fusion_detector is.
  8. And why do we have another object_tracker at the end?
  9. The nodes should be programmed so that they have at least an init and a run state.
  10. We should also use http://wiki.ros.org/bondcpp to be able to monitor the nodes.
  11. We should use ROS_[INFO|WARN|DEBUG] but then disable it in "production": https://answers.ros.org/question/9627/how-can-i-completely-disable-writing-logs-to-filesystem/?answer=225993#post-id-225993
  12. We have a lidar_tracker coming up in Autoware.Auto.
  13. We need to treat the map as another sensor, e.g. for filtering objects into or out of a certain region of interest.
  14. We should start adding long-missing features:
    1. object classification (as opposed to only detection)
    2. lane mark recognition (in case we start drifting out of the lane)
    3. curb and drivable-area recognition (the drivable area will be a polygon and no longer a bounding box)
    4. unknown object classification
  15. I think we should consider a better abstraction at the end of the object detection pipeline than just objects. One idea would be to come up with a driveability map.
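Regarding points 5 and 6, a minimal sketch of a non-blocking TF lookup, with hypothetical frame and topic names (not Autoware code): the callback checks canTransform() with a zero timeout at the message stamp and skips the iteration when the transform is not yet available, instead of blocking.

```cpp
// Hypothetical fusion node; frame and topic names are made up.
#include <ros/ros.h>
#include <tf2_ros/transform_listener.h>
#include <sensor_msgs/PointCloud2.h>
#include <geometry_msgs/TransformStamped.h>

class RangeVisionFusion
{
public:
  RangeVisionFusion() : tf_listener_(tf_buffer_)
  {
    sub_ = nh_.subscribe("points_cluster", 1, &RangeVisionFusion::cloudCallback, this);
  }

private:
  void cloudCallback(const sensor_msgs::PointCloud2ConstPtr& cloud)
  {
    // Zero timeout: canTransform() returns immediately instead of blocking.
    if (!tf_buffer_.canTransform("camera", cloud->header.frame_id,
                                 cloud->header.stamp, ros::Duration(0)))
    {
      return;  // transform not available yet; skip this iteration
    }
    geometry_msgs::TransformStamped tf = tf_buffer_.lookupTransform(
        "camera", cloud->header.frame_id, cloud->header.stamp, ros::Duration(0));
    // ... project the lidar boxes into the camera frame and fuse with the vision rects ...
  }

  ros::NodeHandle nh_;
  tf2_ros::Buffer tf_buffer_;
  tf2_ros::TransformListener tf_listener_;
  ros::Subscriber sub_;
};

int main(int argc, char** argv)
{
  ros::init(argc, argv, "range_vision_fusion");
  RangeVisionFusion node;
  ros::spin();
  return 0;
}
```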
dejanpan commented 5 years ago

For reference, I will just add that there are also perception architectures that are built around grid worlds:

  1. http://www.transport-research.info/sites/default/files/project/documents/20130823_164344_22255_paperModellingTrackingDrivingEnvironment_byDanescuothers.pdf
  2. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.707.5047&rep=rep1&type=pdf
  3. https://hal.inria.fr/inria-00182004/document
  4. https://hal.archives-ouvertes.fr/file/index/docid/295084/filename/08-star-tay-etal.pdf
kfunaoka commented 5 years ago

I've created issue autowarefoundation/autoware_ai#368 for the General Guidelines for Architecture Design.

We'll answer Dejan's questions about perception once it's ready.

dejanpan commented 5 years ago

@kfunaoka @amc-nu I went ahead and prepared a slightly different architecture here: https://docs.google.com/presentation/d/1XhWxf-gWZqBfdFNBdZnI3RnMnmUj9l7UDlak1D-Qgoc/edit?usp=sharing.

PNG: (object perception architecture diagram)

I did it in Google Slides. Have we meanwhile agreed on a preferred tool for architecture modeling like this? @kfunaoka, what tool did you use for the diagram here: https://github.com/CPFL/Autoware/issues/1501#issuecomment-419325252.

A couple of thoughts that went into its creation:

  1. radar and AD map are treated as additional sensors
  2. fusion of individual PointCloud or Image streams is done in the device driver itself
  3. fusion of data is possible on multiple levels
  4. The AD map interface is in the OpenDrive format.
  5. The object detector can be based either on extracting the ground and then computing bounding boxes, or on building an occupancy grid and then extracting bounding boxes from it. In the latter case it is also possible to track the objects inside the occupancy grid itself (a sketch of the first variant follows this list). I do not know yet how to make this generic.
  6. This should also allow us to treat traffic lights and traffic signs exactly the same as any other objects (cars, pedestrians, cyclists).
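Regarding point 5, a minimal sketch of the first variant (ground extraction followed by Euclidean clustering) using PCL, with made-up thresholds; the occupancy-grid variant is not shown.

```cpp
// Ground removal + Euclidean clustering with PCL; thresholds are made up.
#include <vector>
#include <pcl/ModelCoefficients.h>
#include <pcl/PointIndices.h>
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/segmentation/sac_segmentation.h>
#include <pcl/segmentation/extract_clusters.h>
#include <pcl/filters/extract_indices.h>
#include <pcl/search/kdtree.h>

void detectObjects(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud)
{
  // 1. Find the dominant plane (the ground) with RANSAC.
  pcl::SACSegmentation<pcl::PointXYZ> seg;
  seg.setModelType(pcl::SACMODEL_PLANE);
  seg.setMethodType(pcl::SAC_RANSAC);
  seg.setDistanceThreshold(0.2);  // metres
  seg.setInputCloud(cloud);
  pcl::PointIndices::Ptr ground(new pcl::PointIndices);
  pcl::ModelCoefficients::Ptr coefficients(new pcl::ModelCoefficients);
  seg.segment(*ground, *coefficients);

  // 2. Remove the ground points, keeping only obstacles.
  pcl::ExtractIndices<pcl::PointXYZ> extract;
  extract.setInputCloud(cloud);
  extract.setIndices(ground);
  extract.setNegative(true);
  pcl::PointCloud<pcl::PointXYZ>::Ptr obstacles(new pcl::PointCloud<pcl::PointXYZ>);
  extract.filter(*obstacles);

  // 3. Euclidean clustering: each cluster becomes one detected object.
  pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
  tree->setInputCloud(obstacles);
  pcl::EuclideanClusterExtraction<pcl::PointXYZ> ec;
  ec.setClusterTolerance(0.5);  // metres
  ec.setMinClusterSize(20);
  ec.setSearchMethod(tree);
  ec.setInputCloud(obstacles);
  std::vector<pcl::PointIndices> clusters;
  ec.extract(clusters);
  // ... compute a bounding box per cluster and publish it as a DetectedObject ...
}
```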
amc-nu commented 5 years ago

@dejanpan here some answers to the questions from https://github.com/CPFL/Autoware/issues/1501#issuecomment-431543021

  1. Yes
  2. It runs an image detector on the camera stream. Each node is based either on a CNN or on another machine-learning method. These are configurable; custom models and definitions can be used.
  3. Please have a look in the messages directory for the DetectedObject message.
  4. Currently the whole detection workflow uses the same message definition, DetectedObject. Each field is filled or updated along the detection workflow.
  5. It fuses boxes and rects from the vision and lidar detectors. The TF between lidar and camera is read only once (since that won't change over time).
  6. The node can synchronize using http://wiki.ros.org/message_filters#Time_Synchronizer (see the sketch after this list).
  7. fusion_detector would output detections that are already fused. Please check CNN-based fusion object detectors (AVOD, MV3D, etc.). The package is just a container for the time being.
  8. In the case where the vision tracker is not used, we can still track objects using the lidar tracker.
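Regarding point 6, a minimal sketch of that synchronization with message_filters::TimeSynchronizer, using hypothetical topic names; the exact-time policy fires the callback only when an image and a point cloud with identical timestamps are available.

```cpp
// Hypothetical fusion node using message_filters; topic names are made up.
#include <ros/ros.h>
#include <message_filters/subscriber.h>
#include <message_filters/time_synchronizer.h>
#include <sensor_msgs/Image.h>
#include <sensor_msgs/PointCloud2.h>
#include <boost/bind.hpp>

// Called only when an image and a point cloud with identical timestamps are available.
void fusionCallback(const sensor_msgs::ImageConstPtr& image,
                    const sensor_msgs::PointCloud2ConstPtr& cloud)
{
  // ... fuse the vision rects and lidar boxes for this common timestamp ...
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "range_vision_fusion");
  ros::NodeHandle nh;

  message_filters::Subscriber<sensor_msgs::Image> image_sub(nh, "detection/image_rects", 1);
  message_filters::Subscriber<sensor_msgs::PointCloud2> cloud_sub(nh, "detection/lidar_boxes", 1);

  // Exact-time policy with a queue size of 10.
  message_filters::TimeSynchronizer<sensor_msgs::Image, sensor_msgs::PointCloud2>
      sync(image_sub, cloud_sub, 10);
  sync.registerCallback(boost::bind(&fusionCallback, _1, _2));

  ros::spin();
  return 0;
}
```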
kfunaoka commented 5 years ago

> @kfunaoka what tool did you use for the diagram here: autowarefoundation/autoware_ai#288 (comment).

@dejanpan I'm using http://en.plantuml.com/. I'd like to know of a better tool.

I'll look into your comments in this issue and #proj-architecture tomorrow.

kfunaoka commented 5 years ago

This is the latest version of the component graph for the perception module: https://github.com/CPFL/Autoware/issues/1409#issuecomment-429686913