
[Discussion] Detection Architecture #288

Closed kfunaoka closed 4 years ago

kfunaoka commented 6 years ago

New Design

(image: proposed detection architecture diagram)

The design above does not include vision_tracker. vision_tracker cannot belong to object_tracker because vision_tracker tracks only the results of vision_detector. This issue is for discussing how the fusion should work.

| Component | Example node |
| --- | --- |
| vision_detector | YOLOv3 |
| vision_tracker | Beyond Pixels |
| lidar_detector | Euclidean Cluster |
kfunaoka commented 6 years ago

Does this connection work?

kfunaoka commented 6 years ago

I've discussed the design with @amc-nu. If you have any questions or suggestions, please let me know.

The conclusion now is:

(photo of the agreed design, 2018-09-07)
dejanpan commented 5 years ago

General Guidelines for Architecture Design

Definitions

Deterministic: Same output assuming the same sensor data input.

Guidelines

  1. The software is organized into modules called nodes, which are represented as ROS nodes in code.
  2. Every node must handle one specific task (separation of concerns).
  3. The coupling between nodes should be as low as possible.
  4. Nodes are only allowed to communicate with each other over ROS.
  5. MSG files are the interface specification of the nodes. Therefore their design and documentation must be done with great care and foresight.
  6. Nodes are not allowed to share threads, to reduce coupling at the resource level.
  7. Nodes must not depend on implementation details of other nodes.
  8. Every node has one main thread which runs a while loop (see the sketch after this list).
  9. Sequential (pipeline) parallelization inside a node is not allowed; this is done at the ROS level.
  10. Data parallelization inside a node must be done using e.g. TBB or OpenMP.
  11. Before processing, all data must be transformed into the same target frame and target time. Every node must have exactly one triggering topic, which triggers a new iteration of the while loop when data is received. See link.
  12. Use the timestamp of the triggering data.
  13. Deterministically calculate the target time from the timestamp of the triggering data.
  14. Access to the system clock (std::chrono::system_clock, etc.) is not allowed. In other words, the system clock must not influence the behavior of the system.
  15. Sleeping based on a duration or a point in time is not allowed (only waiting for external triggers).
  16. Processes are not used to modularize the software; nodes are used for this purpose.
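A minimal sketch of guidelines 8 and 11-15, assuming a hypothetical lidar_detector node with made-up topic names (not taken from the Autoware code base): one main thread runs a while loop that is driven solely by a single triggering topic, reuses the triggering timestamp, and never reads the system clock or sleeps on a duration.

```cpp
// Hypothetical lidar_detector skeleton; topic names are made up.
#include <ros/ros.h>
#include <sensor_msgs/PointCloud2.h>

int main(int argc, char** argv)
{
  ros::init(argc, argv, "lidar_detector");
  ros::NodeHandle nh;
  ros::Publisher pub = nh.advertise<sensor_msgs::PointCloud2>("points_cluster", 1);

  // Guideline 8: one main thread running a while loop.
  while (ros::ok())
  {
    // Guidelines 11, 14, 15: exactly one triggering topic; the loop blocks on the
    // external trigger instead of sleeping or reading the system clock.
    sensor_msgs::PointCloud2ConstPtr cloud =
        ros::topic::waitForMessage<sensor_msgs::PointCloud2>("points_raw", nh);
    if (!cloud)
    {
      break;  // node is shutting down
    }

    sensor_msgs::PointCloud2 objects;  // stand-in for the real detection output
    objects.header = cloud->header;    // guidelines 12, 13: target time comes from the triggering data
    pub.publish(objects);
  }
  return 0;
}
```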

Guidelines C++

  1. OOP and RAII paradigms are to be used strictly throughout the code.
  2. Exceptions must be used for error handling. Error return values are not allowed.
  3. Non-const reference members are not allowed.
  4. The use of pointers must be technically justified, and this justification must be documented.
  5. Prefer references over pointers.
  6. Use raw pointers for non-owning pointers and smart pointers for owning pointers.
  7. Prefer unique pointers over shared pointers (see the sketch after this list).
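A minimal sketch of guidelines 2 and 4-7, with a hypothetical ExampleDetector class and Model type (not Autoware code):

```cpp
// Hypothetical ExampleDetector and Model types; not Autoware code.
#include <memory>
#include <stdexcept>
#include <string>

struct Model
{
  // network definition, weights, ...
};

class ExampleDetector
{
public:
  // RAII: the model is acquired in the constructor and released automatically.
  // Guideline 2: errors are reported via exceptions, not return codes.
  explicit ExampleDetector(const std::string& model_path)
    : model_(std::make_unique<Model>())
  {
    if (model_path.empty())
    {
      throw std::invalid_argument("model path must not be empty");
    }
  }

  // Guideline 6: a raw pointer (or a reference) signals non-ownership to the caller.
  const Model* model() const { return model_.get(); }

private:
  // Guidelines 6 and 7: owning pointer, unique_ptr preferred over shared_ptr.
  std::unique_ptr<Model> model_;
};
```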
dejanpan commented 5 years ago

@kfunaoka @amc-nu couple of comments on this: https://github.com/CPFL/Autoware/issues/1501#issuecomment-419325252:

  1. Does the vision_detector assume that the incoming image is already undistorted?
  2. What does vision_detector do? I assume it loads a certain CNN network configuration and model? Is this configurable?
  3. Where is autoware_detection_msgs::ObjectArray defined? I would like to see what is inside, to know what the input and output of vision_tracker are.
  4. I see that this tracker here requires a different input: https://github.com/CPFL/Autoware/tree/develop/ros/src/computing/perception/detection/vision_tracker/packages/vision_beyond_track
  5. What does the range_vision_fusion do? Does it fuse bounding boxes from the lidar detector and vision_tracker, or does it fuse lidar and image points? In any case this node also needs the TF, and we should ask Brian how to have non-blocking TF calls (see the sketch after this list).
  6. For the fusion we should transform the data into the same target frame and target time. This node must have exactly one triggering topic, which triggers a new iteration of the while loop when data is received while also waiting on the data from the non-triggering topic.
  7. At this point I am not sure what the role of a fusion_detector is.
  8. And why do we have another object_tracker at the end?
  9. The nodes should be programmed so that they have at least an init and a run state.
  10. We should also use http://wiki.ros.org/bondcpp to be able to monitor the nodes.
  11. We should use ROS_[INFO|WARN|DEBUG] but then disable it in "production": https://answers.ros.org/question/9627/how-can-i-completely-disable-writing-logs-to-filesystem/?answer=225993#post-id-225993
  12. We have a lidar_tracker coming up in Autoware.Auto.
  13. We need to treat the map as another sensor, e.g. for filtering objects into or out of a certain region of interest.
  14. We should start adding long-missing features:
    1. object classification (as opposed to only detection)
    2. lane mark recognition (in case we start drifting out of the lane)
    3. curb and drivable-area recognition (the drivable area will be a polygon and no longer a bounding box)
    4. unknown object classification
  15. I think we should consider a better abstraction at the end of the object detection pipeline than just objects. One idea would be to come up with a driveability map.
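Regarding points 5 and 6, a minimal sketch of a non-blocking TF lookup, with hypothetical frame and topic names (not Autoware code): the callback checks canTransform() with a zero timeout at the message stamp and skips the iteration when the transform is not yet available, instead of blocking.

```cpp
// Hypothetical fusion node; frame and topic names are made up.
#include <ros/ros.h>
#include <tf2_ros/transform_listener.h>
#include <sensor_msgs/PointCloud2.h>
#include <geometry_msgs/TransformStamped.h>

class RangeVisionFusion
{
public:
  RangeVisionFusion() : tf_listener_(tf_buffer_)
  {
    sub_ = nh_.subscribe("points_cluster", 1, &RangeVisionFusion::cloudCallback, this);
  }

private:
  void cloudCallback(const sensor_msgs::PointCloud2ConstPtr& cloud)
  {
    // Zero timeout: canTransform() returns immediately instead of blocking.
    if (!tf_buffer_.canTransform("camera", cloud->header.frame_id,
                                 cloud->header.stamp, ros::Duration(0)))
    {
      return;  // transform not available yet; skip this iteration
    }
    geometry_msgs::TransformStamped tf = tf_buffer_.lookupTransform(
        "camera", cloud->header.frame_id, cloud->header.stamp, ros::Duration(0));
    // ... project the lidar boxes into the camera frame and fuse with the vision rects ...
  }

  ros::NodeHandle nh_;
  tf2_ros::Buffer tf_buffer_;
  tf2_ros::TransformListener tf_listener_;
  ros::Subscriber sub_;
};

int main(int argc, char** argv)
{
  ros::init(argc, argv, "range_vision_fusion");
  RangeVisionFusion node;
  ros::spin();
  return 0;
}
```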
dejanpan commented 5 years ago

For reference, I will just add that there are also perception architectures that are built around grid worlds:

  1. http://www.transport-research.info/sites/default/files/project/documents/20130823_164344_22255_paperModellingTrackingDrivingEnvironment_byDanescuothers.pdf
  2. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.707.5047&rep=rep1&type=pdf
  3. https://hal.inria.fr/inria-00182004/document
  4. https://hal.archives-ouvertes.fr/file/index/docid/295084/filename/08-star-tay-etal.pdf
kfunaoka commented 5 years ago

I've created issue autowarefoundation/autoware_ai#368 for the General Guidelines for Architecture Design.

We'll answer Dejan's questions about perception once it's ready.

dejanpan commented 5 years ago

@kfunaoka @amc-nu I went ahead and prepared a slightly different architecture here: https://docs.google.com/presentation/d/1XhWxf-gWZqBfdFNBdZnI3RnMnmUj9l7UDlak1D-Qgoc/edit?usp=sharing.

PNG: (object perception architecture diagram)

I did it in Google Slides. Have we meanwhile agreed on a preferred tool for architecture modeling like this? @kfunaoka, what tool did you use for the diagram here: https://github.com/CPFL/Autoware/issues/1501#issuecomment-419325252.

A couple of thoughts that went into its creation:

  1. radar and AD map are treated as additional sensors
  2. fusion of individual PointCloud or Image streams is done in the device driver itself
  3. fusion of data is possible on multiple levels
  4. The AD map interface is in the OpenDrive format.
  5. The object detector can be based either on extracting the ground and then computing bounding boxes, or on building an occupancy grid and then extracting bounding boxes from it. In the latter case it is also possible to track the objects inside the occupancy grid itself (a sketch of the first variant follows this list). I do not know yet how to make this generic.
  6. This should also allow us to treat traffic lights and traffic signs exactly the same as any other objects (cars, pedestrians, cyclists).
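Regarding point 5, a minimal sketch of the first variant (ground extraction followed by Euclidean clustering) using PCL, with made-up thresholds; the occupancy-grid variant is not shown.

```cpp
// Ground removal + Euclidean clustering with PCL; thresholds are made up.
#include <vector>
#include <pcl/ModelCoefficients.h>
#include <pcl/PointIndices.h>
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/segmentation/sac_segmentation.h>
#include <pcl/segmentation/extract_clusters.h>
#include <pcl/filters/extract_indices.h>
#include <pcl/search/kdtree.h>

void detectObjects(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud)
{
  // 1. Find the dominant plane (the ground) with RANSAC.
  pcl::SACSegmentation<pcl::PointXYZ> seg;
  seg.setModelType(pcl::SACMODEL_PLANE);
  seg.setMethodType(pcl::SAC_RANSAC);
  seg.setDistanceThreshold(0.2);  // metres
  seg.setInputCloud(cloud);
  pcl::PointIndices::Ptr ground(new pcl::PointIndices);
  pcl::ModelCoefficients::Ptr coefficients(new pcl::ModelCoefficients);
  seg.segment(*ground, *coefficients);

  // 2. Remove the ground points, keeping only obstacles.
  pcl::ExtractIndices<pcl::PointXYZ> extract;
  extract.setInputCloud(cloud);
  extract.setIndices(ground);
  extract.setNegative(true);
  pcl::PointCloud<pcl::PointXYZ>::Ptr obstacles(new pcl::PointCloud<pcl::PointXYZ>);
  extract.filter(*obstacles);

  // 3. Euclidean clustering: each cluster becomes one detected object.
  pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
  tree->setInputCloud(obstacles);
  pcl::EuclideanClusterExtraction<pcl::PointXYZ> ec;
  ec.setClusterTolerance(0.5);  // metres
  ec.setMinClusterSize(20);
  ec.setSearchMethod(tree);
  ec.setInputCloud(obstacles);
  std::vector<pcl::PointIndices> clusters;
  ec.extract(clusters);
  // ... compute a bounding box per cluster and publish it as a DetectedObject ...
}
```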
amc-nu commented 5 years ago

@dejanpan here some answers to the questions from https://github.com/CPFL/Autoware/issues/1501#issuecomment-431543021

  1. Yes
  2. It runs an image detector on the camera stream. Each node is based either on a CNN or on another machine-learning method. These are configurable; custom models and definitions can be used.
  3. Please have a look in the messages directory for the DetectedObject message.
  4. Currently the whole detection workflow uses the same message definition, DetectedObject. Each field is filled or updated along the detection workflow.
  5. It fuses boxes and rects from the vision and lidar detectors. The TF between lidar and camera is read only once (since that won't change over time).
  6. The node can synchronize using http://wiki.ros.org/message_filters#Time_Synchronizer (see the sketch after this list).
  7. fusion_detector would output detections that are already fused. Please check CNN-based fusion object detectors (AVOD, MV3D, etc.). The package is just a container for the time being.
  8. In the case where the vision tracker is not used, we can still track objects using the lidar tracker.
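Regarding point 6, a minimal sketch of that synchronization with message_filters::TimeSynchronizer, using hypothetical topic names; the exact-time policy fires the callback only when an image and a point cloud with identical timestamps are available.

```cpp
// Hypothetical fusion node using message_filters; topic names are made up.
#include <ros/ros.h>
#include <message_filters/subscriber.h>
#include <message_filters/time_synchronizer.h>
#include <sensor_msgs/Image.h>
#include <sensor_msgs/PointCloud2.h>
#include <boost/bind.hpp>

// Called only when an image and a point cloud with identical timestamps are available.
void fusionCallback(const sensor_msgs::ImageConstPtr& image,
                    const sensor_msgs::PointCloud2ConstPtr& cloud)
{
  // ... fuse the vision rects and lidar boxes for this common timestamp ...
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "range_vision_fusion");
  ros::NodeHandle nh;

  message_filters::Subscriber<sensor_msgs::Image> image_sub(nh, "detection/image_rects", 1);
  message_filters::Subscriber<sensor_msgs::PointCloud2> cloud_sub(nh, "detection/lidar_boxes", 1);

  // Exact-time policy with a queue size of 10.
  message_filters::TimeSynchronizer<sensor_msgs::Image, sensor_msgs::PointCloud2>
      sync(image_sub, cloud_sub, 10);
  sync.registerCallback(boost::bind(&fusionCallback, _1, _2));

  ros::spin();
  return 0;
}
```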
kfunaoka commented 5 years ago

> @kfunaoka what tool did you use for the diagram here: autowarefoundation/autoware_ai#288 (comment).

@dejanpan I'm using http://en.plantuml.com/. I'd like to know of a better tool.

I'll look into your comments in this issue and #proj-architecture tomorrow.

kfunaoka commented 5 years ago

This is the latest version of the component graph for the perception module: https://github.com/CPFL/Autoware/issues/1409#issuecomment-429686913