Drivable surface segmentation

Nova-UTD / navigator

Navigator, our self-driving vehicle software stack

https://nova-utd.github.io/navigator

Drivable surface segmentation #372

Open danielv012 opened 1 year ago

jruths commented 3 months ago

Currently, the costmap for the drivable surface is provided by the map manager. While this is a reasonable source for a drivable surface map, it tends to be slow and introduces significant lag in the real-time aggregated costmap. It also adds a possibly unnecessary reliance on the map. The goal here is to use the camera, and possibly lidar, to identify the drivable surfaces (road) around the vehicle so they can be used to create a road/not-road costmap.
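As a rough illustration of the road/not-road costmap described above, the sketch below turns a top-down road mask into integer cell costs. It assumes the mask has already been projected onto a ground-plane grid, and it borrows the 0 / 100 / -1 (free / occupied / unknown) convention from nav_msgs/OccupancyGrid. The function and constant names are hypothetical and are not taken from Navigator. Plain Python lists keep the sketch dependency-free; a real node would more likely use NumPy arrays.

```python
from typing import List, Optional

# Assumed cost convention, borrowed from nav_msgs/OccupancyGrid:
# 0 = free, 100 = occupied, -1 = unknown.
ROAD_COST = 0
NOT_ROAD_COST = 100
UNKNOWN_COST = -1


def mask_to_costmap(road_mask: List[List[Optional[bool]]]) -> List[List[int]]:
    """Map a top-down road mask (True / False / None) to integer cell costs."""

    def cost(cell: Optional[bool]) -> int:
        if cell is None:  # no observation for this cell
            return UNKNOWN_COST
        return ROAD_COST if cell else NOT_ROAD_COST

    return [[cost(cell) for cell in row] for row in road_mask]


def flatten_row_major(grid: List[List[int]]) -> List[int]:
    """Flatten a 2D grid row by row, the layout OccupancyGrid.data expects."""
    return [value for row in grid for value in row]
```

For example, `mask_to_costmap([[True, False], [None, True]])` yields one cost per cell, and `flatten_row_major` turns the result into a flat list suitable for a grid message.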

jruths commented 3 months ago

There is an existing image_segmentation_node.py that may do some of this or provide a starting point. I don't think we currently use this node, so it is untested (or at least hasn't been tested in the past year). If it works and one of the "classes" it produces is drivable road, then creating the costmap would largely be a matter of mapping the image into a costmap (projecting the image into 3D space, which is related to another task: #370).

This node is located at src/perception/segmentation/image_segmentation_node.py
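To make the "projecting the image into 3D space" step more concrete, here is a minimal sketch of flat-ground back-projection for a single pixel. It assumes an ideal pinhole camera with no lens distortion, an optical axis parallel to a perfectly flat road, and a known mounting height. None of these assumptions come from the issue, and the `PinholeCamera` class and `pixel_to_ground` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PinholeCamera:
    fx: float        # focal length in pixels (x)
    fy: float        # focal length in pixels (y)
    cx: float        # principal point (x)
    cy: float        # principal point (y)
    height_m: float  # camera height above the ground plane, in meters


def pixel_to_ground(cam: PinholeCamera, u: float, v: float) -> Optional[Tuple[float, float]]:
    """Intersect the ray through pixel (u, v) with the flat ground plane.

    Camera frame: x right, y down, z forward. Returns (forward_m, right_m),
    or None when the pixel lies on or above the horizon.
    """
    ray_x = (u - cam.cx) / cam.fx
    ray_y = (v - cam.cy) / cam.fy
    if ray_y <= 0.0:
        return None  # the ray never reaches the ground
    scale = cam.height_m / ray_y  # distance along z where the ray hits y = height
    return (scale, scale * ray_x)
```

Applying this to every pixel labeled as road would give ground-plane points that can be binned into costmap cells. A pitched camera or uneven ground would need the full extrinsic transform instead.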

saishravanm commented 2 months ago

09/09/2024 Update:

Status: Researching how to use a Voxel Grid to convert the 2D occupancy grid (returned by the existing image_segmentation_node) into a 3D costmap.
Next Steps: Work on creating a Voxel Grid.
Projected Completion: 9/16/24
Update: Commented and understood the existing image_segmentation_node and image_projection_node.

saishravanm commented 2 months ago

09/16/2024 Update:

Status: Doing a deep dive into the MMSegmentation documentation for the best methods of 2D image segmentation.
Next Steps: Get a demo of CARLA working (the car moving along the city) and make any needed dataset changes (or create a new dataset entirely) for the node.
Projected Completion: 10/4/24
Update:
- Last week, David and I separated the objectives for the image_segmentation_node and the image_projection_node. I will be working on the 2D image segmentation portion, which will return a 2D occupancy grid and feed it into the image_projection (2D to 3D) service that David will be working on.
- I was able to get a prebuilt demo of the MMSegmentation framework (based on PyTorch) working (with some small issues matching versions). Using that, I was able to better understand the current image_segmentation_node and the important parts of how 2D image segmentation works. The existing node currently uses a pre-trained model. I still need to test it with CARLA to see whether it is sufficient; if not, it must be tweaked.
Resources Used:
- MMSegmentation Documentation: https://github.com/open-mmlab/mmsegmentation/tree/main
- Fixed MMSegmentation Demo (with the correct versions): https://colab.research.google.com/drive/1l0fpP8NCxlU8QAzD4KfR3dmyzlfAgEEC?usp=sharing

saishravanm commented 2 months ago

09/23/2024 Update:

Status: Rewriting existing image_segmentation node with the latest versions of MMSegmentation, as well as rewriting manual_control node with Pygame instead of ncurses.
Next Steps: Test current image segmentation logic in CARLA, using the keyboard controls
Projected Completion: 10/6/24
Update:
- My current approach to finishing this drivable surfaces task as a whole is to work off of the current image_segmentation node and edit the model as necessary. To do that, I need to get a working keyboard controller for CARLA up and running so I can test how well the current code works in the simulation.
- I spent most of my time working through the CARLA launch file environment and fixing package-related bugs in both of these nodes. (The manual_control node wouldn't launch at all because the word "pedal" was misspelled throughout the entire file.)
- The manual_control node provides a terminal-based GUI (using ncurses) for vehicle control; however, this requires a separate terminal just for controlling. I'm currently working on using Pygame instead, which should allow for vehicle control in the same terminal. After that, I plan on getting a GUI panel in CARLA to interface directly with the controls.
- The image_segmentation node uses an older version of MMSegmentation, which takes a slightly different approach to how functions are called (especially how pre-existing models are used). As a result, the node doesn't work as of now. I'm still going through the documentation and making the necessary fixes so I don't have to rewrite all the logic.
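One example of the kind of call-site difference mentioned above: MMSegmentation 0.x exposes `init_segmentor` / `inference_segmentor` in `mmseg.apis`, while 1.x renamed them to `init_model` / `inference_model`. The sketch below picks whichever pair is available. It assumes the old node targeted a 0.x release, which the thread does not state, and the helper name `resolve_mmseg_api` is hypothetical.

```python
import importlib
from typing import Callable, Optional, Tuple

# Known (init, inference) function-name pairs in mmseg.apis, newest first.
_API_NAME_PAIRS = (
    ("init_model", "inference_model"),          # MMSegmentation 1.x
    ("init_segmentor", "inference_segmentor"),  # MMSegmentation 0.x
)


def resolve_mmseg_api() -> Optional[Tuple[Callable, Callable]]:
    """Return (init_fn, inference_fn) for the installed MMSegmentation.

    Returns None when mmseg is missing or fails to import, for example
    because of a version mismatch with one of its dependencies.
    """
    try:
        apis = importlib.import_module("mmseg.apis")
    except Exception:  # import errors and version-check failures alike
        return None
    for init_name, infer_name in _API_NAME_PAIRS:
        init_fn = getattr(apis, init_name, None)
        infer_fn = getattr(apis, infer_name, None)
        if callable(init_fn) and callable(infer_fn):
            return init_fn, infer_fn
    return None
```

A node could call `resolve_mmseg_api()` once at startup and log a clear error if it returns None. The format of the inference result also differs between major versions, so downstream code would still need its own adjustments.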

saishravanm commented 1 month ago

09/30/2024 Update:

Status: Testing image segmentation model parameters using the latest MMSegmentation tools.
Next Steps: Fine-tune the model and integrate it as a viewable "layer" in CARLA. Resolve issues with the keyboard_control node logic.
Projected Completion: 10/12/24
Progress Update:
- I successfully began implementing a new keyboard_controller node in CARLA using Pygame. However, due to multiple existing nodes (specifically the routing nodes) accessing the vehicle's position topics, I’m facing issues getting the vehicle to move properly. This requires further troubleshooting.
- For image segmentation, I’ve updated the code to the latest MMSegmentation version and successfully ran the node in CARLA without errors. However, debugging is still needed, as the node doesn’t yet output easily interpretable results. I'm now working with the SegLocalVisualizer tool to test and adjust the current coloring scheme on test images.

saishravanm commented 3 weeks ago

11/04/2024 Update:

Status: Tweaking Image Segmentation model parameters.
Next Steps: Calculate free space confidence and generate 2D occupancy grid.
Projected Completion: Complete PR by 11/10/24
Progress Update:
- On Friday (11/1/24) I completed the keyboard_controller node, which allows Hero to be controlled via WASD and X in CARLA. It's somewhat sensitive to input (it takes exponential time to reach top throttle), so press and hold the keys as needed. The node is in the "keyboard_controller" branch.
- For image segmentation, I decided to start afresh and rewrite everything around Facebook's SAM 1.0 (Segment Anything Model), and I was able to get a demo of it working in CARLA (using the aforementioned keyboard_controller node). It took a while to get used to, but I think SAM is the best way to go because of how easy it is to implement. (A 2.0 version exists that I didn't see beforehand, but it shouldn't be too hard to port over.)
- 45s Demonstration Video:
- I did some research on how to generate the 2D occupancy grid once segmentation is complete. I will be using this tutorial: https://www.mathworks.com/help/driving/ug/create-occupancy-grid-using-monocular-camera-sensor.html.
- Since SAM returns the segmented image as a "mask" (like a layer) on top of the original image (rather than a new colorized image), calculating the confidence levels should be fairly straightforward.
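One simple way to read "confidence" from a binary mask is the fraction of road pixels inside each grid cell. The sketch below assumes that definition, which may not be what the final node uses, and treats the mask as a 2D array of booleans. The function name `cell_confidence` and the `cell` parameter are hypothetical.

```python
from typing import List, Sequence


def cell_confidence(mask: Sequence[Sequence[bool]], cell: int) -> List[List[float]]:
    """Downsample a binary road mask into per-cell free-space confidence.

    Each output value is the fraction of pixels in a cell x cell block
    that are marked as road (1.0 = all road, 0.0 = no road).
    Blocks at the right and bottom edges may be smaller than cell x cell.
    """
    rows, cols = len(mask), len(mask[0])
    grid: List[List[float]] = []
    for r0 in range(0, rows, cell):
        out_row: List[float] = []
        for c0 in range(0, cols, cell):
            block = [
                bool(mask[r][c])
                for r in range(r0, min(r0 + cell, rows))
                for c in range(c0, min(c0 + cell, cols))
            ]
            out_row.append(sum(block) / len(block))
        grid.append(out_row)
    return grid
```

Thresholding these values (for example, treating cells above some chosen fraction as free) would give the road/not-road grid. Doing this on the raw camera image rather than a top-down projection only gives image-space confidence.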

saishravanm commented 2 weeks ago

11/11/2024 Update:

Status: Researching how to convert segmented 2D image into occupancy grid
Next Steps: Utilize RTAB-MAP to create either a stereo map with images, or occupancy grid with segmented information laid on top
Projected Completion: Complete functionality by 11/18
Progress Update:
- I was able to resolve the "issues" in the keyboard_controller_node and updated the PR.
- I ported the current image_segmentation_node from SAM 1.0 to SAM 2.0 as the latter makes it easier to gather information about the segmented portions of an image.
- I am currently researching the best way to create the 2D occupancy grid using the segmented information and so far the best option is to utilize RTAB-MAP (an RGB-D, Stereo and Lidar Graph-Based SLAM approach based on an incremental appearance-based loop closure detector). The issue is that currently, the segmentation and projection functionality are split between my node and David's node, whereas RTAB-MAP is a ROS2 node on its own which combines both functionalities.
- RTAB-MAP Demo videos: https://www.youtube.com/watch?v=qpTS7kg9J3A
- https://github.com/introlab/rtabmap/wiki/Stereo-mapping