CMU-TBD / SocNavBench

A Grounded Simulation Testing Framework for Evaluating Social Navigation: https://arxiv.org/abs/2103.00047
MIT License

Training a new neural network #9

Closed AshwiniUthir closed 2 years ago

AshwiniUthir commented 3 years ago

I understand that SocNavBench is a testing and evaluation framework, but can it also be used to train a new neural network? If so, how would I go about using SocNavBench to train my network?

ajdroid commented 3 years ago

Can you elaborate on the design of your network? What inputs does it require, and what is the output? How large is the network, and roughly how many training samples are you looking for?

AshwiniUthir commented 3 years ago

@ajdroid The network has 9 convolutional layers, each with 8 (3x1) filters and a stride of 1. An LSTM layer is added to the network to keep memory of previous steps.

The input is 2 consecutive LiDAR scans. Each LiDAR scan is a column vector (360x1) where each row corresponds to the distance of the nearest obstacle in that direction (e.g., the 90th row holds the distance to an obstacle, if any, at 90 degrees relative to the robot).

The output of the network is speed and direction.

I am looking to train the network with the ETH dataset. In this approach, we replace humans one at a time in a pedestrian dataset with the robot (equipped with a limited-range 360° LiDAR sensor) and let it observe the environment at each timestep. The robot should then learn to mimic the human's navigation.
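
Roughly, the network looks something like this in PyTorch (the padding, the exact layer ordering, and stacking the two scans as input channels are not final, just how I am prototyping it):

```python
import torch
import torch.nn as nn

class LidarPolicy(nn.Module):
    """2 stacked scans -> 9x Conv1d(8 filters, kernel 3, stride 1) -> LSTM -> (speed, heading)."""

    def __init__(self, num_beams: int = 360, hidden: int = 64):
        super().__init__()
        layers, in_ch = [], 2                     # two consecutive scans as channels
        for _ in range(9):
            layers += [nn.Conv1d(in_ch, 8, kernel_size=3, stride=1, padding=1), nn.ReLU()]
            in_ch = 8
        self.conv = nn.Sequential(*layers)
        self.lstm = nn.LSTM(input_size=8 * num_beams, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)          # (speed, heading) per timestep

    def forward(self, scans, state=None):
        # scans: (batch, time, 2, num_beams) -- a short history of scan pairs
        b, t, c, n = scans.shape
        feats = self.conv(scans.reshape(b * t, c, n)).reshape(b, t, -1)
        out, state = self.lstm(feats, state)
        return self.head(out), state
```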

ajdroid commented 3 years ago

SocNavBench has the bones to support what you are suggesting here, but it will require some hacking on your end.

Input: Lidar scans are not directly supported by SocNavBench, although RGBD images are. You can simulate a lidar sensor by sampling the rendered depth image at the appropriate locations.
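
As a rough illustration (not existing SocNavBench code), something like the following could sample a rendered depth image into a planar scan. The pinhole-camera math assumes the depth image stores distance along the optical axis, and a full 360° scan would need several renders at different yaw angles stitched together:

```python
import numpy as np

def depth_to_scan(depth_image: np.ndarray, fov_deg: float, num_beams: int) -> np.ndarray:
    """Sample one horizontal row of a depth image into lidar-like range readings.

    depth_image: (H, W) metric depth along the camera's optical axis.
    fov_deg:     horizontal field of view of the render, in degrees.
    num_beams:   number of beams to spread across that FOV.
    """
    h, w = depth_image.shape
    center_row = depth_image[h // 2]                       # planar lidar at camera height
    fx = (w / 2.0) / np.tan(np.deg2rad(fov_deg) / 2.0)     # pinhole focal length in pixels

    angles = np.deg2rad(np.linspace(-fov_deg / 2.0, fov_deg / 2.0, num_beams))
    cols = np.clip((w / 2.0 + fx * np.tan(angles)).astype(int), 0, w - 1)

    # Convert depth along the optical axis into Euclidean range along each beam.
    return center_row[cols] / np.cos(angles)

# e.g. stitch four 90-degree renders into the 360x1 scan your network expects:
# scan = np.concatenate([depth_to_scan(d, 90.0, 90) for d in four_depth_renders])
```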

Output + Training with ETH: This is the part that needs the most modification. As designed, SocNavBench does not replace a pedestrian with a robot; instead, it places a robot in the scene alongside the recorded humans. You will therefore need to write code that treats one recorded human at a time as the ego agent, after which you can use the renderer and the other capabilities of the simulator.
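
Very schematically, the loop you would need to build looks something like the sketch below. The data-loading and rendering helpers are placeholders for code you would write on top of the simulator and renderer, not existing SocNavBench functions:

```python
import numpy as np

def build_imitation_dataset(pedestrians, render_rgbd_from_pose):
    """Treat each recorded pedestrian in turn as the ego agent.

    pedestrians:            dict of pedestrian id -> array of (x, y, theta) poses.
    render_rgbd_from_pose:  callable YOU implement on top of the SocNavBench
                            renderer; given (ego pose, other agents, timestep)
                            it returns an RGBD observation. Placeholder, not an
                            existing API.
    """
    dataset = []
    for ego_id, ego_traj in pedestrians.items():
        others = {pid: traj for pid, traj in pedestrians.items() if pid != ego_id}
        for t in range(len(ego_traj) - 1):
            obs = render_rgbd_from_pose(ego_traj[t], others, t)
            # Supervision: the motion the human actually executed next
            # (convert the displacement to speed/heading for your network).
            action = np.asarray(ego_traj[t + 1]) - np.asarray(ego_traj[t])
            dataset.append((obs, action))
    return dataset
```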

Are you looking to train with the entirety of the ETH dataset or just our curated episodes? The former should be possible, but it will require some hacking as well, since only the latter is supported out of the box.

AshwiniUthir commented 3 years ago

@ajdroid Can you provide more information about how RGBD images can be obtained as input to my navigation algorithm? My understanding is that the simstate is the only way I can access the environment, and the simstate doesn't contain any RGBD data. All the data in the simstate is bird's-eye-view data, not RGBD images.

ajdroid commented 3 years ago

You should be able to change render_mode to full-render instead of the default schematic in https://github.com/CMU-TBD/SocNavBench/blob/master/params/user_params.ini. Under the schematic render mode, the bird's-eye-view data is the default.
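
If you want to flip that setting programmatically rather than by hand, a minimal sketch with Python's configparser would be something like this (it searches for the render_mode key instead of hard-coding a section name, and configparser drops comments when it rewrites the file, so a hand edit may still be nicer):

```python
import configparser

INI_PATH = "params/user_params.ini"

config = configparser.ConfigParser(interpolation=None)
config.read(INI_PATH)

for section in config.sections():
    if "render_mode" in config[section]:
        config[section]["render_mode"] = "full-render"   # instead of "schematic"

with open(INI_PATH, "w") as f:
    config.write(f)
```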