Integrating Pylot in an RL training process

erdos-project / pylot

Modular autonomous driving platform running on the CARLA simulator and real-world vehicles.

Apache License 2.0


Hi! I want to train NPC vehicles in an autonomous driving scenario with RL algorithms to discover potential weaknesses of the Ego vehicle. I'm currently trying to use Pylot as the "system under test" that controls the Ego vehicle, and to integrate it into the RL training process. I have some questions about Pylot's "reusability":

As an initial test, I currently restart the Pylot process for every RL episode: I shut the process down and start it again so that it reconnects to the rest of the workflow. This is quite inefficient; my timing statistics show that a complete restart of Pylot takes about 20 seconds on average.

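
A minimal sketch of this restart-per-episode workflow, assuming Pylot is started as an external command from the RL training script. `PylotProcess` is a hypothetical helper, and the code is POSIX-only: it runs the child in its own session so that `os.killpg` can stop the whole process group, which matters because Pylot's components run in separate processes.

```python
import os
import signal
import subprocess
import time


class PylotProcess:
    """Hypothetical helper that restarts an external Pylot process per RL episode.

    POSIX-only: the child runs in its own session (and process group) so that
    stop() signals every process in that group, not just the top-level one.
    """

    def __init__(self, command, shutdown_timeout=10.0):
        # e.g. ["python3", "pylot.py", "--flagfile=..."] (assumed launch command)
        self.command = list(command)
        self.shutdown_timeout = shutdown_timeout
        self.proc = None

    def start(self):
        # start_new_session=True makes the child the leader of a new process group.
        self.proc = subprocess.Popen(self.command, start_new_session=True)

    def _signal_group(self, sig):
        try:
            os.killpg(self.proc.pid, sig)  # pgid == pid of the session leader
        except ProcessLookupError:
            pass  # the whole group has already exited

    def stop(self):
        if self.proc is None:
            return
        self._signal_group(signal.SIGTERM)
        try:
            self.proc.wait(timeout=self.shutdown_timeout)
        except subprocess.TimeoutExpired:
            self._signal_group(signal.SIGKILL)  # escalate if SIGTERM is ignored
            self.proc.wait()
        self.proc = None

    def restart(self):
        """Stop the old process, spawn a new one, and return the elapsed seconds."""
        begin = time.perf_counter()
        self.stop()
        self.start()
        return time.perf_counter() - begin
```

Note that `restart()` only times stopping the old process and spawning the new one. Pylot keeps initializing (for example, loading models) after `Popen` returns, so capturing the full ~20-second cost would need some readiness signal, which is not shown here.
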
In RL, each episode is reset by moving the vehicles to new positions (`set_transform`). If Pylot is not restarted, it is "at a loss" after the Ego vehicle is moved; I suspect this is caused by an error in its route planning. Is there a way to reset a specific Pylot module independently?

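
For reference, a minimal sketch of such a `set_transform`-based episode reset, assuming the CARLA Python API of version 0.9.10 or later (`Actor.set_transform`, `Actor.set_target_velocity`, `Actor.set_target_angular_velocity`, `World.tick`, `World.wait_for_tick`). The function name `reset_actors` and the `(actor, transform)` placement list are hypothetical, and the zero vector is passed in so the sketch does not import `carla` directly. Nothing here notifies Pylot, which is consistent with the suspicion that its planner keeps working from the pose it had before the teleport.

```python
def reset_actors(world, placements, zero_velocity, synchronous=True):
    """Teleport actors to their episode start poses and stop them (hypothetical helper).

    world:         a carla.World
    placements:    iterable of (carla.Actor, carla.Transform) pairs, Ego and NPCs
    zero_velocity: carla.Vector3D(0.0, 0.0, 0.0)
    """
    for actor, transform in placements:
        actor.set_transform(transform)
        # Clear residual motion so the previous episode's speed does not carry over.
        actor.set_target_velocity(zero_velocity)
        actor.set_target_angular_velocity(zero_velocity)
    # Advance the simulation so the new poses are applied before the episode starts.
    if synchronous:
        world.tick()
    else:
        world.wait_for_tick()
```
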
Since our RL environment already records the information we need about the Ego vehicle, in theory Pylot should not need to perform redundant perception. If we only target the simulator, what data, and in what format, does Pylot need to complete path planning?

Hi, I've looked into modifying Pylot for RL work in the past, and I'm happy to share my findings. Pylot wasn't designed with RL in mind, so some of these changes might be complex:

Pylot was designed to run as a real-world AV pipeline. To meet real-time requirements, its components run in separate processes so that they can execute in parallel (Python offers limited intra-process parallelism due to the global interpreter lock). As a result, starting up this collection of processes takes some time. Setup is slowed down further because some components initialize heavyweight libraries such as TensorFlow.

It might be possible to "soft-reset" Pylot, but this would require modifying Pylot's operators: there would need to be a mechanism to notify the operators to reset their state, and the planning operator would also need to update its destination waypoint.

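
A minimal sketch of such a soft-reset mechanism, written as plain Python rather than against ERDOS. Every name below (`ResetMessage`, `ResettableOperator`, `on_reset`, `broadcast_reset`, and the two stub operators) is hypothetical and not part of Pylot's or ERDOS's API. It assumes each operator can receive a reset message, clears its per-episode state, and records an episode counter so it can drop data produced before the reset; the planner stub also takes the new destination and discards its old route.

```python
from dataclasses import dataclass


@dataclass
class ResetMessage:
    """Hypothetical control message sent to every operator at episode start."""
    episode: int
    destination: tuple  # (x, y, z) goal in simulator coordinates (assumed format)


class ResettableOperator:
    """Hypothetical base class: operators discard per-episode state when reset."""

    def __init__(self):
        self.episode = 0

    def reset(self, msg):
        self.episode = msg.episode
        self.on_reset(msg)

    def is_stale(self, episode):
        """True for data that was produced before the most recent reset."""
        return episode < self.episode

    def on_reset(self, msg):
        raise NotImplementedError


class TrackerStub(ResettableOperator):
    def __init__(self):
        super().__init__()
        self.tracks = {}  # obstacle id -> track

    def on_reset(self, msg):
        self.tracks.clear()  # tracks from the previous episode no longer apply


class PlannerStub(ResettableOperator):
    def __init__(self, destination):
        super().__init__()
        self.destination = destination
        self.route = []  # waypoints toward self.destination

    def on_reset(self, msg):
        self.destination = msg.destination  # new goal waypoint for this episode
        self.route = []  # force a re-plan from the Ego's new pose


def broadcast_reset(operators, msg):
    """Deliver the same reset message to every operator."""
    for op in operators:
        op.reset(msg)
```

In Pylot itself, the reset message would presumably have to travel on a stream connected to every operator, which is the kind of operator modification described above.
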
You can run Pylot without perception by using ground-truth information extracted from the simulator. To do so, set the following flags:
```
--simulator_obstacle_detection
--simulator_traffic_light_detection
--perfect_obstacle_tracking
--perfect_localization
```


Integrating Pylot in an RL training process #293