carla-simulator / imitation-learning

Repository to store conditional imitation learning based AI that runs on CARLA.
MIT License

Parallelizing data load and training #68

Open mallela opened 5 years ago

mallela commented 5 years ago

Hello!

I read in another issue that you load data and perform training in parallel. I was wondering how exactly you do that, because the bottleneck does not seem to be training (~0.06 s per step) but the data pre-processing/fetching calls (augmentation using an imgaug Sequential takes ~0.8 s; loading the .h5 files takes ~0.2 s). I am using a batch size of 120.

Are you using multiprocessing or the tf.data input pipeline?

Thanks, Praneeta

markus-hinsche commented 5 years ago

Hi Praneeta! In TensorFlow, the dataset.map() method has a num_parallel_calls parameter.

See how we use it in our training implementation of this paper: https://github.com/merantix/imitation-learning/blob/master/imitation/input_fn.py#L100
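For reference, here is a minimal sketch of what such a pipeline can look like (TF 2.x style; the file pattern, parse_example, and augment functions below are hypothetical placeholders, not the code from the linked repo, which should be consulted for the actual implementation):

```python
import tensorflow as tf

BATCH_SIZE = 120  # matches the batch size mentioned above

def parse_example(path):
    # Hypothetical: load and decode one sample from disk.
    image = tf.io.read_file(path)
    image = tf.image.decode_png(image, channels=3)
    return tf.cast(image, tf.float32) / 255.0

def augment(image):
    # Hypothetical stand-in for the imgaug Sequential augmentation step,
    # expressed with native TF ops so it can run inside the tf.data graph.
    image = tf.image.random_flip_left_right(image)
    return tf.image.random_brightness(image, max_delta=0.1)

paths = tf.data.Dataset.list_files("data/*.png")  # hypothetical file layout
dataset = (
    paths
    # num_parallel_calls runs the map function on several elements at once,
    # so decoding and augmentation overlap instead of running serially.
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(BATCH_SIZE)
    # prefetch prepares the next batch while the current one is training.
    .prefetch(tf.data.AUTOTUNE)
)
```

With num_parallel_calls, several invocations of the map function run concurrently, and prefetch overlaps input preparation with the training step, so per-batch preprocessing cost can be hidden behind the ~0.06 s training step instead of serializing with it.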