What are your recommendations for speeding up the data loader? It's pretty slow out of the box when used with PyTorch. I am using make_multisource_episode_pipeline (gist here).
(Note that this is an updated gist from #37.)
Reducing shuffle_buffer_size can help the start-up time, though it should not affect the asymptotic throughput. Preliminary experiments on our side showed no significant change in accuracy when setting it to 200 (instead of 1000) during episodic training. It should be fine to leave it at 1000 for batch training, as it is 1000 total and not per class.
If reading the data is the bottleneck (or if you have lots of memory), increasing DataConfig.num_prefetch might help. You could also play with DataConfig.read_buffer_size_bytes.
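For reference, a minimal sketch of setting those knobs programmatically through gin — the exact parameter paths are assumptions based on meta-dataset's DataConfig and may differ across versions:

```python
import gin
# Importing the config module registers DataConfig with gin
# (module path assumed; check your meta-dataset version).
from meta_dataset.data import config  # noqa: F401

# Hypothetical bindings; the default mentioned in the thread is
# shuffle_buffer_size=1000.
gin.bind_parameter('DataConfig.shuffle_buffer_size', 200)  # faster start-up
gin.bind_parameter('DataConfig.num_prefetch', 64)          # if memory allows
gin.bind_parameter('DataConfig.read_buffer_size_bytes', 1024 * 1024)
```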
Increasing the read_buffer and the num_prefetch just made it even slower. I'm also using a local disk and not the network, so that's not an issue.
To be clear, this isn't usable at all right now. What happens when you run it with the TF data reader feeding into PyTorch? What kind of speeds do you get?
For what it is worth, we are getting acceptable speeds using the data loader with PyTorch in the CNAPs project https://github.com/cambridge-mlg/cnaps. The reader code is here: https://github.com/cambridge-mlg/cnaps/blob/master/src/meta_dataset_reader.py. I don't know if it makes any difference, but we are not using tf-eager mode, just an old-fashioned session.
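For anyone who wants to replicate that setup, the pattern is roughly the sketch below. It assumes `dataset` is the tf.data.Dataset returned by make_multisource_episode_pipeline, and the episode tuple layout (support images/labels/class ids, then the query equivalents) should be double-checked against your meta-dataset version:

```python
import tensorflow as tf
import torch

# Consume the meta-dataset pipeline from PyTorch via a plain TF1-style
# session instead of eager mode. `dataset` is assumed to be built
# elsewhere with make_multisource_episode_pipeline.
iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
next_element = iterator.get_next()

with tf.compat.v1.Session() as sess:
    for _ in range(num_episodes):  # num_episodes: assumed constant
        # Each element is (episode, source_id); sess.run yields numpy arrays.
        episode, source_id = sess.run(next_element)
        support_images = torch.from_numpy(episode[0])
        support_labels = torch.from_numpy(episode[1])
        query_images = torch.from_numpy(episode[3])
        query_labels = torch.from_numpy(episode[4])
        # ... hand the torch tensors to the model ...
```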
I did not time the loader itself recently, but a lower bound would be 1.5 episodes / second, since that is the mean rate at which prototypical networks train on ImageNet (averaged over training and validation), and that figure includes model computation. The loader alone is probably a bit faster, but not by orders of magnitude.
What about you?
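If it helps to compare numbers, here is a quick way to measure the raw loader throughput (a sketch reusing `sess` and `next_element` from the snippet above):

```python
import time

# Time the loader alone: fetch episodes and discard them.
n = 100
start = time.time()
for _ in range(n):
    sess.run(next_element)
elapsed = time.time() - start
print(f'{n / elapsed:.2f} episodes / second')
```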
Maybe in your case, setting num_prefetch to 0 and setting read_buffer_size_bytes to something lower may help, then.
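In gin terms that suggestion would look something like this (same caveat as above about the exact parameter paths):

```python
import gin

# Hypothetical bindings for the opposite direction: disable prefetching
# and shrink the read buffer, in case large buffers are what hurts.
gin.bind_parameter('DataConfig.num_prefetch', 0)
gin.bind_parameter('DataConfig.read_buffer_size_bytes', 64 * 1024)
```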
I see, OK. I managed to get this working well enough with the above tips and @jfb54's repo (thanks!).