What are your recommendations for speeding up the data loader? It's pretty slow out of the box when used with PyTorch. I am using make_multisource_episode_pipeline (gist here).
(Note that this is an updated gist from #37.)
Reducing shuffle_buffer_size can help the start-up time, though it should not affect the asymptotic throughput. Preliminary experiments on our side showed no significant change in accuracy when setting it to 200 (instead of 1000) during episodic training. It should be fine to leave it at 1000 for batch training, as it is 1000 total and not per class.
If reading the data is the bottleneck (or if you have lots of memory), increasing DataConfig.num_prefetch might help. You could also play with DataConfig.read_buffer_size_bytes.
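For reference, a minimal sketch of setting those knobs programmatically through gin — the exact parameter paths are assumptions based on meta-dataset's DataConfig and may differ across versions:

```python
import gin
# Importing the config module registers DataConfig with gin
# (module path assumed; check your meta-dataset version).
from meta_dataset.data import config  # noqa: F401

# Hypothetical bindings; the default mentioned in the thread is
# shuffle_buffer_size=1000.
gin.bind_parameter('DataConfig.shuffle_buffer_size', 200)  # faster start-up
gin.bind_parameter('DataConfig.num_prefetch', 64)          # if memory allows
gin.bind_parameter('DataConfig.read_buffer_size_bytes', 1024 * 1024)
```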
Increasing the read_buffer and the num_prefetch just made it even slower. I'm also using a local disk and not the network, so that's not an issue.
To be clear, this isn't usable at all right now. What happens when you run it with the TF data reader feeding into PyTorch? What kind of speeds do you get?
For what it is worth, we are getting acceptable speeds using the data loader with PyTorch in the CNAPs project https://github.com/cambridge-mlg/cnaps. The reader code is here: https://github.com/cambridge-mlg/cnaps/blob/master/src/meta_dataset_reader.py. I don't know if it makes any difference, but we are not using tf-eager mode, just an old-fashioned session.
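For anyone who wants to replicate that setup, the pattern is roughly the sketch below. It assumes `dataset` is the tf.data.Dataset returned by make_multisource_episode_pipeline, and the episode tuple layout (support images/labels/class ids, then the query equivalents) should be double-checked against your meta-dataset version:

```python
import tensorflow as tf
import torch

# Consume the meta-dataset pipeline from PyTorch via a plain TF1-style
# session instead of eager mode. `dataset` is assumed to be built
# elsewhere with make_multisource_episode_pipeline.
iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
next_element = iterator.get_next()

with tf.compat.v1.Session() as sess:
    for _ in range(num_episodes):  # num_episodes: assumed constant
        # Each element is (episode, source_id); sess.run yields numpy arrays.
        episode, source_id = sess.run(next_element)
        support_images = torch.from_numpy(episode[0])
        support_labels = torch.from_numpy(episode[1])
        query_images = torch.from_numpy(episode[3])
        query_labels = torch.from_numpy(episode[4])
        # ... hand the torch tensors to the model ...
```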
I did not time the loader itself recently, but a lower bound would be 1.5 episodes / second, since that is the mean rate at which prototypical networks train on ImageNet (averaged over training and validation), and that figure includes model computation. The loader alone is probably a bit faster, but not by orders of magnitude.
What about you?
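If it helps to compare numbers, here is a quick way to measure the raw loader throughput (a sketch reusing `sess` and `next_element` from the snippet above):

```python
import time

# Time the loader alone: fetch episodes and discard them.
n = 100
start = time.time()
for _ in range(n):
    sess.run(next_element)
elapsed = time.time() - start
print(f'{n / elapsed:.2f} episodes / second')
```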
Maybe in your case, setting num_prefetch to 0 and setting read_buffer_size_bytes to something lower may help, then.
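In gin terms that suggestion would look something like this (same caveat as above about the exact parameter paths):

```python
import gin

# Hypothetical bindings for the opposite direction: disable prefetching
# and shrink the read buffer, in case large buffers are what hurts.
gin.bind_parameter('DataConfig.num_prefetch', 0)
gin.bind_parameter('DataConfig.read_buffer_size_bytes', 64 * 1024)
```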
I see, OK. I managed to get this working well enough with the above tips and @jfb54's repo (thanks!).