google-research / meta-dataset

A dataset of datasets for learning to learn from few examples
Apache License 2.0

Speed of data loader? #40

Closed cinjon closed 4 years ago

cinjon commented 4 years ago

What are your recommendations for speeding up the data loader? It's pretty slow out of the box when used with PyTorch. I am using make_multisource_episode_pipeline (gist here).

cinjon commented 4 years ago

(Note that this is an updated version of the gist from #37.)

lamblin commented 4 years ago

cinjon commented 4 years ago

Increasing the read_buffer and the num_prefetch just made it even slower. I'm also reading from local disk rather than over the network, so that's not the issue.

To be clear, this isn't usable at all right now. What happens when you guys run the TF data reader and feed it into PyTorch? What kind of speeds do you get?

jfb54 commented 4 years ago

For what it is worth, we are getting acceptable speeds using the Meta-Dataset data loader from PyTorch in the CNAPs project: https://github.com/cambridge-mlg/cnaps. The reader code is here: https://github.com/cambridge-mlg/cnaps/blob/master/src/meta_dataset_reader.py. I don't know if it makes any difference, but we are not using TF eager mode, just an old-fashioned session.

lamblin commented 4 years ago

I did not time the loader itself recently, but a lower bound on its throughput would be 1.5 episodes/second, since that is the mean rate for a prototypical network on ImageNet (derived from the mean time per episode, averaged over training and validation). The loader alone is probably a bit faster than that, but not by orders of magnitude. What about you? In your case, setting num_prefetch to 0 and setting read_buffer_size_bytes to something lower may help.
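
To compare numbers like the 1.5 episodes/second figure above, it helps to time the loader in isolation from the model. The following is a minimal sketch, not code from this repository: `measure_throughput` is a hypothetical helper, and it assumes only that the pipeline can be wrapped in a Python iterator that yields one episode per step.

```python
import time
from itertools import islice


def measure_throughput(episodes, num_episodes=50, warmup=5):
    """Return the rate, in episodes/second, at which `episodes` yields items.

    `episodes` is any Python iterable that yields one episode per step
    (an assumption of this sketch; how the pipeline is wrapped is up to you).
    """
    iterator = iter(episodes)

    # Discard a few warm-up episodes so one-off costs (opening files,
    # filling buffers) do not distort the steady-state measurement.
    for _ in islice(iterator, warmup):
        pass

    start = time.perf_counter()
    count = sum(1 for _ in islice(iterator, num_episodes))
    elapsed = time.perf_counter() - start

    if count == 0:
        raise ValueError("the iterator produced no episodes to time")
    return count / elapsed
```

Running this once per setting (for example, with different values of num_prefetch and read_buffer_size_bytes) gives a like-for-like comparison, because the model's forward and backward passes are excluded from the timing.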

cinjon commented 4 years ago

I see, ok. I managed to get this working well enough with the above tips and @jfb54's repo (thanks!).