AntreasAntoniou / HowToTrainYourMAMLPytorch

The original code for the paper "How to train your MAML" along with a replication of the original "Model Agnostic Meta Learning" (MAML) paper in Pytorch.
https://arxiv.org/abs/1810.09502

Training is very slow for MAML and MAML++? #5

Closed aniruddhraghu closed 5 years ago

aniruddhraghu commented 5 years ago

Hi,

Firstly, thanks a lot for open-sourcing the code -- great to have a resource for this!

I seem to be having an issue regarding training time for these models. I'm training a 5-way 1-shot system on Omniglot using the supplied code. When comparing with the original MAML implementation for this task (https://github.com/cbfinn/maml), I find that this code trains a lot slower (validation accuracy takes many more epochs and wall-clock time before it reaches a comparable level). Would you know why this might be? I have also found that training a MAML++ system on this task (using the code/config provided) is very slow.

For reference, both setups were run on the same machine with a GPU, and I changed a couple of parameters in the configuration (learning rate and batch size) for the MAML run to match what was used in the original paper.

Thanks for the help!

AntreasAntoniou commented 5 years ago

Hello there,

A bug was introduced 3 days ago, which made things converge much slower than usual. Can you retry the experiments with the current state of the code? Thank you.

Regards, Antreas

aniruddhraghu commented 5 years ago

Thanks -- I will try again and let you know.

aniruddhraghu commented 5 years ago

I think it now trains faster (in terms of getting better performance with fewer training steps), but it still runs slowly (that is, each training step takes longer than it did with the original MAML code). Is this something you also experienced?

AntreasAntoniou commented 5 years ago

Not really. I find my code to be at least 2x faster than the original MAML code. What's your setup? If you want some speed gains in the early epochs, you can change the config variable

"first_order_to_second_order_epoch":-1,

to something other than -1; then, for the specified number of epochs, it will use first-order approximations.
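For context, here is a minimal sketch (not the repo's exact code) of what that switch amounts to: the first-order approximation skips building a graph through the inner-loop gradient step, which is what makes those early epochs cheaper. In PyTorch this typically comes down to the create_graph flag of torch.autograd.grad; the function name inner_loop_step below is illustrative.

import torch

def inner_loop_step(params, loss, inner_lr, use_second_order):
    # Gradients of the support-set loss w.r.t. the fast weights.
    # create_graph=True keeps the graph so the meta-update can backpropagate
    # through this step (full second-order MAML); create_graph=False gives
    # the cheaper first-order approximation.
    grads = torch.autograd.grad(loss, params, create_graph=use_second_order)
    return [p - inner_lr * g for p, g in zip(params, grads)]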

As for MAML's speed compared to the original: honestly, with my setup, this implementation is faster. It might just be a matter of setup? Do you use cuDNN?
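(For reference, cuDNN autotuning is usually enabled in PyTorch with the standard flag below; this is a general PyTorch setting, not something specific to this repo, and it mainly helps when input shapes stay fixed across iterations.)

import torch

# Let cuDNN benchmark convolution algorithms and cache the fastest one
# for the fixed input shapes used during meta-training.
torch.backends.cudnn.benchmark = True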

aniruddhraghu commented 5 years ago

I'm on Ubuntu 16.04, using an NVIDIA GeForce GTX 1060 graphics card. I am also using CUDNN, yes.

I'm getting something like 1.2 seconds/iteration using your code, and I think more like 0.4 seconds/iteration using the original MAML code. Turning off logging with tqdm gets some speedup, but it's not very significant.

aniruddhraghu commented 5 years ago

I did some more analysis and I'm also observing this issue on a GPU server with a Titan X graphics card; this is when comparing this implementation of MAML with the original.

AntreasAntoniou commented 5 years ago

Are you using the same task batch sizes for both? I will investigate this myself; I just don't have the time right now. I can tell you, however, that as of late June my implementation was the faster of the two. I need time to investigate, so let's leave this open until then. Are you using an SSD?

aniruddhraghu commented 5 years ago

I am, yes -- thanks for taking a look! I am also using an SSD.

AntreasAntoniou commented 5 years ago

Can you try another run now? It should be much faster.

aniruddhraghu commented 5 years ago

I think it's sorted -- will close the issue. Thanks for the help!