Closed: thunderboom closed this issue 4 years ago
I was trying to train MAML+BERT on the FewRel dataset. It was time-consuming, so this is important to me.
Yes, it is slow... MAML requires information from multiple tasks to perform a single update (you can also reduce the maml_batch size to 1, but that may increase training instability). Since you need to keep track of the gradients before and after the inner loop, it is slower than other methods.
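For intuition, here is a minimal sketch of one MAML meta-update in PyTorch (not the repo's actual code; the toy linear model and the `sample_task` helper are hypothetical stand-ins). It shows where the cost comes from: every task in the meta-batch runs an inner adaptation step with `create_graph=True`, and the single outer step backpropagates through all of those graphs at once.

```python
import torch

def sample_task(n=8, d=4):
    # Hypothetical episode sampler: random linear-regression tasks,
    # each returning a support set and a query set.
    w_true = torch.randn(d, 1)
    xs, xq = torch.randn(n, d), torch.randn(n, d)
    return xs, xs @ w_true, xq, xq @ w_true

w = torch.randn(4, 1, requires_grad=True)   # meta-parameters
meta_opt = torch.optim.Adam([w], lr=1e-3)
mse = torch.nn.functional.mse_loss
maml_batch_size, inner_lr = 10, 1e-2

meta_opt.zero_grad()
meta_loss = 0.0
for _ in range(maml_batch_size):            # every task in the meta-batch
    xs, ys, xq, yq = sample_task()
    # Inner loop: adapt on the support set, keeping the graph alive
    # (create_graph=True) so the outer step can differentiate through it.
    inner_loss = mse(xs @ w, ys)
    (grad,) = torch.autograd.grad(inner_loss, [w], create_graph=True)
    w_adapted = w - inner_lr * grad
    # Outer objective: query loss under the *adapted* parameters.
    meta_loss = meta_loss + mse(xq @ w_adapted, yq)

(meta_loss / maml_batch_size).backward()    # one update across all tasks
meta_opt.step()
```

Because every task's computation graph is kept alive until the single `backward()`, both time and memory grow with the meta-batch size, which is why shrinking it speeds things up at the cost of noisier meta-gradients.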
Thank you for releasing the code for this article. I have a question: in the article, each episode includes only one task, but in the code it seems to contain multiple tasks, so each epoch contains 100 episodes and each episode contains 10 tasks. Does this lead to slow MAML training?