Closed JanPokorny closed 3 years ago
Thanks for trying the code out! There was a problem with the way the configs were set up, which I believe is now fixed (at least, it works for me). Google can be finicky about the amount of compute it gives you, so try fiddling with the settings in "Modify config for colab", or try the 1.3B model instead of the 2.7B model if you're still getting OOMs.
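For anyone landing here later, this is a rough sketch of what "fiddling with the settings" could look like: halve the batch-size entries in the config dict before writing it back out. The key names (`train_batch_size`, `predict_batch_size`, `eval_batch_size`) and the helper `shrink_batch_sizes` are assumptions for illustration, so check them against the config your notebook actually loads.

```python
import copy

# Hypothetical helper: the default key names below are assumptions,
# not confirmed against the notebook's config file.
def shrink_batch_sizes(config,
                       keys=("train_batch_size", "predict_batch_size", "eval_batch_size"),
                       factor=2,
                       minimum=1):
    """Return a copy of `config` with the listed batch-size entries divided by `factor`."""
    smaller = copy.deepcopy(config)  # leave the caller's dict untouched
    for key in keys:
        value = smaller.get(key)
        # Only touch plain integer entries that are actually present.
        if isinstance(value, int) and not isinstance(value, bool):
            smaller[key] = max(minimum, value // factor)
    return smaller

# Illustrative values only, not taken from a real config.
example = {"train_batch_size": 8, "predict_batch_size": 8, "n_layer": 24}
print(shrink_batch_sizes(example))
```

If memory errors persist after a couple of halvings, switching to the smaller model is probably the more reliable option.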
Hi @StellaAthena, thanks for sharing your great work! I trained GPT3_XL on a Colab TPU with my custom dataset. Training went fine, but at inference I hit the same error as @JanPokorny. I would expect training to always cost more resources than inference, so why did inference cause an OOM while training did not? Any advice would be appreciated!
@StellaAthena Inference in the new notebook works for me, thanks!
@LeCongThuong Try starting with a fresh notebook, the updated version worked for me as-is -- I just had to do the google auth and enter my bucket URL, then I was able to download and infer from GPT3_XL.
Thanks @JanPokorny, I tried restarting the Colab runtime, but it did not work. Next, I upgraded from Colab to Colab Pro and it worked. So, as @StellaAthena said, the problem lies in the resources Google gives us, not in the code repository.
@LeCongThuong I was confused since I was able to train but not to infer, so I suspected that the OOM had a different underlying cause. (Also, the RAM bar didn't show usage in the Colab UI, which it apparently doesn't for TPUs.) With the updated notebook I'm able to fine-tune GPT3_XL and infer from the fine-tuned model even on the free tier.
In the provided Colab (only using provided cells), after downloading a pre-trained GPT3_XL, I tried to infer from it, which resulted in the following output from the very last cell:
out.txt
The interesting part seems to be:
...followed by many more similar OOM errors.
I'd be glad for any help with running the inference in Google Colab. Training actually seems to work and saves a new checkpoint, but I have not been able to run inference even on the provided pre-trained network.