Open jayendra13 opened 3 years ago
Try passing the TPU name, zone, and project directly into the MtfModel initialization arguments rather than using a cluster resolver to get the address.
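A rough sketch of what that could look like, in case it helps. The keyword names (model_dir, tpu, tpu_zone, gcp_project) are how I recall the t5.models.MtfModel constructor, so please check them against your installed version. The TPU name, zone, project, and bucket path are hypothetical placeholders, and the helper functions are made up for illustration.

```python
# Hypothetical placeholder values; substitute your own TPU name, zone,
# GCP project, and model directory.
TPU_NAME = "my-tpu"
TPU_ZONE = "us-central1-b"
GCP_PROJECT = "my-project"
MODEL_DIR = "gs://my-bucket/models/t5-small"


def build_mtf_model_kwargs(tpu_name, tpu_zone, gcp_project, model_dir):
    """Collect the TPU-related arguments to pass straight to MtfModel.

    The keyword names are assumed from the t5 library as I remember it;
    verify them against the version you have installed.
    """
    return {
        "model_dir": model_dir,
        "tpu": tpu_name,          # TPU name, not a resolved grpc:// address
        "tpu_zone": tpu_zone,
        "gcp_project": gcp_project,
    }


def make_model(kwargs, **extra):
    """Build the model; t5 is imported lazily so the sketch loads without it."""
    import t5  # requires the t5 package and working gcloud credentials

    return t5.models.MtfModel(**kwargs, **extra)


kwargs = build_mtf_model_kwargs(TPU_NAME, TPU_ZONE, GCP_PROJECT, MODEL_DIR)
# model = make_model(kwargs)  # add batch_size, sequence_length, etc. as needed
```

The point is that no cluster resolver is called anywhere in the script; the name, zone, and project are handed over as plain strings.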
I have tried that but it doesn't make any difference.
Is your TPU in running status? Was it pre-empted or shut down? Are your gcloud credentials set up? If you are passing in those arguments correctly, there's not a lot I can do to debug - we use this codepath all the time on Cloud without issue.
I have created a fine-tuning script from the t5-trivia notebook. The script works if I run the code from IPython, but hangs if I run it from the command line.
No log gets printed after the following in command-line mode, i.e.
python t5test.py
whereas copy-pasting the same code into IPython runs it successfully. Here is the version info