Closed: gramesh-amd closed this 2 weeks ago
Thank you @gramesh-amd. It might be a permission issue.
Have you tried downloading the tokenizer file to your own bucket and changing tokenizer_path accordingly?
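A minimal sketch of that workaround, assuming gsutil is installed and authenticated; the destination bucket name and the tokenizer filename are placeholders, not paths confirmed in this issue:

```shell
# Copy the vocab folder from the public bucket into a bucket you control.
# "my-own-bucket" is a placeholder; substitute a bucket in your own project.
gsutil -m cp -r gs://mlperf-llm-public2/vocab gs://my-own-bucket/

# Then point the run config at your copy, e.g.:
#   tokenizer_path=gs://my-own-bucket/vocab/<tokenizer-file>
```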
In parallel, we are looking at the permissions on the bucket gs://mlperf-llm-public2/.
Thanks @ZhiyuLi-goog
Yes, I have downloaded the tokenizer from the S3 bucket that was provided here, and the name seems to match what I see in Google's MLPerf submission scripts. I use my local path instead of the S3 bucket for tokenizer_path. The logs show that it is able to load the tokenizer correctly.
But I still get the above error (the id for <s> is not defined), so I'm not sure if it's the right tokenizer.
Hi, @gramesh-amd
We haven't seen this error before.
Do you have a service account in your project? We can grant it access to the original bucket gs://mlperf-llm-public2/.
gowtham.ramesh@amd.com is my email (I've also created a Google account with this same address).
You should have access now. Could you take another try?
Thanks, will check
It works after I download from the gs://mlperf-llm-public2/ bucket.
Thank you
Hello, I have followed the instructions here to download the Paxml weights of GPT-3 and its tokenizer (vocab folder) and tried using it as tokenizer_path like this. But it results in the following error:
So is the path in the S3 bucket the right path to the tokenizer?