Llama3 - Githubissues

bminixhofer / zett

Code for Zero-Shot Tokenizer Transfer

https://arxiv.org/abs/2405.07883


Open: zf0x00 opened this issue 2 months ago

zf0x00 commented 2 months ago

Does the regular Llama 3 work out of the box, or do I need to train a hypernetwork for it?

bminixhofer commented 2 months ago

I am in the process of training a hypernetwork for Llama3!

zf0x00 commented 2 months ago

Nice ❤️ Could you also share some details about the training, such as how long it takes? I tried to train one myself, but most notebook environments don't support Python 3.11.

bminixhofer commented 2 months ago

It seems to underperform on code, though. I haven't found the reason yet but will look into it later, so I'm keeping this issue open.

Training took ~4 days on a TPUv4-32 pod.