bminixhofer / zett

Code for Zero-Shot Tokenizer Transfer
https://arxiv.org/abs/2405.07883

Llama3 #1

Open zf0x00 opened 2 months ago

zf0x00 commented 2 months ago

Does the standard Llama 3 work out of the box, or do I need to train a hypernetwork for it?

bminixhofer commented 2 months ago

I am in the process of training a hypernetwork for Llama3!

zf0x00 commented 2 months ago

Nice ❤️ Could you also share some info about training, e.g. how long it takes? I tried to train one myself, but most notebook environments don't support Python 3.11.

bminixhofer commented 2 months ago

Here is the first version of a Llama3 hypernet: benjamin/zett-hypernetwork-Meta-Llama-3-8B-experimental.

It seems to underperform on code, though. I haven't found the reason yet, but I will look into it later, so I'm keeping this issue open.

Training took ~4 days on a TPUv4-32 pod.