rasbt opened 3 weeks ago
There is a modeling_*.py file. Good luck 🙂.
Haha, I finally got the weights loaded, but of course it's never easy ... it's generating gibberish:
⚡ phi-3-checkpoint ~/litgpt litgpt chat --checkpoint_dir checkpoints/microsoft/Phi-3-mini-4k-instruct
Now chatting with Phi-3-mini-4k-instruct.
To exit, press 'Enter' on an empty prompt.
Seed set to 1234
>> Prompt: What do llamas eat?
>> Reply: epsonniformes }).selves }).SSIONunicívo }). EverythingFormsћassaiejalphutureievediennesenticaciónicaciónMilMinigh ninassaselvesselves exhaustselvesonnselvesktionΗracheracheionedΗ Avenoted Bij_+versionsmastevosepsselvesmobileselvesilleryassaucealphasseestoreselvesférFormsiej Mu Kaiser oppienngnatteversionsionedionedversionsSSIONectionaccoossFormassaselves_+uminatesonoSSIONológissancecenteecause_+ienn选uraleʋ Stepalphigosionaliilonverte }).ienn }).ativo Sternsonoiejuralassawnkademselves│uraleativaionedvos_+utschversionsponiej_+icacióniejiewerológvoasonverte shoutioned位ionedIdentmobi
Let the easter egg hunt begin 😭
Some more tidbits via Daniel Han:
Phi 3 (3.8B) got released! The paper said it was just a Llama arch, but I found some quirks while adding this to @UnslothAI :
Ok, it's becoming more interesting. This is somewhat what I expected from Llama 3, but it didn't deliver.
Looks like the sliding window number was a typo: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/commit/b043e05a86cfc77f8d53eb0edf6a33e39afbcb5e
The current code is in an ugly state, but at least the model produces the same output as the HF one.
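In case it helps anyone debugging a similar port, here's a rough sketch of the kind of parity check meant above. It is not the harness used here; `lit_model` stands for the already-loaded LitGPT port, and the prompt and tolerance are just placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def logits_match(lit_model, prompt: str = "What do llamas eat?", atol: float = 1e-4) -> bool:
    """Compare the port's logits against the HF reference on a single prompt."""
    tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
    hf_model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-3-mini-4k-instruct", torch_dtype=torch.float32, trust_remote_code=True
    )
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.inference_mode():
        hf_logits = hf_model(ids).logits
        lit_logits = lit_model(ids)  # assumed: the port returns logits for the same ids
    # A bad weight mapping or config mismatch usually shows up as a huge max diff here.
    print("max abs diff:", (hf_logits - lit_logits).abs().max().item())
    return torch.allclose(hf_logits, lit_logits, atol=atol)
```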
The most notable change is that the Phi-3 model no longer uses parallel_residual, in contrast to Phi-1.5 and Phi-2.
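For context, a toy block showing the difference in wiring (this is only an illustration, not LitGPT's actual Block class; causal masking, RoPE, etc. are omitted): with parallel_residual the attention and MLP branches read the same normalized input, while the Phi-3 / Llama-style block applies them one after the other, each with its own norm.

```python
import torch
import torch.nn as nn


class ToyBlock(nn.Module):
    """Toy transformer block contrasting the two residual layouts."""

    def __init__(self, n_embd: int, n_head: int, parallel_residual: bool) -> None:
        super().__init__()
        self.parallel_residual = parallel_residual
        self.norm_1 = nn.LayerNorm(n_embd)
        self.norm_2 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.parallel_residual:
            # Phi-1.5 / Phi-2 style: attention and MLP branch off the same norm
            n = self.norm_1(x)
            return x + self.attn(n, n, n, need_weights=False)[0] + self.mlp(n)
        # Phi-3 / Llama style: attention first, then MLP, each with its own norm
        n = self.norm_1(x)
        x = x + self.attn(n, n, n, need_weights=False)[0]
        return x + self.mlp(self.norm_2(x))


# e.g. ToyBlock(64, 4, parallel_residual=False)(torch.randn(1, 8, 64))
```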
The missing piece is the Tokenizer: it has a smaller vocab size (32k vs 50k) that was extended by 64 special tokens. If I'm not mistaken, the current code doesn't add these tokens.
Yeah, that sounds about right based on the Phi-3 paper:
To best benefit the open source community, phi-3-mini is built upon a similar block structure as Llama-2 [TLI+23] and uses the same tokenizer with vocabulary size of 32064
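For anyone double-checking the numbers, a quick way to inspect the HF tokenizer (assuming transformers can load it; the exact added-token list depends on the checkpoint revision):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
print(tok.vocab_size)            # base SentencePiece vocab shared with Llama-2: 32000
print(len(tok))                  # base vocab plus the added special tokens
print(tok.added_tokens_decoder)  # the extra tokens, e.g. <|user|>, <|assistant|>, <|end|>
```

The 32064 from the paper would then be the 32k base vocab plus the 64 special/reserved token slots mentioned above, which is what the converted checkpoint's embedding size needs to account for.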