sdmorrey opened this issue 3 months ago
Hello @sdmorrey,
You should check the llama2-tasks.cpp and grok1-tasks.cpp files. DL builds a different task list for each architecture. Tasks are reused, of course (in grok1-tasks.cpp you can see implementations of tasks that differ from the ones the Llama model uses).
I see Gemma 2 has more norm layers. The rope layer seems to be implemented already (FalconRopeCommand). The tokenizer is probably what needs more work (the converter), but I'm not sure.
+1 for Gemma 2
What would be required to support Gemma 2? I'd be happy to chip in and help with the code; I just need some insight into what would need to be changed.