My goal is to make a ~3-4B model. Do you know if this is possible?
Unfortunately this is pretty expected. LLMs seem to be incredibly robust to duplicating layers, but actually removing them tends to destroy coherence very quickly.
It's possible to prune down models like this but mergekit isn't the best tool for it. I'd recommend looking at https://github.com/princeton-nlp/LLM-Shearing and seeing if it can work for what you want.
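For context, removing layers with mergekit is normally expressed as a passthrough merge whose slices skip a range of layers. The sketch below is purely illustrative; the model name and layer ranges are assumptions, not taken from this thread:

```yaml
# Illustrative passthrough "pruning" config: copy layers 0-15 and 24-31,
# dropping layers 16-23 entirely (model and ranges are hypothetical examples).
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 16]
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```

Running something like `mergekit-yaml prune.yml ./pruned-model` produces the sliced checkpoint, but as noted above, the result usually loses coherence quickly once layers are actually removed rather than duplicated.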
Hope this helps!
Hmm, makes sense. Thanks for the suggestions!
Hi, I'm using this config:
And when I run this code:
I get:
I feel like I'm doing something wrong here... Is this an issue w/ the tokenizer?