Closed bigattichouse closed 4 months ago
Note: please don't break this with a fix, unless you allow it to be "unfixed" with a command-line option. I think this could be very important for RAG-style self-verification of output, if we can figure out how the mappings work.
That one is normal, and you can do it even outside llama-zip: using something like llama.cpp, just use whitespace or an empty string as the prompt with a fixed seed. On some versions of llama.cpp, I found the model would output part of its training data verbatim.
You never encounter an EOS token, so it goes on endlessly (or until you run out of RAM/VRAM because of the context size).
No worries, I have no plans to attempt to "fix" this. I see this behavior as a necessary consequence of the way compression works with an arithmetic coder. An input like "a" contains very little entropy and does not guide the LLM much, so generations like this are to be expected.
As for finding the "key" to unlock an article, well, that's precisely what the compressor does in llama-zip! It effectively finds the shortest key that generates the given input. Compress a Wikipedia article with llama-zip and you'll get the key you're looking for.
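That narrowing-down process can be sketched with a toy arithmetic coder. Everything below is an illustrative stand-in, not llama-zip's actual code: a fixed three-symbol distribution plays the role of the LLM's next-token probabilities.

```python
from fractions import Fraction

# Fixed toy distribution standing in for the LLM's next-token probabilities.
# "$" is an end-of-text marker so the decoder knows where to stop.
PROBS = {"a": Fraction(8, 10), "b": Fraction(1, 10), "$": Fraction(1, 10)}

def cum_range(sym):
    """Cumulative [low, high) sub-interval assigned to a symbol."""
    lo = Fraction(0)
    for s, p in PROBS.items():
        if s == sym:
            return lo, lo + p
        lo += p
    raise KeyError(sym)

def encode(text):
    """Narrow [0, 1) symbol by symbol, then return the shortest bit
    string (the 'key') whose value lands inside the final interval."""
    lo, hi = Fraction(0), Fraction(1)
    for sym in text + "$":
        s_lo, s_hi = cum_range(sym)
        width = hi - lo
        lo, hi = lo + width * s_lo, lo + width * s_hi
    bits, v, p = "", Fraction(0), Fraction(1, 2)
    while v < lo:  # greedily add bits until v enters [lo, hi)
        if v + p < hi:
            v += p
            bits += "1"
        else:
            bits += "0"
        p /= 2
    return bits

def decode(bits):
    """Replay the same narrowing process to recover the text from the key."""
    v = sum(Fraction(1, 2 ** (i + 1)) for i, b in enumerate(bits) if b == "1")
    lo, hi, out = Fraction(0), Fraction(1), ""
    while True:
        width = hi - lo
        for sym in PROBS:
            s_lo, s_hi = cum_range(sym)
            if lo + width * s_lo <= v < lo + width * s_hi:
                if sym == "$":
                    return out
                out += sym
                lo, hi = lo + width * s_lo, lo + width * s_hi
                break
```

The point: highly probable symbols barely shrink the interval, so predictable text gets a short key. With a strong LLM supplying the probabilities instead of `PROBS`, text the model finds very predictable (such as memorized material) compresses the same way.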
I'm an idiot. It was hallucinating an article... but I suppose you could still key to an existing article in training:
After training, with access to the source data, you could create summaries and embeddings from the source articles and attach them to the zip key. This would let you look the original article back up based on context in the current conversation. Not exactly ideal, but the idea would hold.
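That lookup side could be sketched like this. The embedding and index below are toy stand-ins (a real system would pair a proper sentence-embedding model with the actual llama-zip keys):

```python
import math
from collections import Counter

index = []  # (summary embedding, llama-zip key) pairs built from the source data

def embed(text):
    # Toy bag-of-words embedding; a real setup would use a sentence-embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def add_article(summary, zip_key):
    index.append((embed(summary), zip_key))

def lookup(context):
    # Return the stored key whose summary best matches the current conversation.
    return max(index, key=lambda pair: cosine(embed(context), pair[0]))[1]
```

Decompressing the returned key with llama-zip would then regenerate the full article, with no external document store beyond the (embedding, key) index itself.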
So, now for experimentation: Do articles that were likely used in training have shorter keys than arbitrary text?
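One way to run that experiment: compare key length per character across texts. The sketch below uses zlib purely as a stand-in compressor (an assumption for illustration); the real test would measure the length of llama-zip's compressed output, where likely-memorized articles should behave like the highly predictable string here.

```python
import zlib

def bits_per_char(text, compress=lambda s: zlib.compress(s.encode(), 9)):
    # Key length in bits divided by input length. For the real experiment,
    # swap in llama-zip's model-based compressor for `compress`.
    return 8 * len(compress(text)) / len(text)

# Highly predictable text should yield a much shorter key per character
# than arbitrary text the compressor has never seen patterns for.
predictable = "the quick brown fox jumps over the lazy dog " * 20
arbitrary = "Qx7#kz Jv2@pm Wd9$rt Bn4%hs Lc8^fg Ty3&qw Zm6*xe Ka1!uv"
```

If training-set articles consistently score lower bits-per-character under the model-based compressor than matched arbitrary text, that would support the memorization hypothesis.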
This isn't necessarily a problem, but it might prove an interesting way to have the model dump/recover large portions of its training data. Putting in arbitrary values:
> llama-zip ../../gguf/Meta-Llama-3-8B-Instruct-Q8_0.gguf -d "a"
(and also decompressing with the first letter removed) seems to let you pull entire passages of apparent training data out of the LLM. It will just repeat what looks like an entire source article.
I see this as a feature. It would be great to try and figure out what the "key" is to unlock an article, as it would be an amazing "wikipedia on disk" sort of thing... or even a modified version of RAG without actually requiring an external database.
How do I get back from the Sam Smith output to "a" and query it? That could make hallucinations a thing of the past, if the model could recover entire memories from within itself.
See the output below: