Closed — HackGiter closed this issue 7 months ago
It depends on the amount of data used and the hidden_size of the target LLM. Using ShareGPT (about 68k conversations), a model with hidden_size=4096 requires 740GB of disk space. The ablation experiments can be found in our paper.
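The quoted 740GB follows from the storage cost scaling with token count × hidden_size. A rough sketch of that back-of-envelope calculation (the fp16 storage format and the average tokens-per-conversation figure below are assumptions for illustration, not numbers from this thread):

```python
# Back-of-envelope estimate of disk space for cached hidden states.
# Assumes fp16 storage (2 bytes per value); the per-conversation token
# count is a hypothetical placeholder, not a figure from the thread.
def hidden_state_bytes(num_tokens: int, hidden_size: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to store one hidden-state vector per token."""
    return num_tokens * hidden_size * bytes_per_value

conversations = 68_000           # ~ShareGPT size mentioned above
tokens_per_conversation = 1_300  # hypothetical average
total = hidden_state_bytes(conversations * tokens_per_conversation, hidden_size=4096)
print(f"{total / 1e9:.0f} GB")
```

With those assumed numbers this lands in the same ballpark as the reported 740GB, which is why the footprint grows linearly with both dataset size and hidden_size.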
Thanks for your great work. I just have a couple of questions about it.
First, wouldn't it be too big to store the hidden states directly as a .ckpt file? How large is it? Second, could you provide an ablation study on the cnet and the decoding methods?