Open · Baiyuetribe opened this issue 6 days ago
How long is the text you are generating? This looks less like a bug and more like the model running with a very long context, which can consume a lot of RAM. The prompt plus the requested output must fit within 4096 tokens, since that is the maximum context length the model supports (see the sketch below).
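To make the budgeting concrete, here is a minimal, hypothetical sketch of that constraint. This is not this project's actual API; the constant `MAX_CONTEXT` and the function `clamp_generation_budget` are illustrative names, assuming the standard behavior that prompt tokens and generated tokens share one context window.

```python
# Illustrative sketch only (not this project's API): prompt tokens plus
# newly generated tokens must both fit inside the model's context window.

MAX_CONTEXT = 4096  # maximum context length the model supports


def clamp_generation_budget(prompt_tokens: int, requested_new_tokens: int) -> int:
    """Return how many new tokens can safely be generated.

    Generated tokens occupy the same context window as the prompt, so
    asking for more output than fits forces a longer context, and the
    memory used for that context is where the RAM goes.
    """
    remaining = MAX_CONTEXT - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone already exceeds the 4096-token context")
    return min(requested_new_tokens, remaining)


if __name__ == "__main__":
    # A 1000-token prompt leaves room for at most 3096 new tokens.
    print(clamp_generation_budget(1000, 8000))  # -> 3096
```

In other words, capping the requested output length this way keeps total context (and therefore memory use) bounded, instead of letting a long generation silently grow past the 4096-token limit.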
When the generated text is long, it takes up a lot of memory