Open RanchiZhao opened 2 months ago
Will MLA (Multi-head Latent Attention), as used in DeepSeek-V2 (https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat and https://arxiv.org/abs/2405.04434), be supported by the activation smooth method?