Open huguangcheng opened 1 week ago
Hi, thank you! I think the Mamba kernels do not yet support bidirectional modeling, so it might be difficult to train Mamba-1 with them for the encoder-decoder prefix LM.
You can try running the JRT prompt using the code in the lm-eval-harness folder: https://github.com/HazyResearch/prefix-linear-attention/blob/main/lm-eval-harness/prompt_scripts/run_jrt_prompt_hf.sh
Hello, I think your prefix linear attention is great! How can I integrate it into my model code? I'm using a Mamba-baseline SSM model.