-
I have executed
```
python train_mamba_with_context.py --model state-spaces/mamba-130m \
--data_path data/Mamba-Fine-Tune/squad_train.jsonl \
--output models/mamba-130m-context \
--num…
```
-
### Proposal
Mamba shows "best-in-class on every single evaluation result, and generally matches baselines at twice the model size." It won't be long before we see more language models in the wild…
-
### System Info
TGI v2.2.0 with the official Docker image.
### Information
- [x] Docker
- [ ] The CLI directly
### Tasks
- [x] An officially supported command
- [ ] My own modifications
### Repr…
-
It would be great to have a general parallel prefix sum (associative scan) operation in tinygrad, something like [associative_scan](https://jax.readthedocs.io/en/latest/_autosummary/jax.lax.associativ…
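For readers unfamiliar with the operation being requested, here is a minimal pure-Python sketch of an inclusive associative scan using Hillis-Steele doubling steps. It is only an illustration of the technique, not tinygrad or JAX code: the function names `associative_scan` and `combine_linear` are hypothetical, and a real implementation would run each pass in parallel on the device rather than in a list comprehension.

```python
from operator import add
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def associative_scan(op: Callable[[T, T], T], xs: List[T]) -> List[T]:
    """Inclusive scan: out[i] = xs[0] op xs[1] op ... op xs[i].

    `op` must be associative; it does not need to be commutative.
    (Hypothetical helper for illustration only.)
    """
    out = list(xs)
    step = 1
    while step < len(out):
        # Combine each element with the partial result `step` positions to
        # its left. Updates within one pass are independent of each other,
        # which is what makes the scan parallelizable: about log2(n) passes.
        out = [
            out[i] if i < step else op(out[i - step], out[i])
            for i in range(len(out))
        ]
        step *= 2
    return out


def combine_linear(
    left: Tuple[float, float], right: Tuple[float, float]
) -> Tuple[float, float]:
    """Compose two steps of h_t = a_t * h_{t-1} + b_t (left applied first)."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


prefix_sums = associative_scan(add, [1, 2, 3, 4])
# Linear recurrence with h_0 = 0; the second tuple entry is h_t.
states = associative_scan(combine_linear, [(0.5, 1.0)] * 3)
hidden = [b for _, b in states]
```

The second call shows why a general scan (rather than only a cumulative sum) matters for state space models: a linear recurrence can be expressed as a scan over pairs `(a_t, b_t)` with an associative combine function.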
-
Hello,
I'm currently working with the `transformers` library to train a model on causal language modeling tasks using the `MambaForCausalLM` class. However, I've noticed that the typical approach t…
-
Hello! I am training the first two knowledge distillation stages of Mamba 2 on one DGX-H100x8 node, and I am seeing training times of ~8 hours for the first stage, and ~13 hours for the second stag…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports.
…
-
Has any work been done with state space models? I'd be curious how they would perform with this framework applied.
-
My setup: Dual AMD RX 7900 XTX + ROCm 6.1.3; full setup recorded at https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu
I followed the fix at https://github.com/state-spaces/mamba/issues/412 to…
-
Hello, thank you for your great work! The M2-BERT paper mentions that "Monarch Mixer is part of a new class of architectures called state-space models (SSMs), which include S4, Mamba, and BiGS".
Is Monar…