mamba-state-space-models Search Results

165 results for mamba-state-space-models

Oxen-AI/mamba-dive #2

zero loss when training

I have executed `python train_mamba_with_context.py --model state-spaces/mamba-130m --data_path data/Mamba-Fine-Tune/squad_train.jsonl --output models/mamba-130m-context --num…`

jcrangel updated 7 months ago
1
TransformerLensOrg/TransformerLens #462

[Proposal] Add support for Mamba

### Proposal Mamba shows "best-in-class on every single evaluation result, and generally matches baselines at twice the model size." It won't be long before we see more language models in the wild…

joker3212 updated 10 months ago
5
huggingface/text-generation-inference #2334

Newer HF Mamba model is not supported

### System Info TGI v2.2.0 with the official Docker image. ### Information - [x] Docker - [ ] The CLI directly ### Tasks - [x] An officially supported command - [ ] My own modifications ### Repr…

jonnyli1125 updated 2 months ago
1
tinygrad/tinygrad #3039

Bounty: Fast parallel scan (Mamba, etc).

It would be great to have a general parallel prefix sum (associative scan) operation in tinygrad, something like [associative_scan](https://jax.readthedocs.io/en/latest/_autosummary/jax.lax.associativ…

Algomancer updated 2 months ago
1
state-spaces/mamba #294

Can `MambaForCausalLM` be used directly for training instead…

Hello, I'm currently working with the `transformers` library to train a model on causal language modeling tasks using the `MambaForCausalLM` class. However, I've noticed that the typical approach t…

LumenScopeAI updated 5 months ago
2
jxiw/MambaInLlama #9

Training Slowdown for Llama3-Mamba2

Hello! I am training the first two knowledge distillation stages of Mamba 2 on one DGX-H100x8 node, and I am experiencing train times of ~8 hours for the first stage, and ~13 hours for the second stag…

Codys12 updated 3 weeks ago
13
axolotl-ai-cloud/axolotl #1325

Mamba example config fails on latest docker

### Please check that this issue hasn't been reported before. - [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) didn't find any similar reports. …

Layoric updated 5 months ago
5
KimMeen/Time-LLM #61

Mamba or Jamba models

Has any work been done with state space models? I'd be curious how they would perform with this framework applied.

DewEfresh updated 5 months ago
1
state-spaces/mamba #429

AMD GPU: AttributeError: 'HIPDriver' object has no attribute…

My setup: Dual AMD RX 7900 XTX + ROCm 6.1.3; full setup recorded at https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu . I followed the fix at https://github.com/state-spaces/mamba/issues/412 to…

eliranwong updated 3 months ago
2
HazyResearch/m2 #34

What category does the M2 model belong to

Hello, thank you for your great work! M2bert paper mentioned that "Monarch Mixer is part of a new class of architectures called state-space models (SSMs), which include S4, Mamba, and BiGS". Is Monar…

41924076 updated 4 months ago
2
