-
Hey John! Here's the curriculum that I've worked on in the past. It's a bit less focused on language models as a sole topic, and more on modern ML from a broad perspective.
- Essential Concepts of … (zmaas, updated 1 month ago)
-
I tried quantizing Mamba using HuggingFace/Quanto and ran into the problem of perplexity for `lambada_openai` blowing up (> 1e37) at lower quantization levels, even though other tasks retained their …
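To illustrate why perplexity can explode at lower bit widths, here is a minimal numpy sketch of symmetric round-to-nearest weight quantization. This is an illustration of the general technique only, not the actual HuggingFace/Quanto kernels; the function name and per-tensor scaling scheme are assumptions for the demo.

```python
import numpy as np

def fake_quantize(w, bits):
    """Symmetric round-to-nearest quantize/dequantize of a weight tensor.

    Illustration only -- not the actual HuggingFace/Quanto implementation.
    """
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for int8, 7 for int4
    scale = np.abs(w).max() / qmax      # single per-tensor scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                    # back to float: the "dequantized" weights

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

for bits in (8, 4, 2):
    err = np.abs(fake_quantize(w, bits) - w).mean()
    print(f"int{bits}: mean abs error {err:.4f}")
```

The mean reconstruction error grows sharply as bits decrease, and recurrent state-space layers can compound such per-weight errors over the sequence, which is consistent with perplexity diverging on a long-context task like `lambada_openai` while short-context tasks survive.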
-
The following shows the actual values of the input-dependent $\Delta$ during inference of the 2-layer network on the induction-heads task described in the paper. I successfully trained the model t…
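For readers inspecting $\Delta$ themselves: per the Mamba paper, the step size is input-dependent, computed as a softplus of a linear projection of the input. The sketch below is a simplified numpy version (the real model projects through a low-rank `dt_rank` bottleneck; the shapes and weight names here are illustrative assumptions).

```python
import numpy as np

def softplus(x):
    # Smooth positive activation: log(1 + exp(x))
    return np.log1p(np.exp(x))

def compute_delta(x, W_dt, b_dt):
    """Input-dependent step size: Delta = softplus(Linear(x) + bias).

    Simplified sketch of the parameterization in the Mamba paper;
    shapes and names are illustrative, not the library's API.
    """
    return softplus(x @ W_dt + b_dt)   # shape (batch, seq, d)

rng = np.random.default_rng(0)
B, L, D = 2, 8, 16
x = rng.normal(size=(B, L, D))
W_dt = rng.normal(size=(D, D)) * 0.1
b_dt = np.zeros(D)

delta = compute_delta(x, W_dt, b_dt)
print(delta.shape, delta.min() > 0)   # Delta is positive everywhere
```

Because softplus is strictly positive, every $\Delta$ value is a valid (positive) discretization step, which is why plots of it during the induction-heads task never cross zero.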
-
When I run evaluation with lm_harness_eval, I get the errors below. What is the problem, and how can I fix it so the evaluation runs successfully? Thanks so much.
python evals/lm_harness_eval.py…
-
I notice that passkey retrieval works well up to around 3-4k tokens; beyond that, it fails.
That wasn't my intuition for SSMs, but I guess effective context length is still tied to the training set? It's…
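For anyone wanting to reproduce this kind of probe, here is a small helper that builds a passkey-retrieval prompt of adjustable length. It is a generic sketch of the common passkey setup, not the exact benchmark the poster used; the filler text, function name, and phrasing are assumptions.

```python
import random

FILLER = ("The grass is green. The sky is blue. "
          "The sun is yellow. Here we go. There and back again. ")

def make_passkey_prompt(n_filler, passkey, seed=0):
    """Build a passkey-retrieval prompt: repeated filler text with the
    passkey hidden at a random position, followed by the question.

    Generic sketch of the passkey task, not a specific benchmark.
    """
    rng = random.Random(seed)
    chunks = [FILLER] * n_filler
    info = f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
    chunks.insert(rng.randrange(len(chunks) + 1), info)
    question = "What is the pass key? The pass key is"
    return "".join(chunks) + question

prompt = make_passkey_prompt(n_filler=50, passkey=51423)
print("51423" in prompt)  # passkey is embedded somewhere in the filler
```

Sweeping `n_filler` upward lets you find the token count at which retrieval degrades, which is one way to measure how far beyond the training context an SSM generalizes.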
-
Dear Mamba-SSM team, congratulations on your success! Obviously many of us are excited about exploring the applications of your work.
Since there's no dropout in your model, what do you suggest f…
-
There may be some benefits:
1. The code could be simpler.
2. Inference could be faster.
3. Inference could accept multiple tokens at once this way.
There is some reference code here: htt…
-
# URL
- https://arxiv.org/abs/2404.08819
# Affiliations
- William Merrill, N/A
- Jackson Petty, N/A
- Ashish Sabharwal, N/A
# Abstract
- State-space models (SSMs) have emerged as a potential a…
-
The log is as follows:
File "D:\ProgramData\Anaconda3\envs\torchseg\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^…
-
It would be worthwhile to provide a training script, so that larger models (for instance 7B or 13B) can be trained.
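Until an official script exists, the loop such a script needs can be sketched in a few lines. The toy linear model and synthetic batches below are stand-ins (a real script would use the Mamba language model, a tokenized dataloader, cross-entropy loss, and a distributed optimizer); only the loop structure is the point.

```python
import numpy as np

# Minimal training-loop skeleton.  The toy linear regression stands in for
# the language model; comments mark where real components would go.
rng = np.random.default_rng(0)
d_in, d_out, lr = 8, 1, 0.1
W = rng.normal(size=(d_in, d_out)) * 0.01   # "model" parameters
w_true = rng.normal(size=(d_in, d_out))     # generates synthetic targets

for step in range(200):
    x = rng.normal(size=(32, d_in))         # stand-in for a tokenized batch
    y = x @ w_true                          # stand-in for next-token targets
    pred = x @ W                            # forward pass
    loss = ((pred - y) ** 2).mean()         # MSE stands in for cross-entropy
    grad = 2 * x.T @ (pred - y) / len(x)    # backward pass
    W -= lr * grad                          # SGD update (real: AdamW + schedule)
    if step % 50 == 0:
        print(f"step {step}: loss {loss:.4f}")
```

Scaling this skeleton to 7B/13B is mostly a matter of swapping in the real model and adding mixed precision, gradient accumulation, checkpointing, and data/tensor parallelism on top of the same loop.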