-
Hey John! Here's the curriculum that I've worked on in the past. It's a bit less focused on language models as a sole topic, and more on modern ML from a broad perspective.
- Essential Concepts of … (zmaas, updated 1 month ago)
-
I tried quantizing Mamba using HuggingFace/Quanto and ran into the problem of perplexity for `lambada_openai` blowing up (> 1e37) at lower quantization levels, even though other tasks retained their …
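To illustrate why perplexity can explode at lower bit widths, here is a minimal numpy sketch of symmetric round-to-nearest weight quantization. This is an illustration of the general technique only, not the actual HuggingFace/Quanto kernels; the function name and per-tensor scaling scheme are assumptions for the demo.

```python
import numpy as np

def fake_quantize(w, bits):
    """Symmetric round-to-nearest quantize/dequantize of a weight tensor.

    Illustration only -- not the actual HuggingFace/Quanto implementation.
    """
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for int8, 7 for int4
    scale = np.abs(w).max() / qmax      # single per-tensor scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                    # back to float: the "dequantized" weights

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

for bits in (8, 4, 2):
    err = np.abs(fake_quantize(w, bits) - w).mean()
    print(f"int{bits}: mean abs error {err:.4f}")
```

The mean reconstruction error grows sharply as bits decrease, and recurrent state-space layers can compound such per-weight errors over the sequence, which is consistent with perplexity diverging on a long-context task like `lambada_openai` while short-context tasks survive.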
-
The following shows the actual values of the input-dependent $\Delta$ during inference of the 2-layer network on the induction-heads task described in the paper. I successfully trained the model t…
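For readers inspecting $\Delta$ themselves: per the Mamba paper, the step size is input-dependent, computed as a softplus of a linear projection of the input. The sketch below is a simplified numpy version (the real model projects through a low-rank `dt_rank` bottleneck; the shapes and weight names here are illustrative assumptions).

```python
import numpy as np

def softplus(x):
    # Smooth positive activation: log(1 + exp(x))
    return np.log1p(np.exp(x))

def compute_delta(x, W_dt, b_dt):
    """Input-dependent step size: Delta = softplus(Linear(x) + bias).

    Simplified sketch of the parameterization in the Mamba paper;
    shapes and names are illustrative, not the library's API.
    """
    return softplus(x @ W_dt + b_dt)   # shape (batch, seq, d)

rng = np.random.default_rng(0)
B, L, D = 2, 8, 16
x = rng.normal(size=(B, L, D))
W_dt = rng.normal(size=(D, D)) * 0.1
b_dt = np.zeros(D)

delta = compute_delta(x, W_dt, b_dt)
print(delta.shape, delta.min() > 0)   # Delta is positive everywhere
```

Because softplus is strictly positive, every $\Delta$ value is a valid (positive) discretization step, which is why plots of it during the induction-heads task never cross zero.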
-
When I run evaluation with lm_harness_eval, I get the errors below. What is the problem, and how can I fix it so the evaluation runs successfully? Thanks so much.
python evals/lm_harness_eval.py…
-
I notice that passkey retrieval works well up to around 3-4k tokens; beyond that, it fails.
That wasn't my intuition for SSMs, but I guess effective context length is still tied to the training set? It's…
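For anyone wanting to reproduce this kind of probe, here is a small helper that builds a passkey-retrieval prompt of adjustable length. It is a generic sketch of the common passkey setup, not the exact benchmark the poster used; the filler text, function name, and phrasing are assumptions.

```python
import random

FILLER = ("The grass is green. The sky is blue. "
          "The sun is yellow. Here we go. There and back again. ")

def make_passkey_prompt(n_filler, passkey, seed=0):
    """Build a passkey-retrieval prompt: repeated filler text with the
    passkey hidden at a random position, followed by the question.

    Generic sketch of the passkey task, not a specific benchmark.
    """
    rng = random.Random(seed)
    chunks = [FILLER] * n_filler
    info = f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
    chunks.insert(rng.randrange(len(chunks) + 1), info)
    question = "What is the pass key? The pass key is"
    return "".join(chunks) + question

prompt = make_passkey_prompt(n_filler=50, passkey=51423)
print("51423" in prompt)  # passkey is embedded somewhere in the filler
```

Sweeping `n_filler` upward lets you find the token count at which retrieval degrades, which is one way to measure how far beyond the training context an SSM generalizes.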
-
Dear Mamba-SSM team, congratulations on your success! Obviously many of us are excited about exploring the applications of your work.
Since there's no dropout in your model, what do you suggest f…
-
There may be some benefits:
1. The code could be simpler.
2. Inference could be faster.
3. Inference could accept multiple tokens at once this way.
There is some reference code here: htt…
-
# URL
- https://arxiv.org/abs/2404.08819
# Affiliations
- William Merrill, N/A
- Jackson Petty, N/A
- Ashish Sabharwal, N/A
# Abstract
- State-space models (SSMs) have emerged as a potential a…
-
The log is as follows:
File "D:\ProgramData\Anaconda3\envs\torchseg\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^…
-
It would be worthwhile to provide a training script, so that larger models (for instance 7B or 13B) can be trained.
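Until an official script exists, the loop such a script needs can be sketched in a few lines. The toy linear model and synthetic batches below are stand-ins (a real script would use the Mamba language model, a tokenized dataloader, cross-entropy loss, and a distributed optimizer); only the loop structure is the point.

```python
import numpy as np

# Minimal training-loop skeleton.  The toy linear regression stands in for
# the language model; comments mark where real components would go.
rng = np.random.default_rng(0)
d_in, d_out, lr = 8, 1, 0.1
W = rng.normal(size=(d_in, d_out)) * 0.01   # "model" parameters
w_true = rng.normal(size=(d_in, d_out))     # generates synthetic targets

for step in range(200):
    x = rng.normal(size=(32, d_in))         # stand-in for a tokenized batch
    y = x @ w_true                          # stand-in for next-token targets
    pred = x @ W                            # forward pass
    loss = ((pred - y) ** 2).mean()         # MSE stands in for cross-entropy
    grad = 2 * x.T @ (pred - y) / len(x)    # backward pass
    W -= lr * grad                          # SGD update (real: AdamW + schedule)
    if step % 50 == 0:
        print(f"step {step}: loss {loss:.4f}")
```

Scaling this skeleton to 7B/13B is mostly a matter of swapping in the real model and adding mixed precision, gradient accumulation, checkpointing, and data/tensor parallelism on top of the same loop.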