-
### Question
Hello, may I ask: the `use_cache` parameter on line 81 of `llava/eval/model_vqa_science.py` has been set to `False`, but it still remains `True` during forward inference?
-
Hi, when running example inference on Mamba2:
```
python benchmarks/benchmark_generation_mamba_simple.py --model-name "state-spaces/mamba2-2.7b" --prompt "My cat wrote all this CUDA code for a new …
-
### Motivation
As vLLM supports more and more models and features, they require different attention backends, schedulers, executors, and input/output processors. These modules are becoming increasingly com…
-
-
@51N84D can you assess the need for this? For the smaller datasets, CPU will be enough, and it's fairly fast. I'm not sure whether it will take too long for bigger datasets (e.g. LBIDD).
-
### System Info
- `transformers` version: 4.44.2
- Platform: Linux-4.15.0-76-generic-x86_64-with-glibc2.27
- Python version: 3.12.4
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.3
…
-
### Model description
https://github.com/noanabeshima/tiny_model
It's a small language model trained on TinyStories for interpretability with sparse autoencoders and transcoders added. It has no…
-
### 🐛 Describe the bug
When initializing a Transformer like this
```
nn.Transformer(hidden_dim * 2, 4, batch_first=True)
```
and then calling it like this
```
attention_mask = nn.Transformer.…
-
# ❓ Questions & Help
Existing examples in session-based/sequential recommendations only use item-level, sequence-based features.
However, in many real-world scenarios, we do have access to either …
-
I ran through the conversion script for Llama 13B Chat, but when I run the model on longer generations I sometimes get the following error:
`RuntimeError: probability tensor contains either `inf`, `n…
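
That error is raised when the sampling distribution handed to the sampler contains `inf`, `nan`, or a negative entry (a common cause is fp16 overflow in the logits). A minimal stdlib sketch of the same validity check, using a hypothetical `safe_sample` helper rather than anything from the llama codebase:

```python
import math
import random

def safe_sample(probs, items):
    """Hypothetical helper: sample one item, but first reject distributions
    containing inf, nan, or negative entries -- the condition the
    RuntimeError above is reporting."""
    if any(math.isnan(p) or math.isinf(p) or p < 0 for p in probs):
        raise ValueError("probability vector contains inf, nan, or element < 0")
    return random.choices(items, weights=probs, k=1)[0]
```

If a check like this fires during generation, running the sampling step in fp32 or adjusting the temperature/top-p settings are common workarounds.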