-
### Your current environment
vllm 0.6.3
### Model Input Dumps
The input is a long context with over 8k tokens.
### 🐛 Describe the bug
1. vllm 0.6.2 does not have this bug.
2. We a…
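Since the report is truncated here, the following is only a minimal repro sketch of the kind of request it describes. The model name, `max_model_len`, and prompt construction are assumptions; the one condition taken from the report is a prompt longer than 8k tokens on vllm 0.6.3 (which reportedly works on 0.6.2).
```python
# Hedged repro sketch: model choice and prompt construction are assumptions,
# since the original report is truncated. The key condition from the report
# is a single prompt well over 8k tokens.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed model
          max_model_len=16384)

# Repeat filler text to push the prompt well past 8k tokens.
long_prompt = "Summarize the following document.\n" + "lorem ipsum " * 4000

outputs = llm.generate([long_prompt], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```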
-
### 🐛 Describe the bug
This might be ok-ish if all code is compiled, but for eager code in the graph module this may break. Similar to https://github.com/pytorch/pytorch/issues/112072
https://gi…
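For context, a minimal sketch of what "eager code in the graph module" refers to, assuming the `torch.fx` sense of graph module: `symbolic_trace` emits a `GraphModule` whose `forward()` is generated Python that executes eagerly unless something compiles it. The traced function below is an arbitrary example.
```python
# Minimal illustration (the traced function is an arbitrary example):
# a GraphModule's forward() is generated Python source that executes
# eagerly unless it is explicitly compiled, which is the case the
# report says may break.
import torch
import torch.fx

def f(x):
    return torch.relu(x) + 1

gm = torch.fx.symbolic_trace(f)   # GraphModule wrapping generated Python code
print(gm.code)                    # the generated, eagerly-executed forward()

x = torch.randn(4)
assert torch.equal(gm(x), f(x))   # runs eagerly; no compilation involved
```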
-
Implement eager caching
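The request is a single line, so the following is only a hypothetical sketch of the technique it names: populate the cache up front rather than on first access. Every name below is illustrative.
```python
# Hypothetical sketch of eager caching: every name here is illustrative,
# since the request itself gives no further detail. The cache is filled
# up front instead of lazily on first lookup.
from typing import Callable, Dict, Iterable

def build_eager_cache(keys: Iterable[str],
                      compute: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Compute and store every entry ahead of time (eagerly), so later
    lookups never pay the compute cost and never miss."""
    return {key: compute(key) for key in keys}

# Usage: the whole cache is materialized before any lookup happens.
cache = build_eager_cache(["a", "b"], lambda k: k.encode() * 3)
assert cache["a"] == b"aaa"
```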
-
When running `test_attn_matmul` on BH, 166 tests fail and 12 pass. The failing tests need to be fixed.
-
The following E2E tests fail:
* `GPTJForCausalLM`
* `GPTJForQuestionAnswering`
for at least the following scenarios:
* E2E accuracy huggingface, training, float32, LTS, PyTorch 2.5
* E2E accu…
-
**Describe the bug**
WH transpose fails if W is an unaligned value such as 5.
**To Reproduce**
Can be trivially reproduced by adding `[[1, 1024, 5, 1280]], # Non page-aligned` to shape_wh in `tes…
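A sketch of that edit, assuming `shape_wh` is a pytest parametrization list in the (truncated) test file; the test name, test body, and the other list entry are placeholders.
```python
# Sketch of the repro edit from the report; the surrounding structure and
# names are assumptions, since the test path is truncated.
import pytest

shape_wh = [
    [[1, 1, 32, 32]],       # existing, aligned entry (illustrative)
    [[1, 1024, 5, 1280]],   # Non page-aligned: the 5 is the unaligned W value
]

@pytest.mark.parametrize("shape", shape_wh)
def test_transpose_wh(shape):
    # Placeholder body: the real test runs the device WH transpose on
    # `shape` and fails when W is not page-aligned.
    ...
```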
-
### Describe the bug
Currently, eager workflows always invoke the latest version of whatever task or workflow is being executed. This presents a number of problems:
* When multiple developers are it…
-
### Describe the bug
I am trying to upgrade from `@callstack/repack` v4 to v5.0.0-alpha.0.
I have followed the templates in the repack example as well as the official webpack -> rspack migration docum…
-
Can there be a patch to support eager loading via `.includes` and `.references` so that N+1 queries don't occur as long as the data was eager loaded?
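To illustrate the pattern being requested, here is a sketch swapped into Python/SQLAlchemy (these sketches are all in Python); ActiveRecord's `.includes`/`.references` play the role `selectinload` plays below, and the `Author`/`Book` models are hypothetical.
```python
# Illustration of the N+1 problem and its eager-loading fix, swapped into
# SQLAlchemy; the Author/Book models and data are hypothetical.
from sqlalchemy import ForeignKey, create_engine, select
from sqlalchemy.orm import (DeclarativeBase, Mapped, Session,
                            mapped_column, relationship, selectinload)

class Base(DeclarativeBase):
    pass

class Author(Base):
    __tablename__ = "authors"
    id: Mapped[int] = mapped_column(primary_key=True)
    books: Mapped[list["Book"]] = relationship(back_populates="author")

class Book(Base):
    __tablename__ = "books"
    id: Mapped[int] = mapped_column(primary_key=True)
    author_id: Mapped[int] = mapped_column(ForeignKey("authors.id"))
    author: Mapped[Author] = relationship(back_populates="books")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Author(id=1, books=[Book(id=1), Book(id=2)]))
    session.commit()

    # Lazy (N+1): one SELECT for authors, then one more per author for books.
    for a in session.scalars(select(Author)):
        _ = a.books

    # Eager: two SELECTs total; each a.books is already populated.
    stmt = select(Author).options(selectinload(Author.books))
    for a in session.scalars(stmt):
        _ = a.books
```
The design point is the same in either ORM: eager loading replaces one SELECT per parent row with a single extra SELECT issued up front.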
-
### Overview
On app boot, we currently initiate an eager load of tx history. Depending on the existing cache, this can be:
- In the worst case: Full tx history paging for all accounts on all chains
- In …