-
### Report of performance regression
I found that the attention (flashattn.py) computation time increased 1.7x after upgrading vLLM from 0.6.0 to 0.6.3.
| | v0.6.0 | v0.6.3 |
| :----: | :----: | :----: |
…
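As a sketch of how such a per-version timing comparison can be taken (this uses a generic PyTorch attention call as a stand-in for vLLM's flashattn.py path, and the shapes here are placeholders, not from the report), the key point is synchronizing CUDA before reading the clock:

```python
import time
import torch

def time_cuda_call(fn, *args, warmup=10, iters=100):
    """Average latency of a CUDA-backed callable, synchronizing so the
    measurement covers kernel execution rather than just the launch."""
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

# Placeholder shapes: batch 1, 32 heads, 1024 tokens, head dim 128.
q = torch.randn(1, 32, 1024, 128, device="cuda", dtype=torch.float16)
k, v = q.clone(), q.clone()
ms = time_cuda_call(torch.nn.functional.scaled_dot_product_attention, q, k, v) * 1e3
print(f"mean attention latency: {ms:.3f} ms")  # run once per vLLM version and compare
```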
-
Multiplying a `CuSparseMatrixCSC` by a `CuSparseVector` returns a `CuArray` instead of a `CuSparseVector`.
```julia
using CUDA
using CUDA.CUSPARSE
using Random
using SparseArrays
Random.see…
-
I had an issue in one of the services I work on, where it would use more and more memory until it crashed. After some digging around, I was able to reduce it to the following script:
```python
import a…
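# (Aside, not part of the reduced script above, which is cut off: a minimal,
# self-contained sketch of how steady growth can be confirmed with the
# standard-library tracemalloc module; `leaky_step` is a placeholder for the
# real workload, not something from the original report.)
import gc
import tracemalloc

def leaky_step(store=[]):
    # Appending to a mutable default argument retains memory across calls,
    # standing in for whatever the real script accumulates.
    store.append(bytearray(1024 * 1024))

tracemalloc.start()
for i in range(1, 6):
    leaky_step()
    gc.collect()
    current, peak = tracemalloc.get_traced_memory()
    print(f"iteration {i}: current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")

# The top allocation sites show where the retained memory is coming from.
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)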
-
Hello, I have tried to use megablocks on a V100 with PyTorch 2.4.0+cu121, but I get an error saying "cannot support bf16". If I use megablocks in fp32, I get the error "group gemm must use bf16". So I changed my enviro…
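For reference, native bf16 support requires an Ampere-class GPU (compute capability 8.0 or newer), while a V100 is 7.0, which matches the "cannot support bf16" error. A small sketch, independent of megablocks itself, of checking the device capability from PyTorch before choosing a dtype:

```python
import torch

assert torch.cuda.is_available(), "no CUDA device visible"

major, minor = torch.cuda.get_device_capability()
print(f"{torch.cuda.get_device_name()}: compute capability {major}.{minor}")

# Native bf16 needs compute capability >= 8.0 (Ampere or newer); a V100 reports 7.0.
dtype = torch.bfloat16 if (major, minor) >= (8, 0) else torch.float16
print(f"selected dtype: {dtype}")  # whether grouped GEMM accepts an fp16 fallback is a separate question
```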
-
### 🐛 Describe the bug
When I execute `tune run lora_finetune_single_device --config xxx.yaml` with the following `yaml` file:
```yaml
# Logging
output_dir: finetune/model-dir/Qwen2.5-0.5B-Instruct-finetu…
-
**Description**
I am trying to use the newly introduced [Triton Inference Server in-process Python API](https://github.com/triton-inference-server…
-
Hi, when I use `pip install git+https://github.com/tatsy/torchmcubes.git` to install, it fails with the error below. I am using Python 3.12.2 and CUDA 12.2.
```bash
pip install git+https://github.com/tatsy/tor…
-
## Bug Report
@trilinos/stokhos @etphipp
### Description
The `Stokhos_TpetraCrsMatrixUQPCEUnitTest_Cuda_MPI_4` unit test fails in cuda/11.4.2 builds with the following output:
```
...
317: Cu…
-
Requirement already satisfied: packaging in ./venv/lib/python3.11/site-packages (from -r requirements.txt (line 1)) (21.3)
Collecting torch==2.1.0 (from -r requirements.txt (line 2))
Using cached …
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTor…