-
### Your current environment
Collecting environment information...
INFO 08-28 14:32:56 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
WARNING 08-28 14:3…
-
### What happened?
llama-server's generation speed has dropped significantly since b3681, and the regression persists in the latest build, b3779.
For the same task and parameters "-n…
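A regression like this is easiest to track across builds with a tokens-per-second measurement. A minimal sketch, assuming llama-server's default port and its `/completion` endpoint with a `timings` field in the response (the `post` argument is a hypothetical injection point so the helper can be exercised without a running server):

```python
import json
import time
import urllib.request

def measure_gen_speed(prompt, n_predict=128,
                      url="http://localhost:8080/completion", post=None):
    """Time one generation request and return tokens per second.

    `url` assumes llama-server's default port; `post` can be swapped
    out (e.g. with a stub) so the helper is testable offline.
    """
    if post is None:
        def post(url, payload):
            req = urllib.request.Request(
                url, data=json.dumps(payload).encode(),
                headers={"Content-Type": "application/json"})
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)

    start = time.perf_counter()
    result = post(url, {"prompt": prompt, "n_predict": n_predict})
    elapsed = time.perf_counter() - start
    # Prefer the server's own measurement; fall back to wall-clock tokens/sec.
    timings = result.get("timings", {})
    return timings.get("predicted_per_second", n_predict / elapsed)
```

Running this against both b3681 and b3779 with identical parameters would quantify the slowdown.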
-
### 🐛 Describe the bug
When a module has a parameter that is a tensor of size 1 and you try to save its FSDP-wrapped state with torch.distributed.checkpoint, you get the following exception:
```
NotImplemente…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
### What happened?
imatrix creation and subsequent quantization to IQ3_XXS of [mixtral 8x7b instruct](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/blob/main/mixtral-8x7b-instruct-v…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
## Code:
The code comes from the blis step-by-step optimization guide:
> https://github.com/flame/how-to-optimize-gemm/wiki#step-by-step-optimizations

specifically Optimization 07:
> https://github.com/flame/how-to-optimize-gemm/wiki/Optimization_1x4_7

You can cd to …
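The 1x4 optimization in that wiki step computes four columns of C per pass over a row of A, so each loaded element of A feeds four running dot products. A Python sketch of that loop structure (illustration of the blocking only; the wiki's actual implementation is C):

```python
def gemm_1x4(A, B):
    """C = A @ B with the 1x4 blocking from Optimization_1x4_7:
    each traversal of a row of A updates four accumulators, so four
    columns of C are produced per outer-loop step."""
    m, k = len(A), len(A[0])
    n = len(B[0])
    assert n % 4 == 0, "sketch assumes the column count is a multiple of 4"
    C = [[0.0] * n for _ in range(m)]
    for j in range(0, n, 4):           # step over C four columns at a time
        for i in range(m):
            c0 = c1 = c2 = c3 = 0.0
            for p in range(k):         # one pass over A[i][:] feeds all four
                a = A[i][p]
                c0 += a * B[p][j]
                c1 += a * B[p][j + 1]
                c2 += a * B[p][j + 2]
                c3 += a * B[p][j + 3]
            C[i][j], C[i][j+1], C[i][j+2], C[i][j+3] = c0, c1, c2, c3
    return C
```

In the C version this structure lets the compiler keep the four accumulators in registers and reuse each loaded element of A four times.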
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
### Your current environment
```text
The output of `python collect_env.py`:

PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: …
-
### What is the issue?
Using `ollama:latest` with nvidia-docker and 2x4090.
Tried sending a large batch of 256-word text snippets to ollama for embedding generation using `all-minilm:l6-v2`.
…
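A sketch of that kind of fan-out, assuming ollama's `/api/embeddings` endpoint on its default port with `model`/`prompt` request fields (the `post` argument is a hypothetical injection point so the code can be exercised against a stub instead of a live server):

```python
import concurrent.futures
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # ollama's default port

def embed(prompt, model="all-minilm:l6-v2", post=None):
    """Request one embedding; `post` is injectable for offline testing."""
    payload = json.dumps({"model": model, "prompt": prompt}).encode()
    if post is None:
        def post(url, data):
            req = urllib.request.Request(
                url, data=data,
                headers={"Content-Type": "application/json"})
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
    return post(OLLAMA_URL, payload)["embedding"]

def embed_all(prompts, max_workers=8, **kw):
    """Fan the snippets out over a thread pool, preserving input order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        return list(ex.map(lambda p: embed(p, **kw), prompts))
```

Each worker thread issues one request at a time, so `max_workers` bounds how many concurrent requests hit the server.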