-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N…
-
Hi, I am having problems with memory allocation warnings (that lead to crashes) when using LlamaCppEmbeddings on an M1 Mac. I am running llama-cpp-python v0.1.84 on a MacBook Pro with 16GB of RAM, wh…
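Before loading a model on a 16GB machine, it can help to sanity-check the memory budget: roughly, the quantized weights plus the fp16 KV cache (`2 * n_layers * n_ctx * n_embd * 2 bytes`) must fit alongside the OS. A minimal stdlib-only sketch, using illustrative (not measured) numbers for a 7B-class model; the function names and the 30% headroom figure are assumptions, not part of llama-cpp-python:

```python
# Rough memory-budget check before loading a model with llama-cpp-python.
# Figures are illustrative: ~4 GiB for Q4-quantized 7B weights, and the
# standard 2 * n_layers * n_ctx * n_embd * 2-byte estimate for an fp16
# KV cache (key + value tensors for every layer at full context).

def kv_cache_bytes(n_layers: int, n_ctx: int, n_embd: int,
                   bytes_per_elem: int = 2) -> int:
    """fp16 key and value tensors for every layer at full context."""
    return 2 * n_layers * n_ctx * n_embd * bytes_per_elem

def fits_in_ram(model_bytes: int, n_layers: int, n_ctx: int, n_embd: int,
                ram_bytes: int, headroom: float = 0.7) -> bool:
    """Leave ~30% of RAM free for the OS and other processes (an assumption)."""
    needed = model_bytes + kv_cache_bytes(n_layers, n_ctx, n_embd)
    return needed <= ram_bytes * headroom

GiB = 1024 ** 3
# Llama-7B-like shape: 32 layers, 4096-dim embeddings, 4096-token context.
print(fits_in_ram(model_bytes=4 * GiB, n_layers=32, n_ctx=4096,
                  n_embd=4096, ram_bytes=16 * GiB))
```

If this check fails for the desired context length, lowering `n_ctx` (or using a smaller quantization) is usually the first thing to try before blaming the machine.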
-
### Your current environment
The output of `python collect_env.py`
```text
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: C…
-
### What happened?
Hi,
When I use llama.cpp to deploy a pruned llama3.1-8b model, an unbearable performance degradation appears:
We used a structured pruning method (LLM-Pruner) to prune llama3.1-8b, w…
-
### What happened?
```
INFO [ main] build info | tid="255085751848992" timestamp=1726024154 build=3726 commit="b34e0234"
INFO [ main] system info | tid="255085…
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A…
-
Hello, I ran into the following error when running `blend.py`.
I changed the model to `Llama-3-8B-Instruct` since I have no access to the Mixtral models. Could that be causing the error?
Log:
```
$ python example…
-
Release date: Aug 8 2024
Branch cut: Aug 2 2024
## [Developer Facing API](https://github.com/pytorch/ao/issues/391)
- [x] static quantization flow example @jerryzh168
- [ ] QAT refactor to gener…
-
### Your current environment
```text
PyTorch version: 2.3.0a0+ebedce2
Is debug build: False
CUDA used to build PyTorch: 12.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
…
-
I followed the instructions from https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html on a bare metal server from the Intel Dev Cloud, specifically this instance:
…