-
Following the documentation, I run:
```bash
cd llm
make chat -j
```
I get the following error:
```bash
CUDA is unavailable!
src/GPTBigCodeGenerate.cc src/GPTBigCodeTokenizer.cc src/Generate.c…
-
### 🚀 The feature, motivation and pitch
Starting from iOS 18, Core ML supports state, the counterpart of a mutable buffer. As a result, ExecuTorch can now let Core ML handle buffer mutation.
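For context, a minimal PyTorch-side sketch of the mutable-buffer pattern in question (the module name, buffer, and shapes are hypothetical, for illustration only):

```python
import torch

class ToyCache(torch.nn.Module):
    """Hypothetical stateful module: its registered buffer is mutated in place
    on every forward call, which is the pattern that iOS 18 Core ML state can
    now represent directly."""
    def __init__(self, size=4):
        super().__init__()
        self.register_buffer("cache", torch.zeros(size))

    def forward(self, x):
        self.cache.add_(x)        # in-place buffer mutation
        return self.cache.clone()

m = ToyCache()
print(m(torch.ones(4)))  # tensor([1., 1., 1., 1.])
print(m(torch.ones(4)))  # tensor([2., 2., 2., 2.])
```

As far as I understand, with coremltools 8+ such a registered buffer can be exposed to Core ML as a state during conversion (via `ct.StateType`), so the mutation can happen inside Core ML instead of being managed by the caller.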
##…
-
Hello, and thank you for your great work!
In `blip2_vicuna_instruct.py`, the `bos_token` of the LLM has been changed. Originally it is `'<s>'` with index 1, but after the following code:
```
self.llm_tokenize…
-
### Your current environment
```text
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: RED OS release MUROM (7.3.4) Stan…
-
Here's an overview of the features we intend to work on in the near future, across Core Keras, KerasNLP, and KerasCV.
## Core Keras
### Saving & export
- [Open for Contributions] Add utility …
-
Hi
Thank you for the great work you're doing on TensorRT-LLM and the Triton backend. I have some questions about matching versions between the tensorrt-llm Python package, the backend, and the NGC ima…
-
I tried to run the latest (as of today) docker image:
`docker run --gpus all --shm-size 64G -p 8001:80 ghcr.io/collabora/whisperfusion:latest`
I'm getting the error `OSError: /usr/local/lib/pytho…
-
### Your current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 11 (bullseye) (x86…
-
## 🚀 Feature
Mixtral 8x7B is a mixture-of-experts LLM that splits its parameters into 8 distinct expert groups, and I would like to do both training and inference with Thunder.
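For reference, a toy numpy sketch of the top-2 expert routing such a mixture-of-experts layer performs (the random weights and plain linear experts are my own simplification; Mixtral's real experts are SwiGLU FFNs):

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 16, 8, 2  # hidden size, expert count, experts per token

# Hypothetical toy weights: one router matrix plus 8 independent linear experts.
router_w = rng.standard_normal((D, N_EXPERTS)) * 0.1
expert_w = rng.standard_normal((N_EXPERTS, D, D)) * 0.1

def moe_forward(x):
    """Route each token to its top-2 experts and mix their outputs."""
    logits = x @ router_w                          # (tokens, 8) router scores
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]  # indices of the 2 best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        gate = np.exp(sel - sel.max())
        gate /= gate.sum()                         # softmax over the selected experts
        for g, e in zip(gate, top[t]):
            out[t] += g * (x[t] @ expert_w[e])     # gated sum of expert outputs
    return out

tokens = rng.standard_normal((4, D))
y = moe_forward(tokens)
print(y.shape)  # (4, 16)
```

Only the selected 2 of 8 experts run per token, which is why the model's active parameter count is much smaller than its total parameter count.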
### Work items
- [x] Run `t…
-
### Describe the issue
Hi,
I have a quick question regarding LLM inference on CPUs using this extension.
I've been digging into the LLM inference case, and it seems like the kernels written in …