-
## 🐛 Bug
(base) C:\Users\dmsha\dev\mlc>python -m mlc_llm.build --model Llama-2-7b-chat-hf --target vulkan --quantization q4f16_1 --llvm-mingw path/to/llvm-mingw
** Compiling models under Windows…
-
## 🐛 Bug
Error encountered in the latest build when compiling the Android model of Llama-2-7b-chat-hf on Windows.
python -m mlc_llm.build --target android --max-seq-len 768 --model dist/models/Llama-2-7b-cha…
-
### Your current environment
```text
root@0fca177ad2d4:/workspace# python3 collect_env.py
Collecting environment information...
PyTorch version: 2.1.2
Is debug build: False
CUDA used to build…
-
I notice that you use `jax.pure_callback` to call the PyTorch function being wrapped.
My understanding from https://jax.readthedocs.io/en/latest/notebooks/external_callbacks.html is that pur…
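For reference, a minimal sketch of the pattern being discussed, assuming the wrapped function runs eagerly in PyTorch on the host; `torch_softplus` and `softplus_via_callback` are hypothetical names chosen for illustration:

```python
import jax
import jax.numpy as jnp
import numpy as np
import torch

def torch_softplus(x: np.ndarray) -> np.ndarray:
    # Host-side callback: receives NumPy arrays from JAX, runs eager
    # PyTorch, and returns a NumPy array of the declared shape/dtype.
    # np.array(x) copies into a writable buffer for torch.from_numpy.
    return torch.nn.functional.softplus(torch.from_numpy(np.array(x))).numpy()

def softplus_via_callback(x):
    # pure_callback requires the output shape/dtype up front, because
    # the callback body is opaque to JAX's tracer.
    result_shape = jax.ShapeDtypeStruct(x.shape, x.dtype)
    return jax.pure_callback(torch_softplus, result_shape, x)

# Works under jit: the callback is staged out as a host call.
y = jax.jit(softplus_via_callback)(jnp.linspace(-1.0, 1.0, 4))
print(y)
```

Note that `jax.pure_callback` assumes the callback really is pure: JAX is free to cache, elide, or batch the call under transformations, so side effects inside the PyTorch function may not run when or as often as expected.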
-
This question is related to https://github.com/apache/tvm/pull/15487
I tried to embed AMD's **attention** operator directly into the Llama model. The model compiled normally, but I encounter…
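For context, one general way to embed an external kernel in a TVM `prim_func` is through `T.call_extern`; the sketch below is illustrative only, with a hypothetical symbol name `amd_attention_kernel` and placeholder single-head shapes, not the actual setup from the PR:

```python
import tvm
from tvm.script import tir as T

@T.prim_func
def attention_extern(q: T.handle, k: T.handle, v: T.handle, o: T.handle):
    T.func_attr({"global_symbol": "attention_extern", "tir.noalias": True})
    # Placeholder shapes: (seq_len, head_dim) for a single head.
    Q = T.match_buffer(q, (128, 64), "float16")
    K = T.match_buffer(k, (128, 64), "float16")
    V = T.match_buffer(v, (128, 64), "float16")
    O = T.match_buffer(o, (128, 64), "float16")
    # Hand the raw buffer pointers to an externally linked kernel.
    T.evaluate(
        T.call_extern(
            "int32", "amd_attention_kernel", Q.data, K.data, V.data, O.data
        )
    )
```

TVM treats the extern call as opaque, so the external symbol has to be linked in at build time, and the memory-layout contract between the buffers and the kernel is entirely on the caller.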
-
## 🐛 Bug
I am trying to compile this [model](https://huggingface.co/pankajmathur/orca_mini_3b) with mlc-llm. I seem to be getting the following error:
```
(myenv) aadarsh@AAD-HPLAP:~/src/mlc-ll…
-
## 🐛 Bug
Compiling models on macOS (I've tried this on x86 and Apple Silicon Macs) fails with syntax errors in the generated Metal source.
## To Reproduce
Steps to reproduce the behavior:
…
-
## 🐛 Bug
Using path "dist/models/chatglm2-6b" for model "chatglm2-6b"
Target configured: cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_blo…
-
## 🐛 Bug
Running `mlc_chat_cli --local-id vicuna-13b-1.1-q3f16_0` fails with
```
Use MLC config: "/Users/peter/_Git/_GPT/mlc-llm/dist/vicuna-13b-1.1-q3f16_0/params/mlc-chat-config.json"
Use mo…