-
## 🐛 Bug
KVCache takes up too much memory when running `mlc_llm.serve.server`, but memory usage is normal when using the CLI or `mlc_llm.gradio`.
## To Reproduce
Steps to reproduce the behavior:
- Do…
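To make the comparison concrete, a rough way to measure it is to poll GPU memory while each frontend serves the same model and compare the peaks. This is only a sketch and assumes an NVIDIA GPU with `nvidia-smi` on the PATH:
```python
import subprocess
import time

def gpu_memory_used_mib():
    """Per-GPU memory usage in MiB, as reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return [int(line) for line in out.strip().splitlines()]

# Run this while mlc_llm.serve.server (or the CLI / gradio frontend)
# is serving the model, then compare the peak value per frontend.
peak = 0
for _ in range(60):          # poll once per second for a minute
    peak = max(peak, max(gpu_memory_used_mib()))
    time.sleep(1)
print(f"peak GPU memory used: {peak} MiB")
```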
-
### What is your issue?
This is my crash report, and I think it's similar to #8410:
```
Fatal Python error: Segmentation fault
Thread 0x00007f4ff8daf700 (most recent call first):
  File "/home…
```
-
# Failing Tests
> Please see the failing tests divided into sections below. Click on each section to expand. Feel free to get assigned to an issue by following the instructions [here](https://unify.ai…
-
## 🐛 Bug
mlc-llm has a problem with generating text that is completely unrelated to the prompts on some models. I think this mainly affects the new models that are available with the last [tvm bug…
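For anyone trying to reproduce this, a minimal check is to send a short prompt through the Python API and see whether the reply relates to it at all. This is only a sketch, assuming the current `MLCEngine` API; the model path is a placeholder for one of the affected models:
```python
from mlc_llm import MLCEngine

# Placeholder path -- point this at one of the affected compiled models.
model = "./dist/Llama-2-7b-chat-hf-q4f16_1-MLC"

engine = MLCEngine(model)
response = engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    model=model,
    stream=False,
)
# With the affected models, the reply has nothing to do with the prompt.
print(response.choices[0].message.content)
engine.terminate()
```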
-
### Describe the bug, including details regarding any error messages, version, and platform.
I'm running within the VS2022 developer prompt and encountered the following error:
```
❯ pip install…
```
-
There is a discussion in the array API about adding a [nocopy *request* to the Python API](https://github.com/data-apis/array-api/issues/626). While it might be nice to solve such requests at a lower…
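For context, at the Python level the standard already exposes this as the `copy` keyword of `asarray`: `copy=False` is a strict no-copy request that must raise when a copy cannot be avoided. A small illustration of those semantics, assuming NumPy >= 2.0:
```python
import numpy as np

x = np.arange(4, dtype=np.float64)

# Already an ndarray with the requested dtype: no copy is needed,
# the very same array comes back.
y = np.asarray(x, copy=False)
assert y is x

# A dtype change would force a copy, so the strict no-copy request fails.
try:
    np.asarray(x, dtype=np.int32, copy=False)
except ValueError as exc:
    print("no-copy request could not be honored:", exc)
```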
-
# Failing Tests
> Please see the failing tests divided into sections below. Click on each section to expand. Feel free to get assigned to an issue by following the instructions [here](https://unify.ai…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
### Describe the bug
When starting the api_…
-
## 🐛 Bug
## To Reproduce
Steps to reproduce the behavior:
1. build libmlc_llm.so and libtvm_runtime.so
```
# Android config
ANDROID_NDK=$ANDROID_NDK
ANDROID_ABI=arm64-v8a
ANDROID_PLA…
```
-
## 🐛 Bug
Hello, I am trying to convert my own model type, based on llama2, and compile it to work on wasm,
but I cannot make it work no matter what I do.
I followed the instructions here (https://llm.m…
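For reference, the flow I am trying is roughly the usual convert-weights / gen-config / compile sequence, targeting WebGPU for the wasm library. This is only a sketch of how I understand the commands; subcommand and flag names may differ between mlc_llm versions, and all paths are placeholders:
```python
import subprocess

# Placeholder paths -- substitute the real checkpoint and output directories.
src = "./my-llama2-variant"                       # HF-style checkpoint
out = "./dist/my-llama2-variant-q4f16_1-MLC"      # MLC output directory

steps = [
    # 1. Quantize and convert the weights into the MLC format.
    ["mlc_llm", "convert_weight", src, "--quantization", "q4f16_1", "-o", out],
    # 2. Generate mlc-chat-config.json and processed tokenizer files.
    ["mlc_llm", "gen_config", src, "--quantization", "q4f16_1",
     "--conv-template", "llama-2", "-o", out],
    # 3. Compile the model library for the browser (WebGPU / wasm target).
    ["mlc_llm", "compile", f"{out}/mlc-chat-config.json",
     "--device", "webgpu", "-o", f"{out}/my-llama2-variant-webgpu.wasm"],
]

for cmd in steps:
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```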