-
### Your current environment
I am currently using a T4 instance on Google Colaboratory.
```
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used…
```
-
Can an already-quantized model like https://huggingface.co/01-ai/Yi-34B-Chat-4bits be compiled directly in mlc_llm?
I tried passing the --quant option directly, e.g. q0f16 or q4f16, but it reports some lay…
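For reference, MLC-LLM's usual documented flow starts from the original (unquantized) weights and applies its own quantization during conversion. Below is a minimal sketch of that flow, assuming a recent `mlc_llm` CLI; the paths, the `q4f16_1` quantization name, and the `chatml` conversation template are assumptions and not taken from the report above, and subcommand/flag names may differ across versions:

```sh
# Sketch only: convert the unquantized weights and let MLC apply q4f16 quantization,
# instead of starting from the already-quantized -4bits checkpoint.
mlc_llm convert_weight ./Yi-34B-Chat --quantization q4f16_1 -o ./dist/Yi-34B-Chat-q4f16_1
mlc_llm gen_config ./Yi-34B-Chat --quantization q4f16_1 --conv-template chatml \
    -o ./dist/Yi-34B-Chat-q4f16_1
mlc_llm compile ./dist/Yi-34B-Chat-q4f16_1/mlc-chat-config.json \
    -o ./dist/Yi-34B-Chat-q4f16_1/Yi-34B-Chat-q4f16_1-cuda.so
```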
-
`git clone --depth 1 --single-branch https://huggingface.co/Qwen/Qwen1.5-4B-Chat-GPTQ-Int4`
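The log below comes from llama.cpp's HF-to-GGUF converter starting up; presumably it was invoked roughly like the following sketch (the script is named `convert-hf-to-gguf.py` or `convert_hf_to_gguf.py` depending on the llama.cpp version, and the output filename here is a placeholder):

```sh
# Sketch: run llama.cpp's converter on the cloned checkout (output name is a placeholder).
python convert-hf-to-gguf.py ./Qwen1.5-4B-Chat-GPTQ-Int4 --outfile qwen1.5-4b-chat-gptq-int4.gguf
```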
```
INFO:hf-to-gguf:Loading model: Qwen1.5-4B-Chat-GPTQ-Int4
INFO:gguf.gguf_writer:gguf: This GGUF f…
```
-
### Your current environment
```text
The output of `python collect_env.py`
```
```
:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', bu…
```
-
[Whisper](https://github.com/openai/whisper) is an open-source model created by OpenAI.
The author of [ggml](https://github.com/ggerganov) provides a high-performance inference implementation using ggml call…
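That implementation is presumably whisper.cpp; for context, its README (at the time of writing, so build steps may have changed) gives roughly the quick start below, where the model name and sample file are the README's own examples rather than anything from this report:

```sh
# Quick-start sketch for the ggml-based Whisper implementation (whisper.cpp), following its README.
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
bash ./models/download-ggml-model.sh base.en   # fetch a ggml-format Whisper model
make                                           # older releases build a ./main binary
./main -m models/ggml-base.en.bin -f samples/jfk.wav
```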
-
Support for training a customized predictor for a specific LLM by adding a flag that specifies the model name from the [dataset](https://huggingface.co/datasets/lmsys/lmsys-chat-1m)
-
I tried to deploy an API server using baichuan-7b, but I got the following error:
```
NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=6,7 python -m vllm.entrypoints.openai.api_server --model /root/data/zyy/baichua…
```
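The full command and the error text are truncated above; for reference, a minimal two-GPU launch sketch for vLLM's OpenAI-compatible server looks like the following, where the model path is a placeholder and `--tensor-parallel-size` / `--trust-remote-code` are standard vLLM flags rather than anything recovered from the truncated command:

```sh
# Sketch: serve a local Baichuan checkpoint on two GPUs.
NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=6,7 \
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/baichuan-7b \
    --tensor-parallel-size 2 \
    --trust-remote-code   # Baichuan's modeling code lives in its HF repo
```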
-
Suppose there are certain topics that users are interested in but that are not covered in the encyclopedia. Is it possible for them to provide feedback on the web so that new issues can be included for the dev site t…
-
/kind bug
**What steps did you take and what happened:**
I have a local cluster without internet access. Manifests version 1.8 is deployed on it. I deployed this version using images imported as t…
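For reference, a generic pattern for getting images onto an air-gapped cluster looks like the sketch below; the image name is a placeholder and the `ctr` step assumes the nodes run containerd, neither of which is stated in the report:

```sh
# On a machine with internet access: pull the image and save it to a tarball.
docker pull registry.example.com/some/image:1.8
docker save -o image.tar registry.example.com/some/image:1.8

# On the air-gapped node (containerd runtime): import into the k8s.io namespace.
ctr -n k8s.io images import image.tar
```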
-
### Your current environment
```text
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu …