-
Hi,
thanks for sharing the code. I have tried to use your repo with `bitsandbytes` for model quantization. Unfortunately, the training process does not work: the layers defined in `modelling_llama.p…
-
Hi, thank you for providing the 1.58-bit implementation. Nice work! I looked through many BitNet 1.58 implementations and noticed that they all use the method suggested in "The Era of 1-bit LLMs: Training…
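For readers comparing implementations: the weight-quantization step that the paper describes is absmean ternary rounding (scale by the mean absolute weight, then round-and-clip to {-1, 0, +1}). Below is a minimal sketch of that idea; the function name and the `eps` guard are my own choices, not taken from any of the repos discussed:

```python
def absmean_quantize(weights, eps=1e-5):
    """Quantize a list of float weights to {-1, 0, +1} using the
    absmean scheme from the BitNet b1.58 paper (simplified sketch)."""
    # Scale factor: mean absolute value of the weights.
    gamma = sum(abs(w) for w in weights) / len(weights)
    scale = gamma + eps  # eps avoids division by zero for all-zero weights
    # Round each scaled weight to the nearest integer, clipped to [-1, 1].
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

q, s = absmean_quantize([0.9, -0.05, 0.4, -1.2])
# q == [1, 0, 1, -1]; small weights collapse to 0, large ones saturate at ±1
```

In the real training setup the ternary weights are used in the forward pass while full-precision weights are kept for the gradient update (straight-through estimator); this sketch only shows the rounding rule itself.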
-
```
mlc-ai-nightly-cu122   0.15.dev404
mlc-llm-nightly-cu122  0.1.dev1355
transformers           4.41.2
```
```
git clone https://huggingface.co/THUDM/glm-4-9b-chat
mlc_llm convert_we…
```
-
## Introduction
This document outlines a high-level proposal for providing efficient, yet easy-to-use k-NN in OpenSearch in low-memory environments. Many more details to come in individual compone…
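For context, a common way to cut k-NN memory is to quantize the stored vectors, e.g. to one byte per dimension instead of a 4-byte float (~4x smaller), and search over the dequantized approximations. The sketch below is illustrative only; it is not the OpenSearch implementation, and all names are made up. Real systems typically share scale parameters across the index rather than per vector:

```python
def quantize_vector(vec, levels=255):
    """Scalar-quantize a float vector to 8-bit codes (illustrative sketch).
    Each dimension is stored as one byte plus a per-vector (lo, step) pair."""
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / levels or 1.0  # avoid step == 0 for constant vectors
    codes = [round((x - lo) / step) for x in vec]
    return codes, lo, step

def dequantize_vector(codes, lo, step):
    """Reconstruct an approximate float vector from its 8-bit codes."""
    return [lo + c * step for c in codes]

def knn(query, quantized_db, k=2):
    """Brute-force k-NN over quantized vectors using approximate
    (dequantized) squared L2 distance; returns indices of the k nearest."""
    def dist2(entry):
        v = dequantize_vector(*entry)
        return sum((a - b) ** 2 for a, b in zip(query, v))
    order = sorted(range(len(quantized_db)), key=lambda i: dist2(quantized_db[i]))
    return order[:k]

db = [[0.0, 0.1], [5.0, 5.2], [0.2, 0.0]]
index = [quantize_vector(v) for v in db]
print(knn([0.1, 0.1], index))  # vectors 0 and 2 are closest to the query
```

The trade-off is the usual one for low-memory vector search: smaller codes mean coarser distances, so recall can drop unless results are re-ranked against full-precision vectors.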
-
Please make sure that this is a feature request. As per our [GitHub Policy](https://github.com/tensorflow/tensorflow/blob/master/ISSUES.md), we only address code/doc bugs, performance issues, feature …
-
I trained taming-transformers on my own dataset and got the ckpt file and the corresponding yaml file. When I apply it to vq-diffusion, an error is reported. I followed `configs/imagenet.yaml`. …
-
## Where are we?
Exporting a PyTorch model for the ExecuTorch runtime goes through multiple AoT (Ahead of Time) stages.
At a high level, there are 3 stages.
1. `exir.capture`: This captures the model's graph …
-
### 🐛 Describe the bug
Hello,
I am running llama3-70b and mixtral with vLLM on a number of different kinds of machines. I encountered wildly different output quality on A10 GPUs vs A100/H…
-
Once we have trained the quantized model, how do we deploy it?
-
```
(chatglm) n:\github\GLM-4>python openai_api_lby.py
2024-06-12 15:24:16,061 - Start initialize model...
Special tokens have been added in the vocabulary, make sure the associated word embeddings are …
```