-
### Describe the bug
I was unable to load the model with the following parameters; I am on the latest version of xinference, 0.8.1.
I started qwen-chat on the UI page. Model format: gptq
mode…
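
For anyone trying to reproduce this outside the web UI, a minimal sketch of the equivalent launch through the Python client might look like the following. The endpoint, model size, and quantization values are placeholder assumptions, since the original parameters are truncated above.

```python
# Hypothetical reproduction via the Python client instead of the web UI.
# Model size and quantization below are placeholders, not the reporter's actual settings.
from xinference.client import Client

client = Client("http://localhost:9997")   # default local supervisor endpoint
model_uid = client.launch_model(
    model_name="qwen-chat",
    model_format="gptq",
    model_size_in_billions=7,              # placeholder value
    quantization="Int4",                   # placeholder value
)
print(model_uid)
```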
-
# Description
Current challenges in using Neural Operators include irregular meshes, multiple inputs, multiple inputs on different meshes, and multi-scale problems. [1] The Attention mechanism is promi…
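
As a rough illustration of why attention is attractive here: treating each mesh node as a token makes the layer independent of any grid structure. The shapes and the use of `nn.MultiheadAttention` below are illustrative assumptions, not the project's actual design.

```python
# Sketch: treating mesh nodes as tokens so attention works on irregular geometries.
# Shapes and the choice of nn.MultiheadAttention are illustrative assumptions only.
import torch
import torch.nn as nn

n_nodes, d_model = 2048, 64                  # irregular mesh: no grid structure assumed
features = torch.randn(1, n_nodes, d_model)  # (batch, nodes, channels)

attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)
out, _ = attn(features, features, features)  # self-attention over mesh nodes
print(out.shape)                             # torch.Size([1, 2048, 64])
```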
-
By saving the model and reloading it, I managed to get the model working with both quantized and full precision (it still uses at most 10 GB of GPU RAM).
However, the model generates random characters. He…
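
A minimal sketch of that save-and-reload workaround using the standard `transformers` calls; the checkpoint id, local path, and dtype are placeholder assumptions, not the reporter's actual setup.

```python
# Sketch of the save-then-reload workaround; checkpoint id and paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "facebook/opt-1.3b"          # placeholder: whichever model the report refers to
save_dir = "./reloaded-model"      # hypothetical local path

tokenizer = AutoTokenizer.from_pretrained(src)
model = AutoModelForCausalLM.from_pretrained(src, torch_dtype=torch.float16)

tokenizer.save_pretrained(save_dir)
model.save_pretrained(save_dir)

# Reload from the local copy, as described above.
reloaded = AutoModelForCausalLM.from_pretrained(
    save_dir,
    torch_dtype=torch.float16,     # or a quantized config, as in the report
    device_map="auto",
)
```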
-
**What API design would you like to have changed or added to the library? Why?**
Most people expect `diffusers` and `transformers` models to be "unloaded" so that they can just run a big p…
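
For comparison, the offloading hooks `diffusers` already exposes look roughly like this; the checkpoint id below is a placeholder.

```python
# Sketch of the existing offloading hooks in diffusers; the checkpoint id is a placeholder.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # placeholder checkpoint
    torch_dtype=torch.float16,
)

# Move whole sub-models to the GPU only while they run, then back to CPU.
pipe.enable_model_cpu_offload()

# More aggressive alternative: offload layer by layer (slower, lowest VRAM).
# pipe.enable_sequential_cpu_offload()

image = pipe("a photo of an astronaut riding a horse").images[0]
```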
-
### Feature request
Hi! I’ve been researching LLM quantization recently ([this paper](https://arxiv.org/abs/2405.14852)), and noticed a potentially important issue that arises when using LLMs with 1-…
-
First, congrats on the repo - it looks great.
I discovered that switching between `torch.no_grad` and `torch.inference_mode` leads to a switch to `aten.linear.default`. Feel free to use this feedback …
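
A small way to observe which aten ops get dispatched under each context (a sketch, not the exact repro behind this report):

```python
# Sketch: log which aten ops a linear layer dispatches under no_grad vs inference_mode.
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class OpLogger(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        print(func)  # the dispatched op may differ between the two contexts
        return func(*args, **(kwargs or {}))

layer = torch.nn.Linear(8, 8)
x = torch.randn(2, 8)

with torch.no_grad(), OpLogger():
    layer(x)

with torch.inference_mode(), OpLogger():
    layer(x)
```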
-
We are trying to fine-tune ChatGLM-6B using LoRA on Arc A770 with 1 card and 2 cards, using the following commands:
1 card:
```
python ./alpaca_lora_finetuning.py \
--base_model "/home/intel/models/chat…
```
-
Hello,
I have an enormous number of `nan` and `inf` values in the outputs of quantized models for sequence classification. This is not the case with non-quantized models, which never output NaNs whatever the s…
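
A minimal sketch of how the NaN/inf values can be detected in the logits, assuming an 8-bit `bitsandbytes` load; the checkpoint id is a placeholder.

```python
# Sketch: check a quantized sequence-classification model's logits for nan/inf.
# The checkpoint id and 8-bit config are placeholder assumptions.
import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BitsAndBytesConfig,
)

name = "distilbert-base-uncased-finetuned-sst-2-english"   # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

inputs = tokenizer("example sentence", return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits

print(torch.isnan(logits).any(), torch.isinf(logits).any())
```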
-
Description
- This project is intended to explore a couple of papers from the literature on Quantum Transformer models [self-attention model: https://arxiv.org/abs/2205.05625, quantum vision transformers: htt…
-
### ⚠️ Please check that this feature request hasn't been suggested before.
- [X] I searched previous [Ideas in Discussions](https://github.com/OpenAccess-AI-Collective/axolotl/discussions/categories…