-
Currently, multi-LoRA supports only Llama and Mistral architectures. We should extend this functionality to all architectures.
Yi, Qwen, Phi and Mixtral architectures seem to be the most demanded r…
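For reference, the LoRA computation itself is architecture-agnostic; per-architecture support is mostly about wiring adapters into each model's projection layers (whose module names differ by architecture). A minimal NumPy sketch of the underlying math, illustrative only and not any framework's implementation:

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA sketch: y = x W^T + (alpha/r) * x A^T B^T.
    Only A and B would be trained; the base weight W stays frozen."""
    def __init__(self, in_features, out_features, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((out_features, in_features)) * 0.02  # frozen base weight
        self.A = rng.standard_normal((r, in_features)) * 0.01             # low-rank factor A
        self.B = np.zeros((out_features, r))                              # B starts at zero, so the
        self.scaling = alpha / r                                          # adapter is a no-op initially

    def __call__(self, x):
        return x @ self.W.T + self.scaling * (x @ self.A.T) @ self.B.T
```

Because the adapter path is just an additive low-rank delta on a linear layer, extending multi-LoRA to a new architecture is chiefly a matter of registering that architecture's linear-projection module names.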
-
Hi all - I'm working on an issue where users on Apple M1 silicon get `ggml_new_tensor_impl: not enough space in the context's memory pool` when they try to use starcoder or gptneox models from turbop…
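That error comes from ggml's fixed-size context allocator: every tensor is carved out of one buffer whose size is fixed at `ggml_init`, and the message means that buffer was sized too small for the model being loaded. A rough, hedged sketch of the sizing arithmetic (the per-tensor overhead constant here is an assumption, not ggml's actual value):

```python
TENSOR_OVERHEAD = 256  # bytes of bookkeeping per tensor object; illustrative assumption

def pool_size_needed(shapes, dtype_size=2):
    """Estimate the context pool size for a list of tensor shapes,
    assuming fp16 (2 bytes) elements by default."""
    total = 0
    for shape in shapes:
        n = 1
        for dim in shape:
            n *= dim
        total += n * dtype_size + TENSOR_OVERHEAD
    return total
```

If the pool the loader allocates is smaller than this kind of estimate for the model's tensors, `ggml_new_tensor_impl` fails with exactly that message.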
-
```python
'''
Goose AI
pip install openai
Uses GPT-NeoX 20B to generate text.
input_query - A string, the input query (e.g. "what is a dog?")
output - A string, the generated text
ope…
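# The truncated wrapper above presumably finishes its docstring and makes a
# completion call. Below is a hedged sketch of the pieces it describes:
# the "gpt-neo-20b" engine id and the api_base are assumptions about
# Goose AI's OpenAI-compatible API, so verify both against their docs.
def build_request(input_query, max_tokens=64):
    """Build the parameters for a Goose AI completion request."""
    return {
        "engine": "gpt-neo-20b",   # assumed Goose AI engine id for GPT-NeoX 20B
        "prompt": input_query,
        "max_tokens": max_tokens,
    }

# With the openai package this would be sent roughly as:
#   import openai
#   openai.api_base = "https://api.goose.ai/v1"  # assumed endpoint
#   openai.api_key = "YOUR_KEY"
#   completion = openai.Completion.create(**build_request("what is a dog?"))
#   output = completion.choices[0].text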
-
Hi, I want to fine-tune the 7b model. Am I supposed to download the provided checkpoint and fine-tune it as shown in this repo: https://github.com/EleutherAI/gpt-neox#using-custom-data ? Would they be…
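Roughly, yes: per the linked instructions, custom data goes in as JSONL with one JSON object per line holding a "text" field, which `tools/preprocess_data.py` then tokenizes into the binary format training reads. A small sketch of producing that file (the "text" field name follows the repo's documented format):

```python
import json

def write_training_jsonl(docs, path):
    """Write documents in the one-JSON-object-per-line format that
    gpt-neox's tools/preprocess_data.py consumes."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps({"text": doc}, ensure_ascii=False) + "\n")

def read_training_jsonl(path):
    """Read the documents back (round-trip check)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line)["text"] for line in f]
```

After writing the JSONL, the repo's preprocessing script is pointed at it to produce the tokenized dataset the training config references.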
-
# Overview
- Implement code applying 8-bit quantization to the GPT2/GPTJ models using BitsAndBytes
- Based on https://huggingface.co/hivemind/gpt-j-6B-8bit
- Replace the GPTJ model with the kogpt 6B model
- For the GPT2 model, the attention and fc layers are not nn.Linear but …
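The core of the 8-bit scheme in the referenced gpt-j-6B-8bit approach is row-wise absmax quantization, which can be sketched in NumPy; this is illustrative, not the BitsAndBytes implementation:

```python
import numpy as np

def quantize_absmax(w):
    """Row-wise absmax int8 quantization: scale each row so its
    largest magnitude maps to 127."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid division by zero on all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float weight from int8 values and per-row scales."""
    return q.astype(np.float32) * scale
```

For GPT2 this has to be applied to its Conv1D-style layers as well, which is why the nn.Linear-only assumption in the reference code needs adjusting.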
-
If you are submitting a bug report, please fill in the following details and use the tag [bug].
**Describe the bug**
Gemma-2-{size} is not loadable using from_pretrained. I checked OFFICIAL_MODEL_…
-
I'm getting a `RuntimeError: CUDA error: an illegal memory access was encountered`
using FlashAttention with a GPT-NeoX-esque model. I
```python
from transformers import AutoConfig
import torch
from…
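# A common culprit for illegal memory accesses in FlashAttention kernels is
# an unsupported input: typical published constraints are fp16/bf16 dtypes
# and head dims up to 128, often in multiples of 8. Exact limits vary by
# flash-attn version, so treat the numbers below as assumptions. A hedged
# pre-flight check:
def check_flash_attn_inputs(head_dim, dtype, max_head_dim=128):
    """Return a list of likely compatibility problems (empty if none found)."""
    problems = []
    if dtype not in ("float16", "bfloat16"):
        problems.append(f"unsupported dtype {dtype}; use fp16/bf16")
    if head_dim % 8 != 0:
        problems.append(f"head_dim {head_dim} is not a multiple of 8")
    if head_dim > max_head_dim:
        problems.append(f"head_dim {head_dim} exceeds {max_head_dim}")
    return problems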
-
### Describe the bug
```shell
MODEL_ID="/models/models--EleutherAI--gpt-neox-20b"
mkdir saved_results_gpt_neox
python run_gpt-neox_int8.py --ipex-weight-only-quantization --output-dir "saved_results_gpt_neo…
```
-
1) Convert the Robin model to an HF checkpoint. For this you need to extend the GPT-NeoX class in HF, add a CLIP encoder and an adapter to it, and adapt the conversion script to include the clip and adapter weig…
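The conversion-script change in step 1 amounts to merging the extra weights into the converted state dict. A hedged sketch, where the `clip_encoder` and `adapter` prefixes are hypothetical and must match whatever attribute names the extended GPT-NeoX class actually uses:

```python
def merge_state_dicts(neox_sd, clip_sd, adapter_sd):
    """Combine a converted GPT-NeoX state dict with CLIP-encoder and
    adapter weights under hypothetical module prefixes."""
    merged = dict(neox_sd)
    merged.update({f"clip_encoder.{k}": v for k, v in clip_sd.items()})
    merged.update({f"adapter.{k}": v for k, v in adapter_sd.items()})
    return merged
```

The merged dict can then be loaded into the extended class with `load_state_dict`, provided the prefixes line up with its submodule names.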
-
How would I convert this into the ggml format?
https://huggingface.co/andreaskoepf/pythia-2.8b-gpt4all-pretrain/tree/main
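The ggml repo ships per-architecture conversion scripts (including a gpt-neox example) that load the HF checkpoint and emit ggml's binary format: a magic number, hyperparameters, the vocab, then the tensors. A hedged sketch of the header-writing step; the field set and order here are illustrative, so check the actual script before relying on this layout:

```python
import struct

GGML_MAGIC = 0x67676d6c  # the "ggml" magic that ggml-format files begin with

def write_ggml_header(fout, n_vocab, n_ctx, n_embd, n_head, n_layer, ftype):
    """Write a ggml-style header: magic, then int32 hyperparameters.
    Field order is an assumption modeled on the conversion scripts."""
    fout.write(struct.pack("i", GGML_MAGIC))
    for value in (n_vocab, n_ctx, n_embd, n_head, n_layer, ftype):
        fout.write(struct.pack("i", value))
```

For a Pythia/GPT-NeoX checkpoint like the one linked, the gpt-neox example script in the ggml repo is the usual starting point rather than writing the format by hand.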