-
### What happened?
Hi, I'm trying to use Google's [Madlad400 in GGUF format](https://huggingface.co/NikolayKozloff/madlad400-10b-mt-Q8_0-GGUF), but I'm unable to get it working with `llama-server`, although it work…
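For anyone reproducing this, a minimal client sketch, assuming the server was started locally with `llama-server -m madlad400-10b-mt-q8_0.gguf --port 8080` (filename and port are my assumptions); the `<2xx>` target-language prefix follows Madlad400's model card:

```python
import requests

# Hypothetical local setup: llama-server listening on port 8080 with the
# Q8_0 GGUF from the linked repo. Madlad400 expects a <2xx> target-language
# token as the prompt prefix (here <2en> = translate to English).
resp = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": "<2en> Hallo, wie geht es dir?", "n_predict": 64},
)
print(resp.json()["content"])
```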
-
# 🌟 FAVOR+ / Performer attention addition
Are there any plans to add this new attention approximation block to the Transformers library?
## Model description
The new attention mechanism with linear…
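For context, here is a minimal sketch of the FAVOR+ estimator from the Performer paper: a toy single-head version without the paper's orthogonal random features; all names and shapes are mine, not a proposed Transformers API.

```python
import torch

def favor_plus_attention(q, k, v, m=256):
    """Toy FAVOR+ (Performer) linear attention.

    Approximates softmax attention in O(L * m * d) instead of O(L^2 * d)
    using positive random features: phi(x) = exp(Wx - |x|^2 / 2) / sqrt(m).
    q, k, v have shape (L, d); batching and heads are omitted for brevity.
    """
    L, d = q.shape
    q, k = q / d**0.25, k / d**0.25           # fold in the 1/sqrt(d) scaling
    w = torch.randn(m, d)                      # Gaussian random projections
    def phi(x):                                # positive random feature map
        return torch.exp(x @ w.T - (x**2).sum(-1, keepdim=True) / 2) / m**0.5
    q_p, k_p = phi(q), phi(k)                  # (L, m)
    kv = k_p.T @ v                             # (m, d) -- linear in L
    normalizer = q_p @ k_p.sum(0)              # row sums of the implied kernel
    return (q_p @ kv) / normalizer.unsqueeze(-1)

# Rough sanity check against exact softmax attention on a small example
q, k, v = (torch.randn(8, 16) for _ in range(3))
exact = torch.softmax(q @ k.T / 16**0.5, dim=-1) @ v
approx = favor_plus_attention(q, k, v, m=4096)
print((exact - approx).abs().max())
```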
-
## Description
While comparing multi-head attention between Torch 2.2 and TensorRT 9.2 on an A100-SXM4-40G, I found that for certain sizes the resulting engine does not use the `_gemm_mha_v2` tactic. When n…
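For reference, a sketch of the kind of PyTorch-side MHA baseline such a comparison typically exports; the module, shapes, and file names below are illustrative assumptions, not the reporter's actual model:

```python
import torch
import torch.nn as nn

# Illustrative module: a plain multi-head attention block of the kind one
# would export to ONNX and build into a TensorRT engine for this comparison.
class MHA(nn.Module):
    def __init__(self, embed_dim=1024, num_heads=16):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

model = MHA().eval().half().cuda()
x = torch.randn(8, 512, 1024, dtype=torch.half, device="cuda")
torch.onnx.export(model, x, "mha.onnx", opset_version=17)
# The engine can then be built with `trtexec --onnx=mha.onnx --fp16` and
# profiled to inspect which MHA tactics TensorRT selects per shape.
```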
-
I tried to run a training session but hit the error below inside the `training_losses` function:
```
Exception has occurred: RuntimeError
Given groups=1, weight of size [1152, 12, 2, 2], expected input[8, 1…
```
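A minimal sketch reproducing this class of error (toy input shapes are my own): the weight shape `[1152, 12, 2, 2]` means the convolution has 12 input channels, so any input whose channel dimension is not 12 raises exactly this `RuntimeError`.

```python
import torch
import torch.nn as nn

# Weight [out=1152, in=12, kh=2, kw=2] -> the layer expects 12 input channels.
conv = nn.Conv2d(in_channels=12, out_channels=1152, kernel_size=2)

try:
    conv(torch.randn(8, 3, 32, 32))      # wrong channel count -> RuntimeError
except RuntimeError as e:
    print(e)                             # "Given groups=1, weight of size [1152, 12, 2, 2], ..."

out = conv(torch.randn(8, 12, 32, 32))   # correct: 12 input channels
print(out.shape)                         # torch.Size([8, 1152, 31, 31])
```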
-
Hi guys,
I am following the Megatron-LM example to pre-train a BERT model, but I'm getting this error:
```
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/Megatron-LM/pretrai…
```
-
### What happened?
I am trying to run Qwen2-57B-A14B-Instruct, and I used `llama-gguf-split` to merge the GGUF files from [Qwen/Qwen2-57B-A14B-…
-
Config file:
```
base:
    seed: &seed 42
model:
    type: Mixtral
    path: /models/Mixtral-8x7B-Instruct-v0.1
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: …
```
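As an aside on the `&seed` anchor in this config: YAML anchors let a value defined once be reused elsewhere via a `*` alias. A minimal sketch with a hypothetical snippet (not the full config above):

```python
import yaml

# Toy document showing how the `&seed` anchor resolves: any `*seed` alias
# elsewhere in the file evaluates to the anchored value (42) when parsed.
doc = """
base:
    seed: &seed 42
train:
    data_seed: *seed
"""

cfg = yaml.safe_load(doc)
assert cfg["train"]["data_seed"] == cfg["base"]["seed"] == 42
print(cfg)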
-
Hi,
While running the training code with the m4c_captioner model, I am getting the following error:
```
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: U…
```
-
Command used:
```
swift eval \
    --eval_dataset POPE \
    --ckpt_dir outputs/llava1_5-7b-instruct/v0-20240909-235840/checkpoint-250 \
    --merge_lora true \
    --eval_output_dir eval_outputs/lora
```
Log output:
2024-09-…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
### Exp…