-
### What happened?
Hi, I'm trying to use Google's [Madlad400 in GGUF format](https://huggingface.co/NikolayKozloff/madlad400-10b-mt-Q8_0-GGUF), but I'm unable to get it working with `llama-server`, although it work…
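For anyone reproducing this, a minimal client sketch, assuming the server was started locally with `llama-server -m madlad400-10b-mt-q8_0.gguf --port 8080` (filename and port are my assumptions); the `<2xx>` target-language prefix follows Madlad400's model card:

```python
import requests

# Hypothetical local setup: llama-server listening on port 8080 with the
# Q8_0 GGUF from the linked repo. Madlad400 expects a <2xx> target-language
# token as the prompt prefix (here <2en> = translate to English).
resp = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": "<2en> Hallo, wie geht es dir?", "n_predict": 64},
)
print(resp.json()["content"])
```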
-
# 🌟 FAVOR+ / Performer attention addition
Are there any plans to add this new attention approximation block to the Transformers library?
## Model description
The new attention mechanism with linear…
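For context, here is a minimal sketch of the FAVOR+ estimator from the Performer paper: a toy single-head version without the paper's orthogonal random features; all names and shapes are mine, not a proposed Transformers API.

```python
import torch

def favor_plus_attention(q, k, v, m=256):
    """Toy FAVOR+ (Performer) linear attention.

    Approximates softmax attention in O(L * m * d) instead of O(L^2 * d)
    using positive random features: phi(x) = exp(Wx - |x|^2 / 2) / sqrt(m).
    q, k, v have shape (L, d); batching and heads are omitted for brevity.
    """
    L, d = q.shape
    q, k = q / d**0.25, k / d**0.25           # fold in the 1/sqrt(d) scaling
    w = torch.randn(m, d)                      # Gaussian random projections
    def phi(x):                                # positive random feature map
        return torch.exp(x @ w.T - (x**2).sum(-1, keepdim=True) / 2) / m**0.5
    q_p, k_p = phi(q), phi(k)                  # (L, m)
    kv = k_p.T @ v                             # (m, d) -- linear in L
    normalizer = q_p @ k_p.sum(0)              # row sums of the implied kernel
    return (q_p @ kv) / normalizer.unsqueeze(-1)

# Rough sanity check against exact softmax attention on a small example
q, k, v = (torch.randn(8, 16) for _ in range(3))
exact = torch.softmax(q @ k.T / 16**0.5, dim=-1) @ v
approx = favor_plus_attention(q, k, v, m=4096)
print((exact - approx).abs().max())
```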
-
## Description
While comparing multi-head attention between Torch 2.2 and TensorRT 9.2 on an A100-SXM4-40G, I found that for certain sizes the resulting engine does not use the `_gemm_mha_v2` tactic. When n…
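For reference, a sketch of the kind of PyTorch-side MHA baseline such a comparison typically exports; the module, shapes, and file names below are illustrative assumptions, not the reporter's actual model:

```python
import torch
import torch.nn as nn

# Illustrative module: a plain multi-head attention block of the kind one
# would export to ONNX and build into a TensorRT engine for this comparison.
class MHA(nn.Module):
    def __init__(self, embed_dim=1024, num_heads=16):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

model = MHA().eval().half().cuda()
x = torch.randn(8, 512, 1024, dtype=torch.half, device="cuda")
torch.onnx.export(model, x, "mha.onnx", opset_version=17)
# The engine can then be built with `trtexec --onnx=mha.onnx --fp16` and
# profiled to inspect which MHA tactics TensorRT selects per shape.
```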
-
I tried to run a training session but hit the error below inside the `training_losses` function:
```
Exception has occurred: RuntimeError
Given groups=1, weight of size [1152, 12, 2, 2], expected input[8, 1…
```
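A minimal sketch reproducing this class of error (toy input shapes are my own): the weight shape `[1152, 12, 2, 2]` means the convolution has 12 input channels, so any input whose channel dimension is not 12 raises exactly this `RuntimeError`.

```python
import torch
import torch.nn as nn

# Weight [out=1152, in=12, kh=2, kw=2] -> the layer expects 12 input channels.
conv = nn.Conv2d(in_channels=12, out_channels=1152, kernel_size=2)

try:
    conv(torch.randn(8, 3, 32, 32))      # wrong channel count -> RuntimeError
except RuntimeError as e:
    print(e)                             # "Given groups=1, weight of size [1152, 12, 2, 2], ..."

out = conv(torch.randn(8, 12, 32, 32))   # correct: 12 input channels
print(out.shape)                         # torch.Size([8, 1152, 31, 31])
```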
-
Hi guys,
I am following the Megatron-LM example to pre-train a BERT model, but I'm getting this error:
```
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/Megatron-LM/pretrai…
```
-
### What happened?
I am trying to run Qwen2-57B-A14B-Instruct, and I used `llama-gguf-split` to merge the GGUF files from [Qwen/Qwen2-57B-A14B-…
-
Config file:
```
base:
    seed: &seed 42
model:
    type: Mixtral
    path: /models/Mixtral-8x7B-Instruct-v0.1
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: …
```
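As an aside on the `&seed` anchor in this config: YAML anchors let a value defined once be reused elsewhere via a `*` alias. A minimal sketch with a hypothetical snippet (not the full config above):

```python
import yaml

# Toy document showing how the `&seed` anchor resolves: any `*seed` alias
# elsewhere in the file evaluates to the anchored value (42) when parsed.
doc = """
base:
    seed: &seed 42
train:
    data_seed: *seed
"""

cfg = yaml.safe_load(doc)
assert cfg["train"]["data_seed"] == cfg["base"]["seed"] == 42
print(cfg)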
-
Hi,
While running the training code with the m4c_captioner model, I am getting the following error:
```
/home/root1/anaconda3/envs/mmf/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: U…
```
-
Command used:
```
swift eval \
    --eval_dataset POPE \
    --ckpt_dir outputs/llava1_5-7b-instruct/v0-20240909-235840/checkpoint-250 \
    --merge_lora true \
    --eval_output_dir eval_outputs/lora
```
Log output:
2024-09-…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
### Exp…