linear-transformer Search Results

1000+ results for linear-transformer

kyegomez/LongNet #23

Train Error

(venv) personalinfo@MacBook-Pro-3 LongNet % python3 train.py 2024-03-05 23:56:10,524 - numexpr.utils - INFO - NumExpr defaulting to 8 threads. 2024-03-05 23:56:17.908409: I tensorflow/core/platform/…

bruicecode updated 2 months ago
3
sovrasov/flops-counter.pytorch #101

There was a bug with computing MultiheadAttention flops

I found that the hook function is not called when computing FLOPs for a MultiheadAttention module with requires_grad=False, which causes the FLOPs to be 0. No errors with requires_grad=True.

ssk1997 updated 1 year ago
9
minimaxir/gpt-2-simple #188

How to maximize performance for single generation?

I am running the 124M model on a V100 GPU and it takes about 6 seconds to execute gpt2.generate(..., length=50, ...) to return a single prediction. If I set nsamples=100, batch_size=100, it returns a…

mbellmbell updated 4 years ago
5
OpenAccess-AI-Collective/axolotl #1025

TypeError: _forward_cross_attn() got an unexpected keyword a…

### Please check that this issue hasn't been reported before. - [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports. …

varunmayya updated 3 months ago
1
CrazyBoyM/llama3-Chinese-chat #41

DPO training problem

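I'm new to DPO training and would like to ask for advice. I tried DPO training with llama-3-8b-instruct, using Chinese and English DPO datasets found on HF. After 4 epochs the loss had dropped to about 0.1, but in testing the model not only failed to improve, it showed all sorts of problems; even questions taken from the DPO training set produce repetitive, nonsensical answers. Below is my training code; I'm not sure whether there is a bug somewhere. import torch from transfor…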

chanel111 updated 2 weeks ago
6
pytorch/pytorch #32590

[FYI] MultiheadAttention / Transformer

### This issue is created to track the progress to refine `nn.MultiheadAttention` and `nn.Transformer`. Since the release of both modules in PyTorch v1.2.0, we have received a lot of feedback from…

zhangguanheng66 updated 2 years ago
17
lm-sys/FastChat #3055

Using train_with_template on mistral end up in a model with …

I use `train_with_template.py` with `mistralai/Mistral-7B-Instruct-v0.2` ``` torchrun --nproc_per_node=2 --master_port=20001 fastchat/train/train_with_template.py \ --model_name_or_path mistr…

christobill updated 4 months ago
3
UKPLab/sentence-transformers #1182

Sentence embeddings for author/news source attribution

We have been experimenting with different setups for the task of news source verification. Our first approach trained on sentence pairs from same and different source domains with cosine loss. For ver…

ericlief updated 2 years ago
2
PaddlePaddle/PaddleSeg #3569

heck the Attr(axis) of Op(elementwise_add) in pass(conv_elem…

### Issue confirmation (Search before asking) - [X] I have searched the [issue history](https://github.com/PaddlePaddle/PaddleSeg/issues) (including open and closed) and found no similar bug. I have searched the [open and closed issues](https://github.com/PaddlePa…

dugushiyu updated 7 months ago
1
VainF/Torch-Pruning #393

Apply pruning to DINO meet problems: "index 384 is out of bo…

Hello! Thank you for your outstanding work! However, we encountered an issue when attempting to apply the pruning method you proposed to DINO: IndexError: index 384 is out of bounds for dimension 0 …

Ayews updated 3 days ago
5
