-
- Version: python==3.7.9, torch==1.9.0+cu111, torchvision==0.10.0+cu111, calflops==0.2.0
- Problem:
Here's an example to see the error:
```python
import torch
import torch.nn as nn
from calflops…
```
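For reference, a minimal sketch of how calflops is typically invoked via its documented `calculate_flops` API; the toy `nn.Linear` model here is an assumption for illustration only, not the model from the truncated example above:

```python
import torch.nn as nn
from calflops import calculate_flops

# Hypothetical toy model, only to illustrate the call.
toy = nn.Linear(128, 64)

flops, macs, params = calculate_flops(
    model=toy,
    input_shape=(1, 128),      # batch of one 128-dim input vector
    output_as_string=True,
)
print(flops, macs, params)
```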
-
Models envisioned:
- TCN / LSTM / GRU as sequential models;
- Decided to abandon HLSTM, indie LSTM, transformer for various reasons;
- Attention schemes:
- Average pooling
- Plain self-attent…
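For the pooling options just listed, a minimal sketch (PyTorch assumed; none of this is from the original note) contrasting average pooling with a plain single-query self-attention pooling on top of a GRU's per-step outputs:

```python
import torch
import torch.nn as nn

class AttnPool(nn.Module):
    """Plain single-query self-attention pooling: a learned scorer weights each time step."""
    def __init__(self, d_model: int):
        super().__init__()
        self.score = nn.Linear(d_model, 1)

    def forward(self, h):                          # h: [batch, seq_len, d_model]
        w = torch.softmax(self.score(h), dim=1)    # attention weights over time steps
        return (w * h).sum(dim=1)                  # weighted sum -> [batch, d_model]

gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
h, _ = gru(torch.randn(8, 100, 32))               # h: [8, 100, 64]

avg_vec  = h.mean(dim=1)                          # average pooling
attn_vec = AttnPool(64)(h)                        # plain self-attention pooling
```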
-
Hi, thanks for the nice work and great repo!
I changed config to train_with_clip=1 to include ClipLoss.
Then, I am getting the following error in the eval step:
![image](https://user-images.githu…
-
Hi,
I already asked this question on stackoverflow, but didn't get any responses. So I'll try here:
I am trying to develop a transformer sequence-to-vector model but am encountering performance issues. I …
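As a rough sketch of the kind of sequence-to-vector transformer described (assumed architecture: an `nn.TransformerEncoder` stack followed by mean pooling; all sizes are placeholders):

```python
import torch
import torch.nn as nn

class SeqToVec(nn.Module):
    def __init__(self, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x, padding_mask=None):       # x: [batch, seq_len, d_model]
        h = self.encoder(x, src_key_padding_mask=padding_mask)
        return h.mean(dim=1)                       # pool the sequence into one vector

vec = SeqToVec()(torch.randn(4, 50, 128))          # -> [4, 128]
```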
-
# Attention Basic Concepts Summary - Jio's Paper Exploration (지오의 논문 탐방)
Table of Contents: What is Attention? Self-Attention Multi-Head Attention Transformers a. Encoder b. Decoder
[https://jio0728.github.io/deep%20learning/Attenti…
-
### 🐛 Describe the bug
# reproduce the bug
@mstebelev found out that the memory-efficient attention kernel on float32 CUDA tensors gives NaN gradients even though the inputs and incoming gradient are reaso…
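A hypothetical minimal reproducer along those lines, assuming the memory-efficient backend of `torch.nn.functional.scaled_dot_product_attention` and a CUDA device:

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float32, requires_grad=True)
k = torch.randn_like(q, requires_grad=True)
v = torch.randn_like(q, requires_grad=True)

# Force the memory-efficient kernel only (no flash, no math fallback).
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=False, enable_mem_efficient=True):
    out = F.scaled_dot_product_attention(q, k, v)

out.backward(torch.randn_like(out))
print(torch.isnan(q.grad).any(), torch.isnan(k.grad).any(), torch.isnan(v.grad).any())
```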
-
I want to find the code of "masked" multi-head attention
for "masking out (setting to −∞)" all values in the input of the softmax which correspond to illegal connections in decode part.
But, I ca…
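For reference, a minimal sketch (not from any particular repo) of how that masking is usually implemented: build an upper-triangular causal mask and fill the corresponding attention scores with −∞ before the softmax:

```python
import torch

seq_len, d_k = 5, 64
q = torch.randn(1, seq_len, d_k)
k = torch.randn(1, seq_len, d_k)

scores = q @ k.transpose(-2, -1) / d_k ** 0.5                   # [1, seq_len, seq_len]
causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(causal, float("-inf"))              # illegal (future) connections -> -inf
attn = torch.softmax(scores, dim=-1)                            # those positions get zero weight
```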
-
### 🚀 The feature, motivation and pitch
I'm working on applications that must run locally on resource-limited HW. Therefore, quantization becomes essential. Such applications need from multimodal vi…
-
in the file labml_nn/transformer/mha.py
```python
def forward(self, x: torch.Tensor):
# Input has shape `[seq_len, batch_size, d_model]` or `[batch_size, d_model]`.
    # We …
```
-
When training the GPT model, GPU memory blows up no matter how much VRAM is available, and regardless of what batch_size is set to.
Training starts fine, but the error appears partway through.
The audio was not segmented; the longest clip is 38 s.
```
Traceback (most recent call last):
File "C:\GPT-SoVITS-v2-240821\GPT_SoVITS\s1_train.py", line 183, in
main(args)
Fi…
```