-
- Version: python==3.7.9, torch==1.9.0+cu111, torchvision==0.10.0+cu111, calflops==0.2.0
- Problem:
Here's an example to see the error:
```python
import torch
import torch.nn as nn
from calflops…
```
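For reference, a minimal sketch of how calflops is typically invoked via its documented `calculate_flops` API; the toy `nn.Linear` model here is an assumption for illustration only, not the model from the truncated example above:

```python
import torch.nn as nn
from calflops import calculate_flops

# Hypothetical toy model, only to illustrate the call.
toy = nn.Linear(128, 64)

flops, macs, params = calculate_flops(
    model=toy,
    input_shape=(1, 128),      # batch of one 128-dim input vector
    output_as_string=True,
)
print(flops, macs, params)
```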
-
Models envisioned:
- TCN / LSTM / GRU as sequential models;
- Decided to abandon HLSTM, indie LSTM, transformer for various reasons;
- Attention schemes:
- Average pooling
- Plain self-attent…
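For the pooling options just listed, a minimal sketch (PyTorch assumed; none of this is from the original note) contrasting average pooling with a plain single-query self-attention pooling on top of a GRU's per-step outputs:

```python
import torch
import torch.nn as nn

class AttnPool(nn.Module):
    """Plain single-query self-attention pooling: a learned scorer weights each time step."""
    def __init__(self, d_model: int):
        super().__init__()
        self.score = nn.Linear(d_model, 1)

    def forward(self, h):                          # h: [batch, seq_len, d_model]
        w = torch.softmax(self.score(h), dim=1)    # attention weights over time steps
        return (w * h).sum(dim=1)                  # weighted sum -> [batch, d_model]

gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
h, _ = gru(torch.randn(8, 100, 32))               # h: [8, 100, 64]

avg_vec  = h.mean(dim=1)                          # average pooling
attn_vec = AttnPool(64)(h)                        # plain self-attention pooling
```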
-
Hi, thanks for the nice work and great repo!
I changed config to train_with_clip=1 to include ClipLoss.
Then, I am getting the following error in the eval step:
![image](https://user-images.githu…
-
Hi,
I already asked this question on stackoverflow, but didn't get any responses. So I'll try here:
I am trying to develop a transformer sequence-to-vector model but am encountering performance issues. I …
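As a rough sketch of the kind of sequence-to-vector transformer described (assumed architecture: an `nn.TransformerEncoder` stack followed by mean pooling; all sizes are placeholders):

```python
import torch
import torch.nn as nn

class SeqToVec(nn.Module):
    def __init__(self, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x, padding_mask=None):       # x: [batch, seq_len, d_model]
        h = self.encoder(x, src_key_padding_mask=padding_mask)
        return h.mean(dim=1)                       # pool the sequence into one vector

vec = SeqToVec()(torch.randn(4, 50, 128))          # -> [4, 128]
```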
-
# Attention Basic Concepts Summary - Jio's Paper Exploration (지오의 논문 탐방)
Table of Contents: What is Attention? Self-Attention Multi-Head Attention Transformers a. Encoder b. Decoder
[https://jio0728.github.io/deep%20learning/Attenti…
-
### 🐛 Describe the bug
# reproduce the bug
@mstebelev found out that the memory-efficient attention kernel on float32 CUDA tensors gives NaN gradients even though the inputs and incoming gradient are reaso…
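A hypothetical minimal reproducer along those lines, assuming the memory-efficient backend of `torch.nn.functional.scaled_dot_product_attention` and a CUDA device:

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float32, requires_grad=True)
k = torch.randn_like(q, requires_grad=True)
v = torch.randn_like(q, requires_grad=True)

# Force the memory-efficient kernel only (no flash, no math fallback).
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=False, enable_mem_efficient=True):
    out = F.scaled_dot_product_attention(q, k, v)

out.backward(torch.randn_like(out))
print(torch.isnan(q.grad).any(), torch.isnan(k.grad).any(), torch.isnan(v.grad).any())
```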
-
I want to find the code of "masked" multi-head attention
for "masking out (setting to −∞)" all values in the input of the softmax which correspond to illegal connections in decode part.
But, I ca…
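For reference, a minimal sketch (not from any particular repo) of how that masking is usually implemented: build an upper-triangular causal mask and fill the corresponding attention scores with −∞ before the softmax:

```python
import torch

seq_len, d_k = 5, 64
q = torch.randn(1, seq_len, d_k)
k = torch.randn(1, seq_len, d_k)

scores = q @ k.transpose(-2, -1) / d_k ** 0.5                   # [1, seq_len, seq_len]
causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(causal, float("-inf"))              # illegal (future) connections -> -inf
attn = torch.softmax(scores, dim=-1)                            # those positions get zero weight
```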
-
### 🚀 The feature, motivation and pitch
I'm working on applications that must run locally on resource-limited HW. Therefore, quantization becomes essential. Such applications need from multimodal vi…
-
in the file labml_nn/transformer/mha.py
```python
def forward(self, x: torch.Tensor):
# Input has shape `[seq_len, batch_size, d_model]` or `[batch_size, d_model]`.
    # We …
```
-
When training the GPT model, GPU memory blows up no matter how much VRAM is available, and regardless of what batch_size is set to.
Training starts fine, but the error appears partway through.
The audio was not segmented; the longest clip is 38 s.
```
Traceback (most recent call last):
File "C:\GPT-SoVITS-v2-240821\GPT_SoVITS\s1_train.py", line 183, in
main(args)
Fi…
```