-
### 🐛 Describe the bug
# Reproduce the bug
@mstebelev found that the memory-efficient attention kernel on float32 CUDA tensors gives NaN gradients even though the inputs and the incoming gradient are reaso…
-
Hi,
I was wondering: why use WordConv (separable convolution) in the NL encoder rather than the usual feedforward NN (as in the original transformer)? Is it mainly because separable convolution is easier to train? Did…
-
### Summary
We mention the 'feedback process' in the [Building Healthy Leadership Skills](https://the-turing-way.netlify.app/collaboration/leadership/leadership-building.html) section, but we don't…
-
Some functions do almost the same thing across these two classes of models but have slightly different syntax, arguments, and/or return types. Maybe the following functions could be made abstract and p…
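As a generic sketch of the pattern being suggested (class and method names here are hypothetical, not the project's actual API), an abstract base class can pin down one shared signature and return type while each model class keeps its own implementation:

```python
from abc import ABC, abstractmethod

class BaseModel(ABC):
    """Shared interface: every model exposes the same signature."""

    @abstractmethod
    def predict(self, data: list[float]) -> list[float]:
        """Return predictions for `data` as a plain list."""

class ModelA(BaseModel):
    # Hypothetical model: previously had its own predict() variant.
    def predict(self, data: list[float]) -> list[float]:
        return [x * 2 for x in data]

class ModelB(BaseModel):
    # Hypothetical model: previously used a differently-named method
    # with extra arguments; the wrapper normalizes it.
    def predict(self, data: list[float]) -> list[float]:
        return [x + 1 for x in data]
```

Callers can then treat both classes uniformly through `BaseModel.predict`, and the divergent syntax lives only inside each subclass.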
-
## error log
```
(bk-sdm) :~/pnnx/build/install/bin$ ./pnnx ~/diffusers-ncnn/model/unet-fp16.pt inputshape=[1,4,32,32],[1],[1,77,768]
pnnxparam = ~/diffusers_ncnn/model/unet_fp16.pnnx.par…
```
-
# What makes code DRY?
"Don't repeat yourself" (DRY) is a principle of software development aimed at reducing repetition of information that is likely to change, replacing it with abstractions th…
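A minimal illustration of the principle (the report functions are invented for the example): when a piece of knowledge, here a header layout, is duplicated, a change must be made in every copy; moving it behind one abstraction means it changes in exactly one place.

```python
# Repetitive version: the header string is duplicated, so changing the
# report layout requires editing every function that embeds it.
def sales_report(rows):
    return "=== REPORT ===\n" + "\n".join(rows)

def inventory_report(rows):
    return "=== REPORT ===\n" + "\n".join(rows)

# DRY version: the shared knowledge (the header layout) lives in one
# abstraction, and both reports are thin calls to it.
def make_report(rows, header="=== REPORT ==="):
    return header + "\n" + "\n".join(rows)

def sales_report_dry(rows):
    return make_report(rows)

def inventory_report_dry(rows):
    return make_report(rows)
```

Note that DRY is about duplicated *knowledge* (the header format), not merely similar-looking lines; two functions that coincidentally look alike but encode independent decisions should stay separate.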
-
Hi,
I have trained zipformer2 (without streaming) model with my dataset.
Training command: **./zipformer/train.py --num-epochs 40 --start-epoch 1 --use-fp16 1 --enable-musan False --exp-dir zip…
-
This code is working:
```
import torch
import pdb
from xlstm import (
xLSTMBlockStack,
xLSTMBlockStackConfig,
mLSTMBlockConfig,
mLSTMLayerConfig,
sLSTMBlockConfig,
…
-
Hey, this isn't a pressing issue for me, as I'm happy to use the other optimisers, which are working fine. With some settings I occasionally get some errors from what I guess is the Tensor extension. Be…
-
In the paper, you say "Since the original BLIP-2 models do not include checkpoints for Vicuna, we perform pre-training with Vicuna using the same procedure as BLIP-2". Does this mean InstructBLIP trai…