-
Hi,
I found that the model used in BEiT-3, based on torchscale, does not match what the paper describes.
In the multiway transformer, the self-attention layer should be shared across the different modalities. …
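For reference, here is a minimal sketch of the layout the paper describes (one self-attention module shared by every modality, with modality-specific feed-forward experts). This is illustrative only, not the torchscale/BEiT-3 code; the dimensions and norm placement are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative multiway block: the attention weights are shared across
# modalities, only the feed-forward "expert" is switched per modality.
class MultiwayBlock(nn.Module):
    def __init__(self, dim=768, heads=12, num_modalities=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # shared
        self.ffns = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_modalities)                                # per modality
        )
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x, modality: int):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ffns[modality](self.norm2(x))
```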
-
![image](https://github.com/syncdoth/RetNet/assets/902005/8eef7829-88ae-49e1-a65f-cd882268e688)
I'm trying to compare with other transformer architectures, but as soon as training starts, the gradi…
-
In retnet-3b/config.json, following the experimental settings of the paper
https://arxiv.org/pdf/2307.08621.pdf , decoder_ffn_embed_dim and decoder_value_embed_dim should be set to twice the size of decode…
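If it helps, a quick sanity check of that 2x relationship (the base field name `decoder_embed_dim` is my assumption; the other two keys are quoted from the config):

```python
import json

# Hypothetical check of the 2x sizing described above for retnet-3b/config.json.
with open("retnet-3b/config.json") as f:
    cfg = json.load(f)

embed = cfg["decoder_embed_dim"]            # assumed name of the base hidden size
assert cfg["decoder_ffn_embed_dim"] == 2 * embed
assert cfg["decoder_value_embed_dim"] == 2 * embed
```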
-
When installing xformers according to the official instructions, it fails.
A low version of torch combined with a high version of xformers is difficult to install.
Can anyone offer a Docker image?
-
Hello,
Thank you for this great project.
I have been trying to run the code. I managed to create embeddings from my tiles, but I keep getting errors while running the whole-slide model. At first…
-
Thank you for sharing the source code of VLMO recently.
We took a stab at pretraining a large (1024 hidden dim) multiway transformer with MIM loss, MLM loss, and contrastive loss.
BEIT3 pret…
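As a point of comparison, here is a toy sketch of how those three objectives could be combined in one step; this is not the VLMO/BEiT-3 recipe, and the equal weighting, shapes, and InfoNCE-style contrastive term are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Symmetric InfoNCE over matched image/text pairs in the batch.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

img_cls = torch.randn(8, 1024)     # stand-in [CLS] outputs of the image branch
txt_cls = torch.randn(8, 1024)     # stand-in [CLS] outputs of the text branch
mim_loss = torch.tensor(0.0)       # placeholder masked-image-modeling loss
mlm_loss = torch.tensor(0.0)       # placeholder masked-language-modeling loss
total_loss = mim_loss + mlm_loss + contrastive_loss(img_cls, txt_cls)
```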
-
Great work on this study; it may provide a useful example for later research. My question is: how do you pretrain the slide encoder (with LongNet)? From the repository it seems there…
-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is there an existing answer for this in the FAQ? …
-
### 🐛 Describe the bug
Using nested tensors generated with `torch.narrow` as inputs to `torch.nn.functional.scaled_dot_product_attention` works fine in the forward pass of the model. However, both …
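A minimal sketch of that setup, building the nested Q/K/V with `torch.nested.nested_tensor` rather than `torch.narrow` and assuming recent-PyTorch nested-tensor support in SDPA; the backward call is the step the report says fails:

```python
import torch
import torch.nn.functional as F

heads, head_dim = 4, 16
seq_lens = [5, 9, 3]                                   # ragged sequence lengths

def make_nested():
    # Nested tensor of shape (batch, heads, seq*, head_dim), seq* being ragged.
    return torch.nested.nested_tensor(
        [torch.randn(heads, L, head_dim) for L in seq_lens],
        requires_grad=True,
    )

q, k, v = make_nested(), make_nested(), make_nested()

out = F.scaled_dot_product_attention(q, k, v)          # forward pass works
loss = torch.nested.to_padded_tensor(out, 0.0).sum()   # reduce to a scalar
loss.backward()                                        # backward is where the report fails
```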
-
Unlike the other architectures in this package, RetNet doesn't have support for padding as far as I'm aware. I was thinking the best place to introduce it would be along with the positional mask. He…
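For discussion, here is a sketch of folding a padding mask into the decay/positional mask; the names, shapes, and the choice to zero out padded key positions are my assumptions, not the package's actual API.

```python
import torch

def decay_mask_with_padding(gamma, seq_len, padding_mask):
    # gamma: (heads,) per-head decay rates; padding_mask: (batch, seq_len),
    # True at padded positions. Returns a (batch, heads, seq_len, seq_len) mask.
    idx = torch.arange(seq_len)
    rel = idx[None, :, None] - idx[None, None, :]                # n - m
    mask = gamma[:, None, None] ** rel.clamp(min=0).float()      # gamma^(n - m)
    mask = torch.where(rel >= 0, mask, torch.zeros_like(mask))   # keep it causal
    mask = mask.unsqueeze(0)                                     # (1, heads, L, L)
    keep = (~padding_mask).float()[:, None, None, :]             # drop padded keys
    return mask * keep

# Example: 4 heads with paper-style decays, one sequence padded at the end.
gamma = 1 - 2.0 ** (-5 - torch.arange(4, dtype=torch.float))
pad = torch.tensor([[False, False, True, True]])
print(decay_mask_with_padding(gamma, 4, pad).shape)  # torch.Size([1, 4, 4, 4])
```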