-
Hi,
I found that the model used in BEiT-3, based on torchscale, does not match what the paper describes.
In the multiway transformer, the self-attention layer should be shared across the different modalities. …
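For reference, here is a minimal sketch of the layout the paper describes (one self-attention module shared by every modality, with modality-specific feed-forward experts). This is illustrative only, not the torchscale/BEiT-3 code; the dimensions and norm placement are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative multiway block: the attention weights are shared across
# modalities, only the feed-forward "expert" is switched per modality.
class MultiwayBlock(nn.Module):
    def __init__(self, dim=768, heads=12, num_modalities=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # shared
        self.ffns = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_modalities)                                # per modality
        )
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x, modality: int):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ffns[modality](self.norm2(x))
```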
-
![image](https://github.com/syncdoth/RetNet/assets/902005/8eef7829-88ae-49e1-a65f-cd882268e688)
I'm trying to compare with other transformer architectures, but as soon as training starts, the gradi…
-
In retnet-3b/config.json, following the experimental settings of the paper
https://arxiv.org/pdf/2307.08621.pdf , decoder_ffn_embed_dim and decoder_value_embed_dim should be set to twice the size of decode…
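If it helps, a quick sanity check of that 2x relationship (the base field name `decoder_embed_dim` is my assumption; the other two keys are quoted from the config):

```python
import json

# Hypothetical check of the 2x sizing described above for retnet-3b/config.json.
with open("retnet-3b/config.json") as f:
    cfg = json.load(f)

embed = cfg["decoder_embed_dim"]            # assumed name of the base hidden size
assert cfg["decoder_ffn_embed_dim"] == 2 * embed
assert cfg["decoder_value_embed_dim"] == 2 * embed
```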
-
When installing xformers according to the official instructions, it fails.
A low version of torch combined with a high version of xformers is difficult to install.
Can anyone offer a Docker image?
-
Hello,
Thank you for this great project.
I have been trying to run the code. I managed to create embeddings from my tiles, but I keep getting errors while running the whole-slide model. At first…
-
Thank you for sharing the source code of VLMO recently.
We took a stab at pretraining a large (1024 hidden dim) multiway transformer with MIM loss, MLM loss, and contrastive loss.
BEIT3 pret…
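As a point of comparison, here is a toy sketch of how those three objectives could be combined in one step; this is not the VLMO/BEiT-3 recipe, and the equal weighting, shapes, and InfoNCE-style contrastive term are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Symmetric InfoNCE over matched image/text pairs in the batch.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

img_cls = torch.randn(8, 1024)     # stand-in [CLS] outputs of the image branch
txt_cls = torch.randn(8, 1024)     # stand-in [CLS] outputs of the text branch
mim_loss = torch.tensor(0.0)       # placeholder masked-image-modeling loss
mlm_loss = torch.tensor(0.0)       # placeholder masked-language-modeling loss
total_loss = mim_loss + mlm_loss + contrastive_loss(img_cls, txt_cls)
```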
-
Great work on this study; it may provide a useful example for later research. My question is: how do you pretrain the slide encoder (with LongNet)? From the repository it seems there…
-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is there an existing answer for this in the FAQ? …
-
### 🐛 Describe the bug
Using nested tensors generated with `torch.narrow` as inputs to `torch.nn.functional.scaled_dot_product_attention` works fine in the forward pass of the model. However, both …
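A minimal sketch of that setup, building the nested Q/K/V with `torch.nested.nested_tensor` rather than `torch.narrow` and assuming recent-PyTorch nested-tensor support in SDPA; the backward call is the step the report says fails:

```python
import torch
import torch.nn.functional as F

heads, head_dim = 4, 16
seq_lens = [5, 9, 3]                                   # ragged sequence lengths

def make_nested():
    # Nested tensor of shape (batch, heads, seq*, head_dim), seq* being ragged.
    return torch.nested.nested_tensor(
        [torch.randn(heads, L, head_dim) for L in seq_lens],
        requires_grad=True,
    )

q, k, v = make_nested(), make_nested(), make_nested()

out = F.scaled_dot_product_attention(q, k, v)          # forward pass works
loss = torch.nested.to_padded_tensor(out, 0.0).sum()   # reduce to a scalar
loss.backward()                                        # backward is where the report fails
```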
-
Unlike the other architectures in this package, RetNet doesn't have support for padding as far as I'm aware. I was thinking the best place to introduce it would be along with the positional mask. He…
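For discussion, here is a sketch of folding a padding mask into the decay/positional mask; the names, shapes, and the choice to zero out padded key positions are my assumptions, not the package's actual API.

```python
import torch

def decay_mask_with_padding(gamma, seq_len, padding_mask):
    # gamma: (heads,) per-head decay rates; padding_mask: (batch, seq_len),
    # True at padded positions. Returns a (batch, heads, seq_len, seq_len) mask.
    idx = torch.arange(seq_len)
    rel = idx[None, :, None] - idx[None, None, :]                # n - m
    mask = gamma[:, None, None] ** rel.clamp(min=0).float()      # gamma^(n - m)
    mask = torch.where(rel >= 0, mask, torch.zeros_like(mask))   # keep it causal
    mask = mask.unsqueeze(0)                                     # (1, heads, L, L)
    keep = (~padding_mask).float()[:, None, None, :]             # drop padded keys
    return mask * keep

# Example: 4 heads with paper-style decays, one sequence padded at the end.
gamma = 1 - 2.0 ** (-5 - torch.arange(4, dtype=torch.float))
pad = torch.tensor([[False, False, True, True]])
print(decay_mask_with_padding(gamma, 4, pad).shape)  # torch.Size([1, 4, 4, 4])
```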