torchscale Search Results

105 results
for torchscale


microsoft/torchscale #32

EncoderDecoder Configuration Issue

I've identified a problem in the `EncoderDecoderConfig` class within the `architecture` module of the `torchscale` package. The `EncoderDecoderConfig` class currently does not contain the `normali…

klae01 updated 1 year ago
1
microsoft/unilm #1213

retnet: pseudocode in the paper is inconsistent with Equatio…

![image](https://github.com/microsoft/unilm/assets/70521515/45853cbb-d2eb-4f0c-b541-900e93408680) ![image](https://github.com/microsoft/unilm/assets/70521515/f8db8f08-e776-4b70-aabd-e30beede72bd) …

fanfanfan-hff updated 1 year ago
3
microsoft/torchscale #28

xPos cross-attention change

Hey, I noticed compared to the old implementation at https://github.com/sunyt32/torchscale, xPos is no longer used for cross-attention between decoder inputs and encoder outputs. In the old implementa…

janEbert updated 1 year ago
2
pytorch/data #1142

tcp connection refused when multinode training

### 🐛 Describe the bug Hi, we are training in webdataset format with torchdata. Everything works fine on a single-node machine. We then move to a multi-node cluster and would have the following err…

rxqy updated 1 year ago
3
microsoft/unilm #1041

BEIT3: multiway transformer

**Describe** BEIT3: Hi, I notice that you use decoupled Multiway Transformer as the backbone architecture. However, in your paper (Arxiv version), there are three experts (V-FFN, L-FFN, and VL-FFN).…

violet-sto updated 1 year ago
6
microsoft/unilm #1182

LongNet code

Hello! I was reading up on [LongNet](https://arxiv.org/abs/2307.02486) when I wanted to have a glance at the code. It directed me to this repository, which does not seem to have any reference of `L…

tomaarsen updated 11 months ago
32
microsoft/unilm #1078

Question about the number of output_projection in Beit3

**Describe** Beit3: Hi! I found that there is only one output_projection (nn.Linear(768, 64000)) for masked language modeling. However, as Beit-3 is a multimodal model, should there also be a o…

violet-sto updated 1 year ago
3
microsoft/torchscale #14

Does Torchscale support vision transformers in vision tasks?

Do the post-layernorm, the scaling in the residual branch, and the initialization in DeepNet also support vision tasks, like ImageNet classification and masked image modeling?

nightsnack updated 1 year ago
5
microsoft/torchscale #15

[Question] what are the usages of multiway_network.py?

Dear torchscale developers & researchers, Thank you for making the implementation of torchscale public. I have a question regarding the multiway_network usage in torchscale. In BeitV3.py [line 3…

yiqiwang8177 updated 1 year ago
2
microsoft/torchscale #18

Installer bug - wrong `apex` package installed

If you `pip install torchscale` then via requirements it also installs `apex` from pypi. However, `apex` on pypi is not Nvidia's apex, but is an unrelated project with many deps. As a result, many oth…

jph00 updated 1 year ago
2
