-
I've identified a problem in the `EncoderDecoderConfig` class within the `architecture` module of the `torchscale` package.
The `EncoderDecoderConfig` class currently does not contain the `normali…
-
![image](https://github.com/microsoft/unilm/assets/70521515/45853cbb-d2eb-4f0c-b541-900e93408680)
![image](https://github.com/microsoft/unilm/assets/70521515/f8db8f08-e776-4b70-aabd-e30beede72bd)
…
-
Hey, I noticed that, compared to the old implementation at https://github.com/sunyt32/torchscale, xPos is no longer used for cross-attention between decoder inputs and encoder outputs. In the old implementa…
-
### 🐛 Describe the bug
Hi, we are training on data in WebDataset format with torchdata. Everything works fine on a single-node machine. When we move to a multi-node cluster, however, we get the following err…
-
**Describe**
BEIT3:
Hi, I noticed that you use a decoupled Multiway Transformer as the backbone architecture. However, in your paper (arXiv version), there are three experts (V-FFN, L-FFN, and VL-FFN).…
-
Hello!
I was reading up on [LongNet](https://arxiv.org/abs/2307.02486) when I wanted to have a glance at the code. It directed me to this repository, which does not seem to have any reference to `L…
-
**Describe**
Beit3:
Hi!
I found that there is only one output_projection (nn.Linear(768, 64000)) for masked language modeling. However, as Beit-3 is a multimodal model, should there also be a o…
-
Do the post-LayerNorm, the residual-branch scaling, and the initialization used in DeepNet also support vision tasks, such as ImageNet classification and masked image modeling?
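
For context, here is a minimal sketch of the three pieces the question refers to, as I understand them from the DeepNet paper, assuming an encoder-only stack (the usual shape of a ViT-style vision backbone). The function names are illustrative and are not torchscale APIs, and the sketch says nothing about whether the recipe actually works well for vision.

```python
import math
from typing import List, Tuple


def deepnorm_constants(num_layers: int) -> Tuple[float, float]:
    """Return (alpha, beta) for an encoder-only stack of `num_layers` layers.

    alpha up-weights the residual branch; beta scales the initialization gain
    of the feed-forward, value, and output projections (per the DeepNet paper).
    """
    if num_layers < 1:
        raise ValueError("num_layers must be a positive integer")
    alpha = (2 * num_layers) ** 0.25
    beta = (8 * num_layers) ** -0.25
    return alpha, beta


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Plain LayerNorm over one vector, without learned scale or bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def deepnorm_residual(
    x: List[float], sublayer_out: List[float], alpha: float
) -> List[float]:
    """Post-LN residual update: LayerNorm(alpha * x + f(x))."""
    return layer_norm([alpha * a + b for a, b in zip(x, sublayer_out)])


if __name__ == "__main__":
    # Hypothetical 12-layer encoder, e.g. a base-sized vision backbone.
    alpha, beta = deepnorm_constants(12)
    print(f"alpha={alpha:.4f}, beta={beta:.4f}")
    print(deepnorm_residual([1.0, 2.0, 3.0], [0.5, -0.5, 0.0], alpha))
```

Nothing in this update rule is specific to text tokens, which is why the question of vision support seems reasonable to ask.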
-
Dear torchscale developers & researchers,
Thank you for making the torchscale implementation public.
I have a question regarding the multiway_network usage in torchscale. In BeitV3.py [line 3…
-
If you `pip install torchscale`, its requirements also install `apex` from PyPI. However, `apex` on PyPI is not NVIDIA's apex, but an unrelated project with many deps. As a result, many oth…