-
Sorry for bothering you, and this may be a dumb question:
What is the Complex type here used for?
I'm not very good at math, so it would be great if you could explain why complex numbers are needed.
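For context while waiting for the maintainers: in rotary/retention-style position encodings, a complex number is just a compact way to write a 2-D rotation, so multiplying feature pairs by exp(i·n·θ) ties a rotation angle to the token position n, and query-key products then depend only on relative distance. A minimal PyTorch sketch (illustrative only, not the repo's actual code):

```python
import torch

# Toy rotary-style relative position encoding using complex numbers.
# Each consecutive feature pair is viewed as one complex number and
# rotated by exp(i * n * theta), where n is the token position.

def rotate(x: torch.Tensor, positions: torch.Tensor, theta: float) -> torch.Tensor:
    """x: (seq_len, dim) with even dim; positions: (seq_len,) token indices."""
    # View feature pairs as complex numbers: (seq_len, dim // 2)
    xc = torch.view_as_complex(x.float().reshape(x.shape[0], -1, 2))
    # Unit-magnitude phase exp(i * n * theta) per position n
    phase = torch.polar(torch.ones_like(positions, dtype=torch.float32),
                        positions.float() * theta)
    rotated = xc * phase.unsqueeze(-1)
    return torch.view_as_real(rotated).reshape_as(x)

q, k = torch.randn(8, 16), torch.randn(8, 16)
pos = torch.arange(8)
q_rot, k_rot = rotate(q, pos, theta=0.1), rotate(k, pos, theta=0.1)
# (q_rot @ k_rot.T)[i, j] depends on the relative offset i - j,
# which is why the complex representation shows up in the code.
print((q_rot @ k_rot.T).shape)
```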
-
Hello :smile:
The BEiT-3 paper mentions that vision-language experts are employed in the top three Multiway Transformer layers. However, taking a look at the MultiwayNetwork implementation, I find …
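For anyone else reading along, my understanding of the idea (a conceptual sketch only, not the repo's MultiwayNetwork, which wraps arbitrary modules and adds the vision-language expert in the top layers) is that a Multiway layer keeps separate expert FFNs and routes tokens by modality:

```python
import torch
from torch import nn

class ToyMultiwayFFN(nn.Module):
    """Conceptual sketch of a modality-routed feed-forward block.

    Tokens before `split_position` are treated as vision tokens and go
    through the vision expert; the remaining tokens go through the
    language expert. The real implementation is more general than this.
    """

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        make_ffn = lambda: nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
        self.vision_expert = make_ffn()
        self.language_expert = make_ffn()

    def forward(self, x: torch.Tensor, split_position: int) -> torch.Tensor:
        # x: (batch, seq_len, dim); split_position marks the modality boundary.
        vision_out = self.vision_expert(x[:, :split_position])
        language_out = self.language_expert(x[:, split_position:])
        return torch.cat([vision_out, language_out], dim=1)

x = torch.randn(2, 10, 32)
layer = ToyMultiwayFFN(dim=32, hidden=64)
print(layer(x, split_position=6).shape)  # torch.Size([2, 10, 32])
```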
-
I would like to use this repo for my job. I cannot do so until you add a license to the repo. Can you please do so soon?
-
Hi,
Thank you for your great work!
When I use your example code to compare inference latency with a Transformer-based LLM, the result does not match what the paper reports (15.6X). Could you please …
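In case it helps to narrow this down, here is the minimal latency-measurement sketch I use for apples-to-apples comparisons (the model below is a placeholder, not the repo's example code):

```python
import time
import torch

@torch.no_grad()
def time_forward(model, inputs, warmup: int = 5, iters: int = 20) -> float:
    """Average forward-pass latency in milliseconds."""
    for _ in range(warmup):            # warm up kernels and caches
        model(inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()       # wait for queued kernels to finish
    start = time.perf_counter()
    for _ in range(iters):
        model(inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1000.0

# Placeholder model; swap in the two models being compared.
model = torch.nn.Linear(1024, 1024)
x = torch.randn(32, 1024)
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
print(f"{time_forward(model, x):.2f} ms / forward")
```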
-
When I open the link https://publicmodel.blob.core.windows.net/torchscale/vocab/dict.txt, I get the following error page:
"This XML file does not appear to have any style information associated with it. The document tree is shown below."
Pu…
-
Hi, I plan to reproduce the results of the WMT-17 translation task as presented in the DeepNet paper. Could you please let me know what the command for running the script shou…
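While waiting for the exact command, the model side can at least be instantiated directly from torchscale; a minimal sketch following the README-style config API (the hyperparameters below are placeholders, not the WMT-17 recipe):

```python
from torchscale.architecture.config import EncoderDecoderConfig
from torchscale.architecture.encoder_decoder import EncoderDecoder

# DeepNet-style encoder-decoder: deepnorm=True enables the DeepNorm
# residual scaling / initialization from the DeepNet paper.
# vocab_size and everything else here are placeholders, not the
# actual WMT-17 training configuration.
config = EncoderDecoderConfig(vocab_size=64000, deepnorm=True)
model = EncoderDecoder(config)
print(sum(p.numel() for p in model.parameters()), "parameters")
```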
-
Hey kyegomez,
I'm interested in trying out the implementation.
Is it already possible to use a base model for this?
-
**Describe**
If I only want to use one of the models in the repo, I have to download the whole repo.
But this is not necessary.
It is difficult to download the whole repo quickly in a short period…
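Until per-model downloads are supported, a workaround sketch that fetches a single checkpoint file over HTTP (the URL below is a placeholder; substitute the direct raw link of the one file you need):

```python
import urllib.request

# Placeholder URL: replace with the direct (raw) link to the single
# checkpoint or config file you need instead of cloning the whole repo.
url = "https://example.com/path/to/single_model_checkpoint.pt"
out_path = "single_model_checkpoint.pt"

with urllib.request.urlopen(url) as resp, open(out_path, "wb") as f:
    f.write(resp.read())
print(f"saved {out_path}")
```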
-
Hi, when I use RetNet's parallel mode to train, it is very slow. I checked the GPU memory usage and it is very small. What's going on?
Thank you!
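In case it helps with triage, a small sketch for confirming that both the model and the batch are actually on the GPU and for timing one training step (generic PyTorch, not tied to the repo's RetNet code):

```python
import time
import torch

def inspect_step(model: torch.nn.Module, batch: torch.Tensor, targets: torch.Tensor) -> None:
    """Print device placement, peak GPU memory, and one training step's wall-clock time."""
    print("param device:", next(model.parameters()).device)
    print("batch device:", batch.device)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    loss = torch.nn.functional.mse_loss(model(batch), targets)
    loss.backward()
    opt.step()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
        print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")
    print(f"step time: {time.perf_counter() - start:.3f} s")

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(512, 512).to(device)   # placeholder for the RetNet model
x = torch.randn(64, 512, device=device)
y = torch.randn(64, 512, device=device)
inspect_step(model, x, y)
```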
-
I've identified a problem in the `EncoderDecoderConfig` class within the `architecture` module of the `torchscale` package.
The `EncoderDecoderConfig` class currently does not contain the `normali…
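Until the attribute is added upstream, a defensive sketch of how consuming code can guard against the missing config field (the name `missing_flag` below is a placeholder, since the real attribute name is truncated above):

```python
from torchscale.architecture.config import EncoderDecoderConfig

config = EncoderDecoderConfig(vocab_size=64000)

# Placeholder name: the attribute reported missing is truncated above.
# getattr with a default keeps downstream code working until the field
# is added to EncoderDecoderConfig itself.
missing_flag = getattr(config, "missing_flag", False)
print("missing_flag:", missing_flag)
```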