-
Say I have a tensor `z` of size `[1]` and a tensor `x` of size `[batch_size, intermediate_dim, model_dim]`.
When computing `z * x`, `z` should be broadcast to the same size as `x`.…
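For reference, here is a minimal sketch of the broadcasting described above in plain PyTorch. The concrete sizes and values are assumptions chosen only for illustration, and it does not use any distributed or BMTrain-specific code.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
batch_size, intermediate_dim, model_dim = 2, 3, 4

z = torch.tensor([2.0])                                   # size [1]
x = torch.ones(batch_size, intermediate_dim, model_dim)   # size [2, 3, 4]

# Implicit broadcasting: z is stretched to match every dimension of x.
y = z * x

# The same product with the broadcast written out explicitly.
explicit = z.expand_as(x) * x

print(y.shape)
print(torch.equal(y, explicit))
```

If the two results differ in a particular setup, comparing against the explicit `expand_as` version like this may help narrow down where the broadcast goes wrong.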
-
Hi developers, when I tried to use the `gather()` method of `DistributedParameter`, I received the following error:
````
, line 43, in __init__
self.rel_embed.weight /= torch.norm(self.rel_emb…
-
When I tried to call `bmt.init_distributed(seed=0)`, I ran into the following error.
````
Traceback (most recent call last):
File "train_inner.py", line 19, in main
bmt.init_distributed(seed=0)
…