-
### 🐛 Describe the bug
AMP consumes about 30x more GPU memory when used on `bmm`.
Code:
```
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super…
```
Teoge updated
2 years ago
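For context, a minimal self-contained sketch of the pattern being reported (illustrative shapes, not the reporter's exact module), running `bmm` inside an autocast region:

```python
import torch

# Use GPU + float16 when available, matching the report; fall back to
# CPU + bfloat16 so the sketch stays runnable anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.bfloat16

a = torch.randn(8, 128, 128, device=device)
b = torch.randn(8, 128, 128, device=device)

# bmm is on autocast's lower-precision op list, so it runs in reduced
# precision inside the context; the report concerns GPU memory use here.
with torch.autocast(device_type=device, dtype=dtype):
    out = torch.bmm(a, b)
print(out.shape)  # torch.Size([8, 128, 128])
```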
-
Why can the code below, in the project, be used as a "loss"?
```
oloss = t.bmm(ovectors, ivectors).squeeze().sigmoid().log().mean(1)
nloss = t.bmm(nvectors, ivectors).squeeze().sigmoid().log(…
```
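This is the skip-gram negative-sampling (SGNS) objective: `sigmoid(score)` is interpreted as the probability that a (word, context) pair is genuine, so its log is a log-likelihood, and the negated sum of positive and negative terms is a valid loss to minimize. A hedged sketch under assumed shapes (the variable names mirror the snippet; the sign-flip on the negative samples is illustrative):

```python
import torch as t

batch, dim, n_ctx, n_neg = 2, 16, 3, 5
ivectors = t.randn(batch, dim, 1)      # center-word embeddings
ovectors = t.randn(batch, n_ctx, dim)  # true context embeddings
# Negative samples, sign-flipped so sigmoid(-score) scores them as "not a pair".
nvectors = -t.randn(batch, n_neg, dim)

# log sigmoid(score) is a log-likelihood per pair; average over contexts.
oloss = t.bmm(ovectors, ivectors).squeeze(-1).sigmoid().log().mean(1)
nloss = t.bmm(nvectors, ivectors).squeeze(-1).sigmoid().log().mean(1)

# Maximizing the likelihood == minimizing its negation, hence the loss.
loss = -(oloss + nloss).mean()
print(loss.shape)  # torch.Size([]) — a scalar
```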
-
When I tried to run the example from the YouTube video "https://www.youtube.com/watch?v=-Grfxkg3-DI", I got this error while running the training cell:
Epoch 1:
----------------------------------…
-
In the code below, I think it should be explained why the Q, K, V vector sequences are set equal to inputs_embeds. My understanding is that in the attention mechanism, Q, K, and V are obtained by multiplying the embeddings with the matrices W_Q, W_K, and W_V, which are trainable parameters; the code below seems to implicitly assume these three matrices are identity matrices.
```
import torch
from math import sqrt

Q = K = V…
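For comparison, a minimal sketch (not from the project; all names and shapes are illustrative) of computing Q, K, V through learned projections rather than using the embeddings directly. Setting the three projections to the identity would reduce Q, K, V back to inputs_embeds, which is exactly the simplification the question points out:

```python
import torch
import torch.nn as nn
from math import sqrt

embed_dim, seq_len = 8, 4
inputs_embeds = torch.randn(1, seq_len, embed_dim)

# Learned projection matrices W_Q, W_K, W_V (bias omitted for clarity).
W_Q = nn.Linear(embed_dim, embed_dim, bias=False)
W_K = nn.Linear(embed_dim, embed_dim, bias=False)
W_V = nn.Linear(embed_dim, embed_dim, bias=False)

Q, K, V = W_Q(inputs_embeds), W_K(inputs_embeds), W_V(inputs_embeds)

# Scaled dot-product attention over the projected sequences.
scores = torch.bmm(Q, K.transpose(1, 2)) / sqrt(embed_dim)
weights = scores.softmax(dim=-1)   # rows sum to 1
out = torch.bmm(weights, V)
print(out.shape)  # torch.Size([1, 4, 8])
```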
-
# Top level issue for LLM matmul optimizations
### Llama2
#### Repro
branch: cglagovich/6689
```
./tt_metal/tools/profiler/profile_this.py -c "pytest -svv models/demos/llama2_70b/tests/pe…
-
Hello,
I am currently utilizing Wazuh in a Kubernetes environment and am seeking guidance on how to effectively integrate LDAP with the Kubernetes manifest.
Could you provide any guidance or ref…
-
Hello, I am very interested in your paper. However, I found that the l_elas_orth loss is always 0 when I train on smal. Is this normal? Also, after reviewing, I found that the result of testing on sm…
-
This issue can lead to mistakes, so let's train it properly. To avoid such confusion in the future, we need to remove the cosAlphaXY branch completely.
-
While debugging your code, I could not find the implementation of your clone-and-split algorithm. Where is it?
In /scene/gaussian_model.py, line 492 defines the function
```
def densify_and_split(self, grads, grad_thres…
```
-
I noticed the note "Tutel v0.3: Add Megablocks solution to improve decoder inference on single-GPU with num_local_expert >= 2", but when I use Megablocks in MoE training (dropless MoE), the following err…