-
Dear torchtitan team, I have a question regarding gradient norm clipping when using pipeline parallelism (PP) potentially combined with `FSDP/DP/TP`.
For simplicity, let's assume each process/GPU h…
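To make the question concrete, here is a small sketch of the math involved (this is an illustrative helper I wrote, not torchtitan's actual implementation): under PP each rank holds only its own stage's gradients, so a naive per-rank `torch.nn.utils.clip_grad_norm_` would clip against a *local* norm. The global 2-norm is the square root of the sum of squared local norms; in a real PP setup the squared local norms would be combined with an all-reduce over the PP group, which the single-process sketch below replaces with a plain sum.

```python
import torch

def clip_grad_norm_across_stages(stage_grads, max_norm):
    """Clip gradients sharded across pipeline stages (single-process sketch).

    `stage_grads` is a list of per-stage gradient lists. In a real PP run the
    per-stage squared norms would be summed via all_reduce over the PP group;
    here they are simply summed locally to show the math.
    """
    # Squared 2-norm of each stage's gradients.
    local_sq = [sum(g.pow(2).sum() for g in grads) for grads in stage_grads]
    # Global norm = sqrt(sum of per-stage squared norms).
    total_norm = torch.stack(local_sq).sum().sqrt()
    # Uniform clip coefficient applied on every stage.
    clip_coef = min(1.0, max_norm / (total_norm.item() + 1e-6))
    for grads in stage_grads:
        for g in grads:
            g.mul_(clip_coef)
    return total_norm
```

The key point the sketch makes is that the clip coefficient must be computed from the *global* norm and then applied uniformly on every stage, otherwise stages are rescaled inconsistently.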
-
Question for you guys: as best I can tell, there is no support at present for keeping activations in fp8 between the "output" matmul (of either an attention block or MLP block) and the next norm (laye…
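To illustrate what is at stake in keeping those activations in fp8, here is a rough experiment (my own sketch, not anything from the library): it approximates e4m3-style rounding with a crude mantissa-truncation helper (`fake_fp8_e4m3` is a made-up name, and it ignores exponent range and saturation) and measures the relative quantization error the next norm layer would see.

```python
import torch

def fake_fp8_e4m3(x: torch.Tensor, mantissa_bits: int = 3) -> torch.Tensor:
    """Crude e4m3-like rounding (assumption: ignores exponent range/saturation).

    Keeps only `mantissa_bits` explicit mantissa bits so we can measure the
    quantization error a norm layer would see if the activations between the
    output matmul and the next norm were stored in fp8.
    """
    mant, exp = torch.frexp(x)          # x = mant * 2**exp, |mant| in [0.5, 1)
    scale = 2.0 ** (mantissa_bits + 1)  # leading mantissa bit is implicit
    return torch.ldexp(torch.round(mant * scale) / scale, exp)

torch.manual_seed(0)
x = torch.randn(8, 64)
# Per-element relative error introduced by the fp8-like round-trip.
err = (x - fake_fp8_e4m3(x)).abs() / x.abs().clamp_min(1e-12)
```

With 3 mantissa bits the worst-case relative error is about 6%, which is the precision budget the following norm would have to tolerate if the activations stayed in fp8.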
-
### Voice Changer Version
vcc client
### Operating System

win 11
### GPU
3050
### CUDA Version
new
### Read carefully and check the options
- [X] If you use win_cuda_torch_cuda edition, set…
-
![微信截图_20240923214125](https://github.com/user-attachments/assets/d0ba934e-e018-49cf-ac2d-92b146506b29)
-
[Huggingface RMSNorm](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L111) and Torch RMSNorm give slightly different values (=0.0029 on one input…
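One plausible source of a gap of that size in reduced precision (a sketch, not the exact code of either library): HF's LlamaRMSNorm upcasts to fp32, normalizes, casts back to the input dtype, and only then applies the weight, whereas a norm computed entirely in the input dtype rounds at different points. The two reference implementations below are simplified assumptions written for comparison, not copies of either codebase.

```python
import torch

def hf_style_rmsnorm(x, weight, eps=1e-6):
    # Mirrors the HF LlamaRMSNorm recipe: upcast to fp32, normalize,
    # cast back to the input dtype, then apply the weight.
    in_dtype = x.dtype
    x32 = x.to(torch.float32)
    var = x32.pow(2).mean(-1, keepdim=True)
    return weight * (x32 * torch.rsqrt(var + eps)).to(in_dtype)

def same_dtype_rmsnorm(x, weight, eps=1e-6):
    # Same formula, but every step stays in the input dtype.
    var = x.pow(2).mean(-1, keepdim=True)
    return x * torch.rsqrt(var + eps) * weight

torch.manual_seed(0)
w = torch.ones(64, dtype=torch.bfloat16)
x = torch.randn(4, 64, dtype=torch.bfloat16)
# In bf16 the two recipes drift apart; in fp32 they agree bitwise.
diff = (hf_style_rmsnorm(x, w) - same_dtype_rmsnorm(x, w)).abs().max()
```

In fp32 the two functions produce bitwise-identical results, so a difference on the order of 1e-3 only appears once the intermediate rounding points differ, which is consistent with a half-precision input.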
-
Dear author,
I'm sorry to bother you, but I have a problem that is very confusing to me. When I used my metric to draw my black hole shadow (based on the thin disk model), I fo…
-
**System information**
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Y
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
- …
-
```
norm = torch.linalg.norm(y - den_rec, dim=dim, ord=2)  # reconstruction error
rec_grads = torch.autograd.grad(outputs=norm, inputs=x)
rec_grads = rec_grads[0]
normguide = torch.linalg.norm(rec_grads) / x.shape[-1] ** 0.5  # RMS-scaled gradient norm
#n…
-
Hi, thanks for your brilliant work: the release of the paper, the weights (as far as I understood, there's more to be released!), and the code.
I'm very thrilled by your achievements in the omni-modal field, it reall…
-
In the module: `MambaTransformer/mamba_transformer`, you execute the following in `class MambaTransformerblock`:
```python
# Layernorm
self.norm = nn.LayerNorm(dim)
def forwa…