-
Hi there,
Great work with dMoE! I'm trying to test dMoE with regular DDP + PyTorch AMP (BF16), and I get the following error:
```python
optimizer_state["found_inf_per_device"] = self._unscale_…
```
-
https://github.com/NVIDIA/TransformerEngine/blob/e3bb24e5a347c58353e62307bc84cca856f9e9be/transformer_engine/pytorch/module/linear.py#L405-L407
If `weight.requires_grad` is set to False, when to cal…
-
http://gradsusr.org/pipermail/gradsusr/2008-July/007358.html
https://github.com/j-m-adams/GrADS/blob/master/src/gribmap.c
EDIT: I think the formatting comes from wgrib2:
Vertical Levels
…
-
```python
if trainer is None:
    sample_grads = grads
    params[0].assign_sub(grads[0] * lr)
    params[1].assign_sub(grads[1] * lr)
```
Why are only `params[0]` and `params[1]` updated? Shouldn't it be
…
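If the model has exactly two trainable tensors (for example a weight `w` and a bias `b`; this is an assumption, since the snippet is cut off), then indices 0 and 1 already cover every parameter. A loop over `zip(params, grads)` avoids hard-coding the indices. The sketch below uses plain Python lists in place of TensorFlow variables, and `sgd_step` is a hypothetical helper that mimics `assign_sub` by mutating each parameter in place.

```python
from typing import List


def sgd_step(params: List[List[float]], grads: List[List[float]], lr: float) -> None:
    """Hypothetical helper: in-place SGD update p <- p - lr * g for EVERY parameter."""
    if len(params) != len(grads):
        raise ValueError("params and grads must have the same length")
    for p, g in zip(params, grads):
        # Mirrors `p.assign_sub(g * lr)`: each parameter is mutated in place.
        for i in range(len(p)):
            p[i] -= lr * g[i]


# Usage: a weight vector and a scalar bias stored as a 1-element list.
w = [1.0, 2.0]
b = [0.5]
params = [w, b]
grads = [[0.1, -0.2], [1.0]]
sgd_step(params, grads, lr=0.5)
```

With only `[w, b]` in `params`, the loop does exactly what the two explicit `assign_sub` lines do; it only differs once the model has more than two parameters.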
-
I am using Keras with the TensorFlow backend, and I have fine-tuned the last Conv layer and the FC layer of my network, starting from VGG weights. Now I am using the Grad-CAM technique to visualize which parts of my ima…
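Grad-CAM itself reduces to two steps: global-average-pool the gradient of the class score over each channel of the chosen conv layer to get a per-channel weight, then take a ReLU of the weighted sum of that layer's activations. A minimal sketch in plain Python, assuming the activations `A` and gradients `dA` (shape channels × height × width) have already been pulled out of the model; `grad_cam` is a hypothetical helper, not a Keras API.

```python
from typing import List

Map = List[List[float]]  # one H x W feature map


def grad_cam(A: List[Map], dA: List[Map]) -> Map:
    """Hypothetical helper: Grad-CAM heatmap from conv activations A and gradients dA.

    A[k][i][j]  is channel k of the conv layer output at position (i, j).
    dA[k][i][j] is d(class score) / dA[k][i][j].
    """
    K = len(A)
    H, W = len(A[0]), len(A[0][0])
    # Per-channel weights: average the gradients over the spatial dimensions.
    alphas = [sum(sum(row) for row in dA[k]) / (H * W) for k in range(K)]
    # Weighted sum over channels, then ReLU to keep only positive evidence.
    cam = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = sum(alphas[k] * A[k][i][j] for k in range(K))
            cam[i][j] = max(0.0, s)
    return cam


# Usage: channel 0 has positive gradients, channel 1 negative ones.
A = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
dA = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(A, dA)
```

In practice the resulting map is usually normalized and upsampled to the input image size before being overlaid.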
-
https://github.com/openai/baselines/blob/b99a73afe37206775ac8b884d32a36e213a3fac2/baselines/deepq/deepq_learner.py#L174-L181
In line 179, shouldn't it be:
`grads = clipped_grads`
instead of
`cli…
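Whatever the exact assignment on that line, the clipped gradients have to be the ones that reach the optimizer. A minimal sketch of per-tensor clipping by L2 norm (the rule documented for `tf.clip_by_norm`: if ‖g‖ exceeds `clip_norm`, rescale to `g * clip_norm / ‖g‖`), in plain Python with hypothetical helpers `clip_by_norm` and `clip_all`:

```python
import math
from typing import List


def clip_by_norm(g: List[float], clip_norm: float) -> List[float]:
    """Hypothetical helper: rescale g so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in g))
    if norm <= clip_norm:
        return list(g)
    scale = clip_norm / norm
    return [x * scale for x in g]


def clip_all(grads: List[List[float]], clip_norm: float) -> List[List[float]]:
    # Build and return the clipped list; overwriting it afterwards with the
    # unclipped grads would silently discard the clipping.
    return [clip_by_norm(g, clip_norm) for g in grads]


grads = [[3.0, 4.0], [0.3, 0.4]]
grads = clip_all(grads, clip_norm=1.0)  # apply these, not the originals
```

The first gradient (norm 5) is scaled down to norm 1; the second (norm 0.5) is left unchanged.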
-
Grads data is currently converted one message at a time. We should use multiple threads to convert several messages concurrently.
Order matters in NWPC's GRIB 2 files. In the serial version…
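One way to get concurrency without losing order is to let a pool run the conversions but collect results by input position. A minimal sketch using `concurrent.futures.ThreadPoolExecutor.map`, which yields results in the order of its inputs even when later items finish first; `convert_message` is a hypothetical stand-in for the real per-message conversion.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def convert_message(index: int) -> str:
    """Hypothetical stand-in for converting one GRIB 2 message."""
    time.sleep(random.uniform(0.0, 0.01))  # simulate uneven per-message cost
    return f"message-{index}"


def convert_all(num_messages: int, workers: int = 4) -> List[str]:
    # Executor.map returns results in input order, regardless of which
    # worker thread finishes first, so the output keeps the serial order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert_message, range(num_messages)))


results = convert_all(10)
```

Whether threads actually speed things up depends on the conversion releasing the GIL (I/O, or a C decoder); for pure-Python CPU-bound work, `ProcessPoolExecutor` offers the same `map` interface and ordering guarantee.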
-
I found that the NaN comes from here:
ldm/modules/losses/contperceptual.py
```python
def calculate_adaptive_weight(self, nll_loss, g_loss, last_layer=None):
    if last_layer is not None:
        nll_gra…
```
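Assuming this function follows the usual VQGAN-style rule (an assumption, since the snippet is cut off), the weight is the ratio of the gradient norms of the two losses at the last layer, ‖∇nll‖ / (‖∇g‖ + eps), clamped to a range. The `eps` already guards against division by zero, so a NaN there most likely means one of the gradients is itself non-finite (for example inf/inf). A minimal sketch of that rule with an explicit finiteness check, in plain Python on flattened gradients; `adaptive_weight`, `eps`, and `max_weight` are hypothetical names.

```python
import math
from typing import List, Optional


def l2_norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def adaptive_weight(nll_grad: List[float], g_grad: List[float],
                    eps: float = 1e-4, max_weight: float = 1e4) -> Optional[float]:
    """Hypothetical sketch: ||grad nll|| / (||grad g|| + eps), clamped to [0, max_weight].

    Returns None when either norm is NaN or inf, so the caller can skip the
    step or fall back to a fixed weight instead of propagating NaN.
    """
    n_nll, n_g = l2_norm(nll_grad), l2_norm(g_grad)
    if not (math.isfinite(n_nll) and math.isfinite(n_g)):
        return None
    w = n_nll / (n_g + eps)
    return min(max(w, 0.0), max_weight)


w_ok = adaptive_weight([3.0, 4.0], [0.0, 1.0])            # ordinary finite ratio
w_zero = adaptive_weight([3.0, 4.0], [0.0, 0.0])          # hits the upper clamp
w_inf = adaptive_weight([3.0, 4.0], [float("inf"), 0.0])  # non-finite input
```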
-
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
Yes
### Source
source
### TensorFlow version
tf 2.16.1
### Custom code
Yes
### OS platform and distribution
Ubunt…
-
I think one of the main motivations for LoRA is to reduce memory consumption; that is certainly mine. I'm already using gradient checkpointing and Adafactor, so the main thing I want from LoRA is to red…
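As a rough worked example of where LoRA's savings come from: for a frozen weight of shape d × k, LoRA trains only B (d × r) and A (r × k), i.e. r(d + k) parameters instead of dk, so gradient and optimizer-state memory shrink by that ratio. The frozen weights and activations still have to be held in memory. The layer size (4096 × 4096) and rank (8) below are illustrative assumptions.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for one d x k weight with rank-r LoRA (B: d x r, A: r x k)."""
    return r * (d + k)


def full_trainable_params(d: int, k: int) -> int:
    """Trainable parameters when the full d x k weight is fine-tuned."""
    return d * k


# Hypothetical 4096 x 4096 projection adapted with rank 8.
d, k, r = 4096, 4096, 8
full = full_trainable_params(d, k)
lora = lora_trainable_params(d, k, r)
ratio = lora / full
```

With these sizes the adapter trains 65,536 parameters instead of 16,777,216, a factor of 256 fewer.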