-
***
☝️ **Important announcement:** Greenkeeper will be saying goodbye 👋 and passing the torch to Snyk on June 3rd, 2020! [Find out how to migrate to Snyk and more at greenkeeper.io](https://greenkeep…
-
Dynamic quantization of PyTorch models has proven to be a challenge for two reasons.
(1) Dynamic quantization ties the traced TorchScript model to a particular architecture and makes it non-portabl…
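For reference, a minimal sketch of point (1), using the public `torch.quantization.quantize_dynamic` API (the toy model and shapes are illustrative, not from the original report): tracing records the backend-specific quantized kernels, which is what ties the TorchScript artifact to a particular architecture.

```python
import torch

# Toy model (illustrative only): dynamic quantization targets nn.Linear.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 4),
).eval()

# Replace Linear layers with dynamically quantized equivalents: weights are
# stored as int8 and activations are quantized on the fly at inference time.
qmodel = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Tracing bakes the quantized ops into the graph, so the saved artifact
# depends on a backend that implements them (e.g. fbgemm on x86, qnnpack on ARM).
traced = torch.jit.trace(qmodel, torch.randn(1, 16))
traced.save("qmodel.pt")
```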
-
Platforms: linux
This test was disabled because it is failing in CI. See [recent examples](https://hud.pytorch.org/flakytest?name=test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_se…
-
### Bug description
When using DeepSpeed, changes made to the checkpoint (adding or removing keys) in `on_save_checkpoint` are not preserved. When the strategy is switched to `ddp`, the changes are saved as expected.
…
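A minimal sketch of the report, assuming a PyTorch Lightning setup (the module and the extra key are hypothetical, not the reporter's code): mutate the checkpoint dict in `on_save_checkpoint` and compare what each strategy actually writes to disk.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        (x,) = batch
        return self.layer(x).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def on_save_checkpoint(self, checkpoint):
        # Hypothetical key: reportedly kept with strategy="ddp" but
        # missing from the state saved under strategy="deepspeed".
        checkpoint["my_extra_key"] = 123

train = DataLoader(TensorDataset(torch.randn(8, 4)), batch_size=4)
trainer = pl.Trainer(max_epochs=1, accelerator="gpu", devices=1, strategy="deepspeed")
trainer.fit(BoringModel(), train)
trainer.save_checkpoint("example.ckpt")
# Inspect the saved state with each strategy and check for "my_extra_key".
```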
-
Hi,
I have been looking into adding a decomposition for the `aten::_native_multi_head_attention` op. The issue I have run into is that for certain inputs (specifically `need_weights=False`) …
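For context, here is a hedged sketch of the registration mechanics only: a real decomposition would have to match the full `aten::_native_multi_head_attention` signature, and the `need_weights=False` case is exactly the open question, so `aten.silu` stands in as a trivially decomposable op and a custom registry keeps the global table untouched.

```python
import torch
from torch._decomp import register_decomposition
from torch.fx.experimental.proxy_tensor import make_fx

aten = torch.ops.aten
my_decomps = {}  # private table so the global registry is not modified

# Stand-in op: silu(x) = x * sigmoid(x). The same decorator pattern would
# apply to aten._native_multi_head_attention once its outputs are pinned down.
@register_decomposition(aten.silu, registry=my_decomps)
def silu_decomp(x):
    return x * torch.sigmoid(x)

def f(x):
    return torch.nn.functional.silu(x)

# Tracing with the custom table shows silu rewritten into mul/sigmoid.
gm = make_fx(f, decomposition_table=my_decomps)(torch.randn(4))
print(gm.graph)
```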
-
Hi there,
Your idea of using a transformer for event-based depth estimation is great.
I'm working with Python 3.9 and CUDA 11.8 and trying to update the environment.
The first is the summary fu…
-
**Describe the bug**
When comparing ZeRO-1 and ZeRO-2, I noticed discrepancies between the results from the DeepSpeed Flops Profiler and the training speed metrics in transformers, and the conclusions …
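A hedged sketch of how the two runs could be pinned down to a single variable, expressing the DeepSpeed config as a Python dict (all field values are illustrative): only the ZeRO stage changes between runs, with the Flops Profiler enabled identically in both.

```python
# Illustrative DeepSpeed config factory: everything fixed except the ZeRO
# stage, so profiler output and training-speed metrics are comparable.
def make_ds_config(zero_stage):
    return {
        "train_micro_batch_size_per_gpu": 8,
        "fp16": {"enabled": True},
        "zero_optimization": {"stage": zero_stage},
        "flops_profiler": {
            "enabled": True,
            "profile_step": 5,     # profile a step after warmup
            "module_depth": -1,    # full module tree
            "top_modules": 3,
            "detailed": True,
            "output_file": None,   # print to stdout
        },
    }

zero1_config = make_ds_config(1)  # ZeRO-1: shard optimizer states only
zero2_config = make_ds_config(2)  # ZeRO-2: also shard gradients
```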