-
So far, I have not found a minimal example of resuming training from a universal checkpoint.
The only example of using a universal checkpoint is [here](https://github.com/microsoft/Megatron-DeepS…
-
### 🐛 Describe the bug
I'm unable to get pyg-lib to import on my Mac. It seems to affect both Intel and M1 Macs (I'm on an Intel, but an M1 user reported the same issue: https://github.com/pyg-team/…
-
I trained a model using Accelerate + DeepSpeed ZeRO-2 and got a ZeRO-2 checkpoint. The checkpoint structure is listed below, and this is the Google Drive [link](https://drive.google.com/drive/folders/1e…
-
## 🚀 Feature
Incorporate support for the Posit number format as an additional PyTorch dtype.
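
For readers unfamiliar with the format, here is a minimal sketch of how a posit bit pattern maps to a real value. It is plain Python, not a PyTorch API: the function name `decode_posit` and its parameters are hypothetical and exist only for illustration. The number of exponent bits differs between posit variants, so `es` is left as a parameter, and the default of 2 is an assumption.

```python
def decode_posit(bits: int, n: int = 8, es: int = 2) -> float:
    """Decode an n-bit posit with es exponent bits into a Python float.

    Illustrative helper only (hypothetical name, not part of PyTorch).
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        # NaR ("not a real") has no float equivalent; NaN is used as a stand-in.
        return float("nan")

    negative = bool(bits >> (n - 1))
    if negative:
        bits = (-bits) & mask  # posits negate via two's complement

    # Bits after the sign, most significant first.
    body = [(bits >> i) & 1 for i in range(n - 2, -1, -1)]

    # Regime: a run of identical bits, ended by the opposite bit
    # (or by the end of the word).
    run = 1
    while run < len(body) and body[run] == body[0]:
        run += 1
    k = run - 1 if body[0] == 1 else -run
    rest = body[run + 1:]  # skip the terminating bit, if present

    # Exponent: up to es bits; bits cut off by the word boundary count as zero.
    exp_bits = rest[:es]
    exponent = 0
    for b in exp_bits:
        exponent = (exponent << 1) | b
    exponent <<= es - len(exp_bits)

    # Fraction: whatever remains, with an implicit leading 1.
    frac_bits = rest[es:]
    fraction = 0
    for b in frac_bits:
        fraction = (fraction << 1) | b
    significand = 1.0 + fraction / (1 << len(frac_bits))

    # Each regime step scales by 2**(2**es); the exponent adds a power of two.
    scale = k * (1 << es) + exponent
    value = significand * 2.0 ** scale
    return -value if negative else value
```

With these defaults, `decode_posit(0x40)` gives `1.0` and `decode_posit(0x50)` gives `4.0`. The variable-length regime field is what lets posits trade precision for dynamic range, and it is also why a posit dtype cannot simply reuse the fixed-field layout of IEEE floats.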
## Motivation
The Posit number format (introduced in [Beating Floating Point at its Own Game](http://www.johng…
-
## Context
For better interoperability between ML frameworks, it would be great if the `torch` API more closely matched the `numpy` API (as `tf.experimental.numpy` and `jax.numpy` do).
This is a highly r…
-
**Authors:** @keijibranshi @rosbo @mrisdal @neshdev [at.bflynn](mailto:bflynn@kaggle.com)
# Summary
This RFC proposes extending torchtune to support loading pre-trained and fine-tuned model weig…
-
### Describe the bug
I started a long-running training job. It ran for 201 epochs and then showed me the error below. I also got an email saying that my run had failed. The web U…
-
# The Interoperability Standard of Third-party Backend Integration Mechanism
**Authors:**
* @fffrog
* @hipudding
## **Summary**
As the top AI framework, PyTorch will see more and more backen…
-
In other words ... better packages than this one.