-
As can be seen from https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/121, Meg and HF GPT2 diverge even though they use the same weights under fp16.
So the proposed soluti…
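
For background, here is a minimal sketch of why two implementations can produce different fp16 results from identical inputs. It is not a diagnosis of the linked PR. It only shows that fp16 addition is not associative, so the order of operations matters. The helpers `to_fp16` and `fp16_add` are hypothetical names introduced for this illustration, and the rounding is emulated with Python's `struct` half-precision format rather than a real GPU kernel.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_add(a: float, b: float) -> float:
    """Add two values and round the result to fp16 (emulated, pure fp16 add)."""
    return to_fp16(to_fp16(a) + to_fp16(b))


# Near 2048 the spacing between representable fp16 values is 2,
# so adding 1.0 at a time is lost to rounding.
left_to_right = fp16_add(fp16_add(2048.0, 1.0), 1.0)

# Summing the small terms first keeps their contribution.
small_first = fp16_add(2048.0, fp16_add(1.0, 1.0))

print(left_to_right, small_first)
```

The two sums differ (2048.0 versus 2050.0) although the inputs are the same, which is the kind of effect that different kernel fusions or reduction orders can cause at scale.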
-
### Context: scikit-learn's usage and specificities
While the current `Nodes` and `DataContainers` are sufficient for most libraries like SciPy and NumPy, which can be used entirely with free func…
-
Hi, thanks for sharing your code here.
I tried to run the code below from your [answer](https://github.com/andreasbinder/Point-GNN-PyTorch/issues/1#issuecomment-1482478162) in issue #1:
```[…
-
1. The image above shows two linear layers inside the MLP regression head. Is this predefined, or can we modify the number of linear layers? If the…
-
Hello everyone,
I am trying to train a PET model with the current version of metatrain.
I have set up a new environment and followed the installation instructions.
However, I cannot find an input f…
-
https://arxiv.org/abs/1605.03004
-
As of 15 October (commits 06dc6cdd2fd87df0c4462603daa6bb6d1c43e7b3 and c308c30f1e8d81153547378012e61ab86d7e2ef4), the model architecture for the lightweight models is incompatible with the one present …
-
I just swapped out the Nero optimizer in my Lightning AI loop and gave the new Shampoo a try. Something is going on with it, as this card can typically do 2 it/s on almost anything. Old…
-
### 🐛 Describe the bug
`torch.compile()` breaks when using DeepSpeed ZeRO Stage 3 sharding. I am fine-tuning Llama 2 using the Transformers codebase, and added a `torch.compile()` decorator over the ML…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue y…