-
### System Info
- `transformers` version: 4.43.2
- Platform: Linux-5.4.0-166-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.24.2
- Safetensors version: 0.4…
-
### Feature request
Besides loss, users often need to report additional metrics throughout training to drive decision making and communicate results, which in the case of Seq2Seq models …
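For reference, the usual pattern today is to pass a `compute_metrics` callback to `Seq2SeqTrainer` with `predict_with_generate` enabled. A minimal sketch, assuming a ROUGE metric and a `t5-small` tokenizer purely for illustration (neither is named in the original request):

```python
import evaluate
import numpy as np
from transformers import AutoTokenizer, Seq2SeqTrainer, Seq2SeqTrainingArguments

# Illustrative choices: any seq2seq checkpoint and text metric work here.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
rouge = evaluate.load("rouge")

def compute_metrics(eval_preds):
    # eval_preds unpacks to (predictions, label_ids).
    preds, labels = eval_preds
    # Replace -100 (positions ignored by the loss) with the pad id before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    return rouge.compute(predictions=decoded_preds, references=decoded_labels)

# predict_with_generate makes the trainer call model.generate() during
# evaluation so that text-based metrics can be computed.
args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,
    eval_strategy="epoch",
)
# trainer = Seq2SeqTrainer(model=model, args=args,
#                          compute_metrics=compute_metrics, ...)
```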
-
### 🚀 Feature
Hi, can you please consider adding Apple Silicon support?
mlx-lm makes it very easy to fine-tune LLMs; it would be great to have a UI for this framework in H2O LLM Studio.
`pip install …
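For context, a minimal sketch of the mlx-lm Python API this request builds on; the model name is an assumption on my part, and LoRA fine-tuning itself goes through the separate `mlx_lm.lora` entry point:

```python
# Sketch only: assumes an Apple Silicon Mac with mlx-lm installed.
from mlx_lm import load, generate

# Illustrative model choice; any MLX-converted checkpoint works.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# Quick sanity check that the model loads and generates on-device.
text = generate(model, tokenizer, prompt="Hello, ", max_tokens=50)
print(text)
```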
-
Thanks for the amazing work on accelerating distributed training. When I use `deepspeed train.py` to start a Megatron-LM training task, I get this log:
![image](https://github.com/user-attachments/assets/61e6646…
-
Hi guys,
I am following the Megatron-LM example to pre-train a BERT model, but I'm getting this error:
```
[rank0]: Traceback (most recent call last):
[rank0]:   File "/root/Megatron-LM/pretrai…
```
-
On an HP ProDesk 600 G2 SFF I had OMV 6.9-14 running, and I upgraded to 7. The NIC doesn't get an IP anymore. If I set a static IP (192.168.1.22, 255.255.255.0, 192.168.1.1) I get a bad gateway error i…
-
- [ ] Get a referee:
- [ ] Write to Pinocho
- [ ] Write to Junior
- [x] José Saldana
- [ ] Get Madrid ready
- [ ] Buy trophies
- [ ] Confirm the 16 teams for the Champion…
-
**Describe the regression**
In the forks of Megatron-LM used by gpt-neox and Megatron-DeepSpeed, MoEs obtain lower loss than they do in upstream Megatron-LM with the same configuration.
**To Reprod…
-
Evaluating Llama-3-8B on DROP throws a warning with the standard configuration (3-shot), as reported in [Llama3](https://github.com/meta-llama/llama3/blob/main/eval_details.md#drop), suggesting that t…
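For anyone trying to reproduce this, a minimal sketch of a 3-shot DROP run, assuming the report refers to lm-evaluation-harness and its `drop` task (the framework, model path, and task name are my assumptions, not stated in the original):

```python
# Sketch only: assumes lm-evaluation-harness is installed (pip install lm-eval).
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",                                        # HuggingFace backend
    model_args="pretrained=meta-llama/Meta-Llama-3-8B",
    tasks=["drop"],
    num_fewshot=3,                                     # the standard 3-shot setup
)
print(results["results"]["drop"])
```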
-
Hey, great guide. Similar to others around, but nicer, I'd say.
I'm running into problems at the configure step. Any clue what's causing this? It's more or less a fresh build, but I installed all the req…