-
# URL
- https://arxiv.org/abs/2401.13601
# Affiliations
- Duzhen Zhang, N/A
- Yahan Yu, N/A
- Chenxing Li, N/A
- Jiahua Dong, N/A
- Dan Su, N/A
- Chenhui Chu, N/A
- Dong Yu, N/A
# Abstra…
-
### System Info
CPU: x86_64
GPU: A10
OS: Ubuntu 22.04
### Who can help?
@Tracin @byshiue please help.
### Information
- [X] The official example scripts
- [ ] My own modified script…
-
### System Info
```shell
accelerate 1.1.1
neuronx-cc 2.14.227.0+2d4f85be
neuronx-distributed 0.8.0
neuronx-distributed-training 1.0.0
optimum …
-
Hi,
I would like to test a program for distributed LLM training on mi2508x, and I want to use model parallelism to distribute parameters across GPUs. Is there any framework that I should use to ac…
-
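Hello! I am a student at Harbin Institute of Technology, and I have recently been trying to reproduce your work on CoGenesis. I have run into some tricky problems and would appreciate your help.
The problem is as follows: under the "sketch-based method", loading the fine-tuned small model raises the following error: **ValueError: Trying to set a tensor of shape torch.Size([311164928]) in "weight" (which has shape t…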
-
### 🐛 Describe the bug
I attempted to train LLaVA (base LLM = LLaMA 3) using the Liger kernel. The loss graph was similar to when I trained LLaVA without the Liger kernel. However, the model traine…
-
Awesome project. Thank you for expanding the Go ecosystem in this area; as per our Reddit discussion, I'm putting this down here.
It would be amazing to have real-world/practical training sets/…
-
I've been trying to use DSPy in different contexts where I see fit, but I've been unsuccessful in obtaining any good results. I have a very long prompt for a classification task that needs to describe…
-
I tried to reproduce NVIDIA's results using slurm + enroot + pyxis:
1. downgrade the transformers and huggingface_hub libs (huggingface_hub==0.23.2 transformers==4.40.2) because the versions…
-
### Description
Hi, I was recently guilty of filling up the disk space on our cluster because my wandb artifact cache had grown to 1.6TB over the last month alone while I was experimenting with my LL…