-
Hello, I'm currently trying to reproduce NVIDIA's Llama2 70B results on DGX H100. I applied the fixes from https://github.com/mlcommons/training_results_v4.0/issues/5, but I'm hitting the following CUDA issue:
```
Failed:…
```
-
Tracker issue for adding [LayerSkip](https://arxiv.org/abs/2404.16710) to AO.
This is a training and inference optimization that is similar to layer-wise pruning. It's particularly interesting for…
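For readers skimming this tracker, here is a minimal sketch of the training-time half of LayerSkip (layer dropout / stochastic depth, with skip rates that grow with layer index). The module, layer count, and probabilities below are illustrative assumptions, not AO's planned API:
```python
# Minimal LayerSkip-style layer dropout sketch (training-time only).
# All names and hyperparameters here are illustrative, not torchao's API.
import torch
import torch.nn as nn


class LayerSkipStack(nn.Module):
    def __init__(self, d_model: int = 256, n_layers: int = 8, p_max: float = 0.2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        # Skip rates grow linearly with depth: early layers are almost always
        # executed, later layers are dropped more often during training.
        self.skip_probs = [p_max * i / max(n_layers - 1, 1) for i in range(n_layers)]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer, p in zip(self.layers, self.skip_probs):
            if self.training and torch.rand(()).item() < p:
                continue  # stochastically skip this layer for the whole batch
            x = layer(x)
        return x


if __name__ == "__main__":
    model = LayerSkipStack().train()
    out = model(torch.randn(2, 16, 256))  # (batch, seq_len, d_model)
    print(out.shape)
```
The paper pairs this with an early-exit loss during training and self-speculative decoding at inference, which is what distinguishes it from plain stochastic depth.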
-
Hello! I'm a student at Harbin Institute of Technology, and I've recently been trying to reproduce your work on CoGenesis. I've run into some tricky problems and would appreciate your help.
The problem is as follows: under the "sketch-based method", the following error is raised when loading the fine-tuned small model: **ValueError: Trying to set a tensor of shape torch.Size([311164928]) in "weight" (which has shape t…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
(MindSpore) [root@fd428729b7cb46b089e3705e66eecb16-task0-0 LLaMA-Factory]# llamafactory-cli train example…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### 🐛 Describe the bug
Hi there,
> vllm version: 0.4.1
I fine-tuned the mistral-7b-v0.2 model using the tr…
-
The GUI is not working.
(gpt1) ashu@MSI:/mnt/c/Users/genco/Documents/gpt$ make run
poetry run python -m private_gpt
11:47:15.734 [INFO ] private_gpt.settings.settings_loader - Starting appli…
-
### What happened?
I am running on ROCm with 4 x Instinct MI100.
Only when using `--split-mode row` do I get an "Address boundary error".
llama.cpp was working when I had an XGMI GPU Bridge working w…
-
### What is the issue?
No issues with any model that fits into a single 3090, but it seems to run out of memory when trying to distribute to the second 3090.
```
INFO [wmain] starting c++ runner | ti…
```
-
Hi, I found several errors in the pre-training script (run.sh) and the corresponding code. I have mentioned one of them in the pull request. Furthermore, it seems that we should use $PATH_T…
-
This issue concerns the dataset description at https://github.com/awslabs/open-data-registry/blob/main/datasets/software-heritage.yaml
Due to the recent surge in demand for data for LLM training, we…