-
```
python -u train_math.py --seed 10 \
    --dataset_name "prealgebra" \
    --dataset_path "../envs/math/data/math_500.jsonl" \
    --model…
```
-
# Post-Training Methods for Large Language Models | Rs' Log
This is a test example.
[http://localhost:1313/my-blog/posts/llm-post-training/](http://localhost:1313/my-blog/posts/llm-post-training/)
-
# 🧐 Problem Description
Fast-LLM lacks support for Llama 3.x models due to missing compatibility with Llama-3-style RoPE scaling. This prevents us from effectively training or using Llama 3.x check…
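For context, Llama-3-style RoPE scaling rescales the rotary inverse frequencies band by band rather than uniformly. Below is a minimal sketch of that transform, following the scheme Llama 3.1 uses (the parameter names `factor`, `low_freq_factor`, `high_freq_factor`, and `original_max_position_embeddings` come from its config); this illustrates the math, it is not Fast-LLM's implementation.

```python
import math
import torch

def llama3_scale_inv_freq(
    inv_freq: torch.Tensor,
    factor: float = 8.0,                          # Llama 3.1 defaults
    low_freq_factor: float = 1.0,
    high_freq_factor: float = 4.0,
    original_max_position_embeddings: int = 8192,
) -> torch.Tensor:
    """Llama-3-style RoPE scaling: keep high-frequency components,
    divide low-frequency ones by `factor`, interpolate in between."""
    low_freq_wavelen = original_max_position_embeddings / low_freq_factor
    high_freq_wavelen = original_max_position_embeddings / high_freq_factor

    wavelen = 2 * math.pi / inv_freq
    # Long wavelengths (low frequencies): scale down by `factor`.
    scaled = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)
    # Medium band: linearly interpolate between scaled and unscaled.
    smooth = (original_max_position_embeddings / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    smoothed = (1 - smooth) * inv_freq / factor + smooth * inv_freq
    medium = (wavelen <= low_freq_wavelen) & (wavelen >= high_freq_wavelen)
    return torch.where(medium, smoothed, scaled)
```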
-
As part of the Kubeflow Training V2 work, we should design and implement a custom Trainer to fine-tune the LLMs we plan to support via TrainingRuntimes in Kubeflow upstream (a rough sketch follows below).
We should discuss wh…
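To ground the discussion, here is a rough sketch of the kind of fine-tuning entrypoint a TrainingRuntime could launch (e.g. under `torchrun`). It uses the Hugging Face `Trainer`; the model name, dataset, and output path are placeholders, not a proposal for the actual upstream design.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

def main():
    model_name = "gpt2"  # placeholder; the runtime would inject the real model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Placeholder dataset; a real runtime would mount or stream the data.
    dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True,
        remove_columns=dataset.column_names,
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="/tmp/output", num_train_epochs=1),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()

if __name__ == "__main__":
    main()
```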
-
# URL
- https://arxiv.org/abs/2401.02038
# Authors
- Yiheng Liu
- Hao He
- Tianle Han
- Xu Zhang
- Mengyuan Liu
- Jiaming Tian
- Yutong Zhang
- Jiaqi Wang
- Xiaohui Gao
- Tianyang …
-
### System Info
- `transformers` version: 4.46.2
- Platform: Linux-5.4.0-125-generic-x86_64-with-glibc2.31
- Python version: 3.10.15
- Huggingface_hub version: 0.26.2
- Safetensors version: 0.4…
-
### Description
We aim to evaluate the effectiveness of our transfer text function and the LLM-generated corrected transcript in improving the quality of training data. This analysis will focus on th…
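One concrete way to run that comparison is to score raw and corrected transcripts against a reference with word error rate. A minimal sketch (the `jiwer` package and all strings here are illustrative assumptions, not the project's actual pipeline):

```python
import jiwer  # assumption: any WER implementation would do

reference = "the quick brown fox jumps over the lazy dog"            # ground truth
raw_transcript = "the quick brown fox jump over the lazy dog"        # before correction
corrected_transcript = "the quick brown fox jumps over the lazy dog" # LLM-corrected

# Lower WER after correction suggests the corrected text is better training data.
print("WER raw:      ", jiwer.wer(reference, raw_transcript))
print("WER corrected:", jiwer.wer(reference, corrected_transcript))
```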
-
The repo's code trains on a single GPU. With the batch size set to 1 the model loads, but training hits OOM; and if I try multi-GPU with `torchrun --nproc_per_node=4 train_math.py` directly, it OOMs already while loading the model:
`torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB. GPU …`
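Two things are worth noting. First, under `torchrun` every rank loads its own full copy of the model, so four processes on one node can run out of memory before training even starts; sharded loading/training (e.g. FSDP or DeepSpeed ZeRO) avoids that. Second, a few standard memory levers for the single-GPU case, sketched against the Hugging Face `transformers` API (the model path is a placeholder, and whether these fit this repo's training loop is an assumption):

```python
import torch
from transformers import AutoModelForCausalLM

# bf16 weights take roughly half the memory of fp32; device_map="auto"
# (requires the `accelerate` package) spreads weights across visible GPUs
# instead of placing a full copy on each one.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/model",            # placeholder
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Trade extra compute in the backward pass for a much smaller
# activation footprint.
model.gradient_checkpointing_enable()
```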
-
### Problem Statement
Current LLM development is moving toward structured output, which has been shown to improve model performance on various tasks. Also, when training with structured output, we can explore …
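As a concrete illustration of what structured output means here (a sketch with hypothetical field names; the statement above does not specify a schema), the model is trained to emit JSON that validates against a declared type rather than free-form text:

```python
from pydantic import BaseModel

class MathAnswer(BaseModel):
    reasoning: str  # worked steps as text
    answer: str     # final answer, e.g. "4"

raw = '{"reasoning": "2 + 2 = 4", "answer": "4"}'  # hypothetical model output
parsed = MathAnswer.model_validate_json(raw)       # raises if malformed
print(parsed.answer)
```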
-
TRT-LLM version: v0.11.0
I'm deploying a BART model with Medusa heads, and I noticed this issue: https://github.com/NVIDIA/TensorRT-LLM/issues/1946. I then adapted my model with the following steps:
```
1…