-
## 📚 Documentation
Create an example showing how to train a small LLM.
Add it to the examples directory here:
https://github.com/pytorch/xla/tree/master/examples
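A rough sketch of what such an example might contain, assuming `torch_xla` is installed; the tiny model, vocabulary size, and random data below are placeholders, not anything from the repo:

```python
# Minimal sketch: train a tiny causal LM on an XLA device.
# All hyperparameters and the random data are illustrative placeholders.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

VOCAB, SEQ, DIM = 1000, 128, 256

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, x):
        # causal mask so each position only attends to earlier tokens
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        return self.head(self.blocks(self.embed(x), mask=mask))

device = xm.xla_device()                      # XLA device (e.g. TPU core)
model = TinyLM().to(device)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(10):
    batch = torch.randint(0, VOCAB, (8, SEQ), device=device)  # random stand-in data
    logits = model(batch[:, :-1])                             # predict the next token
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), batch[:, 1:].reshape(-1)
    )
    opt.zero_grad()
    loss.backward()
    xm.optimizer_step(opt, barrier=True)      # step + materialize the lazy XLA graph
```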
-
### Feature request
Hi,
I may have a misunderstanding regarding training LLMs. When we train the model, we calculate the loss by having the model predict the next word and then computing the difference …
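For reference, the standard next-token objective shifts the targets by one position and applies cross-entropy; a minimal sketch with made-up tensor shapes:

```python
# Next-token prediction loss: the prediction at position t is compared
# against the actual token at position t+1.
import torch
import torch.nn.functional as F

batch, seq, vocab = 2, 16, 100
logits = torch.randn(batch, seq, vocab)         # model outputs, one per position
tokens = torch.randint(0, vocab, (batch, seq))  # the input token ids

shift_logits = logits[:, :-1, :]   # predictions for positions 0..seq-2
shift_labels = tokens[:, 1:]       # the "next words" they should predict

loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab),  # (batch*(seq-1), vocab)
    shift_labels.reshape(-1),         # (batch*(seq-1),)
)
print(loss)  # scalar cross-entropy over all next-token predictions
```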
-
When running `pretrain.py` with 1 or 4 GPUs and the DDPStrategy as described in the docs, I get the following error:
```bash
"PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/.../torch/distribu…
-
When I run build_win.bat, I finally get a DeepSpeed .whl file, so the build problem seems to be resolved. However, when I ran the program, the following issue occurred:
```
File "D:\anaconda3\envs\llm\li…
```
-
LLM Summarizer takes data in a standard format and summarizes it in English.
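As a rough illustration of this kind of tool, using the Hugging Face `transformers` summarization pipeline as a stand-in (not this project's actual implementation):

```python
# Illustrative stand-in: summarize English text with a generic summarization model.
from transformers import pipeline

summarizer = pipeline("summarization")   # downloads a default summarization model
text = "Long standard-format input text goes here ..."
print(summarizer(text, max_length=60, min_length=10)[0]["summary_text"])
```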
-
Dear authors, thanks for your insightful work! I was able to reproduce your model, and my trained video-LLM produces meaningful outputs. However, since this is my first time training a video-LLM, I…
-
Hi, for the text-to-image generation, why not try using existing LLMs, plus the decoder in the tokenizer, and training the whole model with LoRA?
Just like SEED does.
So that it at least won't harm the …
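A minimal sketch of the LoRA wrapping being suggested here, using the `peft` library; the base model id and `target_modules` are illustrative assumptions:

```python
# Hypothetical LoRA wrapping of an existing causal LM with peft;
# the model id and target_modules below are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base LLM
config = LoraConfig(
    r=8,                        # low-rank adapter dimension
    lora_alpha=16,
    target_modules=["c_attn"],  # gpt2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```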
-
Is the following error caused by the input text exceeding max_length? Does the current repo automatically truncate the input?
```
[rank0]: File "/opt/tiger/Swift-Training/swift/cli/sft.py", line 5, in
[rank0]: sft_main()
[rank0]: File "/opt/tiger/Swift-Training/swif…
-
**Context**
According to this [paper](http://proceedings.mlr.press/v139/zhao21c/zhao21c.pdf), ChatGPT (and likely other LLMs) suffers from a recency bias. Whatever class comes last has a higher probabi…
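The fix that paper proposes is contextual calibration: estimate the model's label probabilities on a content-free input (e.g. "N/A") and divide them out. A tiny sketch with made-up probabilities:

```python
# Contextual calibration (Zhao et al., 2021): correct label bias by
# normalizing against probabilities obtained from a content-free prompt.
import numpy as np

p_real = np.array([0.3, 0.7])  # label probs on a real input (made up)
p_cf   = np.array([0.1, 0.9])  # label probs on a content-free input like "N/A"

calibrated = p_real / p_cf     # divide out the prompt-induced bias
calibrated /= calibrated.sum() # renormalize to a distribution
print(calibrated)              # ~[0.79, 0.21]: class 0 wins once bias is removed
```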
-
Training time ≈ (8 × training tokens × model parameters) / (GPU count × GPU peak FLOPS × GPU utilization)
Unfortunately, the actual time spent does not match this. Does anyone have the co…
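As a sanity check of the formula, a worked example with illustrative numbers (7B parameters, 1T tokens, 64 A100s at 312 TFLOPS, 40% utilization):

```python
# Worked example of the training-time estimate; all numbers are illustrative.
tokens = 1e12          # 1T training tokens
params = 7e9           # 7B model parameters
n_gpus = 64
peak_flops = 312e12    # A100 bf16 peak, FLOP/s
utilization = 0.40     # achieved fraction of peak (MFU)

total_flops = 8 * tokens * params              # ~5.6e22 FLOPs, per the formula above
effective = n_gpus * peak_flops * utilization  # ~8.0e15 FLOP/s across the cluster
seconds = total_flops / effective
print(seconds / 86400)                         # ~81 days
```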