-
### Your current environment
The output of `python collect_env.py`
```text
C:\Users\bobni\OneDrive\Desktop\Projects\p2pIssue>bash
training@Training:/mnt/c/Users/bobni/OneDrive/Desktop/Projects…
-
### Model introduction
WaveCoder 🌊 is a series of large language models (LLMs) for the coding domain, designed to solve code-related problems through instruction-following learning. …
-
Hi everyone interested in Grok-1:
We are the ModelScope team. We trained the HF version of Grok-1 (https://www.modelscope.cn/models/colossalai/grok-1-pytorch/summary) with our training framework SWIFT (http…
-
Now that the new version of xtuner has added dispatch, is fine-tuning of chatglm3-6b no longer supported?
File "/mnt/afs/xtuner/xtuner/model/sft.py", line 93, in __init__
dispatch_modules(self.llm, use_varlen_attn=use_varlen_attn)
File "/mnt/afs/xtuner/…
-
### 🐛 Describe the bug
I use pytorch==2.3.0 and peft to train llama3 8b. When I run my code, it raises an error like:
```text
torch._amp_foreach_non_finite_check_and_unscale_(
RuntimeError:…
-
# URL
- https://arxiv.org/abs/2306.08302
# Affiliations
- Shirui Pan, N/A
- Linhao Luo, N/A
- Yufei Wang, N/A
- Chen Chen, N/A
- Jiapu Wang, N/A
- Xindong Wu, N/A
# Abstract
- Large lang…
-
# 🚀 Feature
We need a kind of AttentionBias like BlockDiagonalCausalMask, but with optional padding.
## Motivation
When training LLMs, the training data may be packed. It may look like
…
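To make the request concrete, here is a minimal sketch (plain Python, function name hypothetical) of the attention pattern such a bias would describe: each packed sequence gets its own causal block, and the optional padding slots after each sequence are masked out for every query.

```python
def packed_causal_mask(seq_lens, pad_lens):
    """Build a boolean attention mask for packed, padded sequences.

    seq_lens[i] tokens of sequence i are followed by pad_lens[i] padding slots.
    mask[q][k] is True when query q may attend to key k: only earlier (or same)
    positions within its own sequence, never padding or other sequences.
    """
    total = sum(l + p for l, p in zip(seq_lens, pad_lens))
    mask = [[False] * total for _ in range(total)]
    start = 0
    for length, pad in zip(seq_lens, pad_lens):
        for q in range(start, start + length):  # queries of this sequence
            for k in range(start, q + 1):       # causal: keys up to the query
                mask[q][k] = True
        start += length + pad                   # skip this sequence's padding
    return mask

# Two packed sequences of lengths 2 and 3, each followed by 1 padding slot.
m = packed_causal_mask([2, 3], [1, 1])
```

This is only an illustration of the desired semantics, not a proposal for the xformers implementation; the real bias would of course be materialized lazily on the GPU rather than as a Python list.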
-
# URL
- https://arxiv.org/abs/2305.09731
# Affiliations
- Jane Pan, N/A
- Tianyu Gao, N/A
- Howard Chen, N/A
- Danqi Chen, N/A
# Abstract
- Large language models (LLMs) exploit in-context le…
-
Hi, thanks for your great work! I want to reproduce the training process, but some errors occurred, as follows. Could you please take a look? Thanks!
Training scripts (I just have 4xA100, so the…
-
Hi @dirkgr! Here is a feature that would be very desirable for decontamination, but I'm not sure how difficult it would be to implement in BFF:
The essential part of the feature would be to …