OpenBMB/BMTrain
Efficient Training (including pre-training and fine-tuning) for Big Models
Apache License 2.0
548 stars, 74 forks
issues
[BUG] KeyError: 'label' occurs when loading dataset.
#205
CearX
opened
1 week ago
0
[BUG] <HTTPError: 403 Client Error>
#204
CearX
closed
1 week ago
3
BurstAttention and Ulysses all2all support for long sequence training.
#203
MayDomine
opened
3 months ago
0
Add gc and fix typo
#202
BeingGod
opened
3 months ago
1
reduce pipe parallel memory allocation peak
#201
BeingGod
closed
4 months ago
0
dummy test for workflow across forks
#200
MayDomine
closed
4 months ago
0
update workflow
#199
MayDomine
closed
4 months ago
0
dummy test
#198
MayDomine
closed
4 months ago
0
[BUILD ERROR] inlining failed in call to always_inline '__mm512_xxx_xxx' xxxxxxxxx
#197
lk137095576
closed
3 months ago
1
Update README
#196
MayDomine
closed
3 months ago
0
add grad scale for optim_manager && fix workflow action
#195
MayDomine
closed
4 months ago
0
fix scale loss logic
#194
MayDomine
closed
5 months ago
0
[Feature] performance problem
#193
Xiang-cd
opened
5 months ago
1
Update doc and notes for BMTrain.
#192
CarryFun
closed
3 months ago
0
[WIP] Add notes for some module.
#191
CarryFun
closed
5 months ago
0
BMTrain now supports 1F1B Pipeline schedule!
#190
MayDomine
opened
6 months ago
0
remove redundant for loop
#189
Nov11
opened
6 months ago
0
Update workflow config
#188
MayDomine
closed
6 months ago
0
Update Release document
#187
MayDomine
closed
6 months ago
0
Vocab parallel Embedding impl and make example work when tp_size > 1
#186
MayDomine
closed
6 months ago
0
build workflow for PRs
#185
MayDomine
closed
6 months ago
0
Optimizer load gathered state and record delta feature are supported now
#184
MayDomine
closed
6 months ago
0
FIX: allgather_object stuck
#183
MayDomine
closed
6 months ago
0
BMTrain New Version Release v1.0.0
#182
MayDomine
closed
6 months ago
0
Fix loss scale for TP in optim_manager.py
#181
Achazwl
closed
8 months ago
0
tp cross entropy
#180
Achazwl
closed
9 months ago
0
Fix parallel_for when grain_size > 0
#179
Achazwl
closed
9 months ago
0
fix tp cross entropy
#178
Achazwl
closed
9 months ago
0
Fix tp cross entropy
#177
Achazwl
closed
10 months ago
0
Feat optim manager state
#176
Achazwl
closed
8 months ago
0
fix adam bf16 load changed to fp16
#175
Achazwl
closed
10 months ago
0
[BUG] Tensor Parallel async_chunk=4 mismatch async_chunk=1 result when sequence length longer than 16K
#174
Achazwl
opened
10 months ago
0
Fix tp
#173
zkh2016
closed
10 months ago
0
only initialized tp_comm when tp_size > 1
#172
zkh2016
closed
10 months ago
0
Async save state_dict to file
#171
zkh2016
closed
11 months ago
0
add _save_to_infer_model
#170
zkh2016
closed
11 months ago
0
[Feature] does bmtrain support torch 2.0+
#169
junphine
closed
11 months ago
1
fix async row linear not support split_input
#168
Achazwl
closed
12 months ago
0
Fix async send in pipe mode
#167
zkh2016
closed
1 year ago
0
Fix block wrapper
#166
zkh2016
closed
1 year ago
0
Faster parallel linear
#165
zkh2016
closed
1 year ago
0
[BUG] Signal killed caused by Adam Offload
#164
MayDomine
opened
1 year ago
0
New doc
#163
JerryYin777
closed
5 months ago
0
Fix cross
#162
zkh2016
closed
1 year ago
0
test_synchronize.py
#161
JerryYin777
closed
1 year ago
0
Add zero_context.py
#160
zkh2016
closed
1 year ago
0
Refactor communicate groups and Block
#159
zkh2016
closed
1 year ago
0
Pull request template
#158
MayDomine
closed
1 year ago
0
Rename class and files
#157
zkh2016
closed
1 year ago
0
Offload activation async support
#156
MayDomine
opened
1 year ago
0