OpenBMB/BMTrain
Efficient Training (including pre-training and fine-tuning) for Big Models
Apache License 2.0
560 stars
77 forks
issues
cuda118 python2.0.0: installation error under C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.35.32215\\bin\\HostX86\\x64\\cl.exe
#106
acbogeh
closed
1 year ago
5
Please add support for CUDA12 as soon as possible
#105
acbogeh
closed
1 year ago
1
Installation fails on V100
#104
uloveqian2021
closed
1 year ago
0
bmtrain fails to compile, reportedly because of an nccl problem; I am on Windows
#103
zhangweia
closed
1 year ago
5
import error
#102
janglichao
closed
1 year ago
7
Would you publish the performance data in detail about how to save 90%?
#101
Desein-Yang
opened
1 year ago
0
[Install Error] CUDA 12.1 mismatch Pytorch
#100
SwartzMss
closed
1 year ago
3
Can operators added through a cuda extension not be used with bmtrain?
#99
westnight
closed
1 year ago
1
Does BMTrain 0.2.0 support cuda 11.1?
#98
FutureForMe
closed
1 year ago
1
Can BMTrain work with Megatron-LM?
#97
marscrazy
closed
1 year ago
1
[Error] in Adam implementation
#96
marscrazy
closed
1 year ago
2
[replace usage of tensor.storage()]
#95
Oran-Ac
opened
1 year ago
0
[Question] bf16 & pipeline parallel
#94
ftgreat
closed
1 year ago
8
[Question] Loading optimizer state
#93
ftgreat
closed
1 year ago
8
support multiple input-output in transformerblocklist
#92
Achazwl
closed
1 year ago
0
[FeatureRequest] `bmt.OpTransformerBlockList` does **NOT** support multiple return values from a transformer block's forward propagation
#91
eggiter
closed
1 year ago
1
fix inspector grad when tensor is not recorded in some layer
#90
Achazwl
closed
1 year ago
0
How do I load already-trained delta-tuning (incremental fine-tuning) weights when training with bmtrain?
#89
liweiqing1997
closed
1 year ago
1
BMTrain install failed; my environment is gcc5.4.0, torch1.7.0, cudnn10.2. I have tried other torch versions, for example 1.12.0, and it failed again.
#88
rongzhenlee
closed
1 year ago
1
BUG:TypeError: linear(): argument 'input' (position 1) must be Tensor, not NoneType
#87
pilipala818
closed
1 year ago
1
Model extensibility
#86
pilipala818
closed
1 year ago
2
can't run the example of BMTrain's implementation of GPT-2.
#85
tqjack
closed
1 year ago
1
stuck during synchronize
#84
Smu-Tan
closed
1 year ago
4
Install error
#83
zhhongzhi
closed
1 year ago
4
What can I do to handle the overflow?
#82
lhj-git
closed
1 year ago
1
What needs to be changed to run on a single machine with a single GPU?
#81
qinqinqaq
closed
1 year ago
21
Failed to install BMTrain: ~/has_inf_nan.cu(11): error: identifier "__heq" is undefined
#80
lindylin1817
closed
1 year ago
3
bf16 optimizer
#79
ftgreat
closed
1 year ago
1
fix: make load stream wait default stream after init_parameters
#78
Achazwl
closed
1 year ago
0
temporary fix of bmtrain+opendelta load state dict
#77
Achazwl
closed
1 year ago
2
Disable ZeRO Optimization
#76
yiye3
closed
1 year ago
1
Installation succeeds, but import fails
#75
Mryangkaitong
closed
1 year ago
4
Installation failed
#74
Mryangkaitong
closed
1 year ago
6
BMTrain installation failed? FAILED: /tmp/pip-install-c6m09ftc/bmtrain_85cbac71a04c4746abbb40699f06db93/build/temp.linux-x86_64-cpython-37/csrc/cuda/adam.o
#73
myf-algorithm
closed
1 year ago
4
training script is stuck in bmt.init_distributed(seed=0)
#72
greenteaofwhu
closed
1 year ago
4
install error on windows10
#71
havocio
closed
1 year ago
2
fix run bmtrain with one gpu without torchrun
#70
Achazwl
closed
1 year ago
0
Undo a deletion of detach in previous version
#69
Achazwl
closed
1 year ago
0
avoid empty state when justify scale
#68
Achazwl
closed
1 year ago
0
How can I apply checkpoint block on cpm-1?
#67
lhj-git
closed
1 year ago
3
Segmentation fault when the code ends
#66
lhj-git
closed
1 year ago
3
TypeError: object of type 'TransformerBlockList' has no len()
#65
etale-cohomology
closed
1 year ago
2
fix output shape mismatch after CheckpointBlock
#64
Achazwl
closed
1 year ago
0
About CPU Offloading
#63
lhj-git
closed
1 year ago
4
If one machine does not have enough GPU memory to load a large model, will the model be loaded onto other machines?
#62
bucm-tcm-tool
closed
1 year ago
3
add test for grad accumulation and state_dict interface.
#61
MayDomine
closed
1 year ago
0
fix inspect grad mean/std from None to 0
#60
Achazwl
closed
1 year ago
0
Create UPDATE_0.2.0.md
#59
a710128
closed
1 year ago
0
ERROR: Command errored out with exit status 1:
#58
wccccp
closed
1 year ago
1
Some save and load problems when incorporated with BMCook
#57
isuco
opened
1 year ago
0