bytedance/byteps
A high performance and generic framework for distributed DNN training
Other
3.63k
stars
488
forks
issues
install failed
#447
themoonstone
opened
7 months ago
0
Supported CUDA and PyTorch versions
#446
themoonstone
opened
7 months ago
0
support pytorch 2.1.x
#445
rainj-me
closed
6 months ago
0
Is there any benchmark comparison with Megatron-LM ?
#444
sequoiar
opened
1 year ago
0
segmentation fault while launching the worker
#443
xuexiaxie
opened
1 year ago
1
How is the TensorFlow scheduler plugin used in tf_benchmark_cnn.py?
#442
sxqqslf
opened
1 year ago
1
Mistakes in workload calculation
#441
fly-dragon211
opened
1 year ago
5
Installation problem
#440
QingQingR
opened
2 years ago
0
Supported environment
#439
QingQingR
closed
2 years ago
0
broadcast and is_initialized api are not supported with pytorch.
#438
HangJie720
opened
2 years ago
0
support for fault tolerance and straggler mitigation
#437
youshaox
opened
2 years ago
0
Communication failure in MXNet with BytePS
#436
qingyangDuan
closed
2 years ago
3
3rdparty: update pslite to fix shm name
#435
ymjiang
closed
2 years ago
0
update shm naming scheme
#434
pleasantrabbit
opened
2 years ago
0
Installation error
#433
llplay
opened
2 years ago
1
torch: update ddp
#432
pleasantrabbit
opened
2 years ago
0
Release BytePS docker image support for TF2
#431
shaowei-su
opened
2 years ago
0
Running multiple workers on a single GPU machine
#430
hamidralmasi
opened
2 years ago
0
launcher: join workers as they exit
#429
pleasantrabbit
closed
2 years ago
0
Successfully installed BytePS but cannot import byteps.torch or byteps.tensorflow
#428
hamidralmasi
closed
2 years ago
2
benchmark with cross barrier error
#427
panpanli521
opened
2 years ago
0
Are there plans to support CPU-only training? Our workers also run on CPU machines
#426
starkeisntein
opened
2 years ago
2
When will sparse models be supported?
#425
starkeisntein
opened
2 years ago
0
ps-lite: disable ucx error handling by default
#424
pleasantrabbit
closed
2 years ago
0
ps-lite: update ps-lite
#423
pleasantrabbit
closed
2 years ago
0
Is it right to do allreduce immediately for non-zero ranks in bytescheduler?
#422
sywang0111
closed
2 years ago
2
server: exit log improvement
#421
ymjiang
closed
2 years ago
0
torch: fix compression when using apex.amp
#420
pleasantrabbit
closed
2 years ago
0
Stuck in the bps.init().
#419
Fangjin98
closed
2 years ago
7
The byteps in K8S Pod doesn't have DMLC_WORKER_ID configured.
#418
jackjinj
opened
2 years ago
0
How to use gradient accumulate in BytePS torch DDP?
#417
wuyujiji
opened
3 years ago
5
tensorflow: fix bug in broadcast_variables
#416
pleasantrabbit
closed
3 years ago
0
build: update ucx tarball download logic
#415
pleasantrabbit
closed
3 years ago
0
common: add better support for huge tensors
#414
ymjiang
closed
3 years ago
0
packaging: download tarballs when running sdist
#413
pleasantrabbit
closed
3 years ago
0
server: improve thread safety
#412
ymjiang
closed
3 years ago
0
NaN occurs in the first ten batches of training.
#411
powermano
opened
3 years ago
2
pr 363
#410
pleasantrabbit
closed
3 years ago
0
Update ps lite
#409
pleasantrabbit
closed
3 years ago
2
Does BytePS support multiple NICs now?
#408
wuyujiji
opened
3 years ago
13
update doc for core affinity envs
#407
pleasantrabbit
closed
3 years ago
0
update core binding policy
#406
pleasantrabbit
closed
3 years ago
0
docker file for bytescheduler does not work
#405
zarzen
closed
3 years ago
7
Does TensorFlow 1.x support async training?
#404
jiahuiyang
opened
3 years ago
2
subprocess.CalledProcessError returned non-zero exit status 132
#403
powermano
closed
3 years ago
2
TensorFlow 2.4+ compatibility
#402
oliverhu
closed
1 year ago
0
TensorFlow 2.5 compatibility
#401
oliverhu
opened
3 years ago
1
Shared-memory optimization of RDMA on a single machine
#400
wuyujiji
opened
3 years ago
3
fix bool env, disable avx512
#399
pleasantrabbit
closed
3 years ago
0
Error munmap_chunk(): invalid pointer in BytePS when DMLC_NUM_WORKER is changed from 1 to 2
#398
udaykiran009
opened
3 years ago
1