IIC-SIG-MLsys / HDDT
Distributed DNN Training on Heterogeneous GPUs
0 stars, 6 forks
issues
add HddtMemory
#40
072020127
closed
3 days ago
0
coll algo
#39
derekwin
opened
6 days ago
0
mpi app bug
#38
derekwin
opened
6 days ago
1
add hddt memory
#37
072020127
closed
4 days ago
1
[+]collective support
#36
derekwin
opened
1 week ago
0
Message Synchronization Protocol
#35
derekwin
opened
1 week ago
0
improve communicator & coll
#34
derekwin
closed
1 week ago
0
Add Collective abstract
#33
derekwin
opened
1 week ago
2
add python binding example
#32
derekwin
closed
1 week ago
2
add pybind
#31
derekwin
opened
1 week ago
2
Add support for Huawei
#30
derekwin
opened
1 week ago
0
Add support for Cambricon MLU
#29
derekwin
closed
1 week ago
1
[~]fetch mnist by bash script
#28
derekwin
closed
1 week ago
0
Add support to MLU
#27
derekwin
closed
1 week ago
2
issue 15
#26
ForestWisdom
closed
1 week ago
0
issue 15
#25
ForestWisdom
closed
2 weeks ago
0
Add Communicator Abstract to HDDT
#24
derekwin
closed
1 week ago
1
"[+]log; mem(host,cuda);[~]driver"
#23
derekwin
closed
2 weeks ago
0
Revert "add memory abstract, add log, update driver init logic."
#22
derekwin
closed
2 weeks ago
0
add memory abstract, add log, update driver init logic.
#21
derekwin
closed
2 weeks ago
1
Create fetch_mnist.cmake
#20
Firefly-Dance
closed
2 weeks ago
0
Added the MNIST dataset requirement, not tested yet
#19
Firefly-Dance
closed
2 weeks ago
0
[~]update train_app
#18
derekwin
closed
3 weeks ago
0
[+]gpu driver init; log print;[~]change lib link dependence
#17
derekwin
closed
3 weeks ago
1
Added MNIST training code and training data
#16
Firefly-Dance
closed
3 weeks ago
0
Error while building torch_app on Hygon GPU.
#15
derekwin
closed
1 week ago
2
[~]fix hip
#14
derekwin
closed
3 weeks ago
1
[+]add howToDev.md
#13
derekwin
closed
3 weeks ago
0
[+]torchapi integration; torch_app
#12
derekwin
closed
3 weeks ago
0
Add communicator abstract
#11
derekwin
closed
1 week ago
3
Heterogeneous memory manager
#10
derekwin
opened
1 month ago
1
Improve the simple_inference application
#9
derekwin
opened
1 month ago
2
add a document for the development
#8
derekwin
closed
3 weeks ago
1
Async event executor support
#7
derekwin
opened
1 month ago
0
support rdma-based p2p transport
#6
derekwin
closed
1 week ago
3
fix hip build bug
#5
derekwin
closed
3 weeks ago
2
[+]new app simple_inference
#4
derekwin
closed
1 month ago
0
openmpi-based distributed application
#3
derekwin
closed
1 month ago
2
torch api integration
#2
derekwin
closed
1 week ago
4
init
#1
derekwin
closed
1 month ago
0