chainer/chainermn
ChainerMN: Scalable distributed deep learning with Chainer
https://chainer.org
MIT License
207
stars
57
forks
issues
Added an FAQ entry about MPI hang issue.
#249
keisukefukuda
closed
6 years ago
3
Fix MultiNodeIterator for paired datasets
#248
levelfour
closed
6 years ago
3
Dummy PR
#247
shu65
closed
6 years ago
0
Fix tests of MultiNodeIterator
#246
shu65
closed
6 years ago
0
[WIP] Revert multi-node iterator
#245
kuenishi
closed
6 years ago
1
[wip] Eliminate mpi4py's ssend to fix tests
#244
kuenishi
closed
6 years ago
0
Fix p2p-communication test
#243
kuenishi
closed
6 years ago
0
Re: Add collective communications
#242
keisukefukuda
closed
6 years ago
0
Asynchronous Allreduce
#241
fengyuan14
closed
6 years ago
2
Remove unused nccl comm and mpi comm
#240
shu65
closed
6 years ago
1
Fix PR235
#239
shu65
closed
6 years ago
1
Update supported Chainer versions
#238
kuenishi
closed
6 years ago
0
Add allreduce method to communicator interface with implementation
#237
kuenishi
closed
6 years ago
0
mpirun doesn't exit when exception is thrown in some process
#236
andremoeller
closed
6 years ago
7
Expose CommunicatorBase as communicator interface with docs
#235
kuenishi
closed
6 years ago
2
Adding allreduce for ndarray
#234
Hakuyume
closed
6 years ago
10
Fix a bug of NStepRNN
#233
shu65
closed
6 years ago
0
Clean up Communicator interface with changes
#232
kuenishi
closed
6 years ago
1
Replace get_device
#231
shu65
closed
6 years ago
4
[WIP] Refactor Communicators
#230
shu65
closed
6 years ago
0
Fix bcast
#229
levelfour
closed
6 years ago
1
A dummy PR
#228
keisukefukuda
closed
6 years ago
0
PR for test scripts debug
#227
shu65
closed
6 years ago
0
Add collective communications
#226
levelfour
closed
6 years ago
2
Checkpointer doesn't resume current learning rate
#225
Guriido
closed
6 years ago
8
Don't initialize global NCCL comm when
#224
undertherain
closed
6 years ago
2
Update chainer version 4.0.0rc1 / 3.5
#223
keisukefukuda
closed
6 years ago
0
Fix MultiNodeNStepRNN to use Chainer n_cells
#222
levelfour
closed
6 years ago
0
ChainerMN hangs with Open MPI 3
#221
keisukefukuda
closed
5 years ago
1
Update tested Chainer versions
#220
kuenishi
closed
6 years ago
1
Test the combination of MultiNodeIterator and MultiprocessIterator
#219
levelfour
closed
6 years ago
1
chainermn fails on >232 threads with NCCL_ERROR_SYSTEM_ERROR
#218
undertherain
closed
6 years ago
7
Multi-GPU training hangs
#217
andremoeller
closed
6 years ago
14
Optimize PureNcclCommunicator to accelerate training with double buffering
#216
shu65
closed
6 years ago
3
[WIP] Warn mp start method
#215
keisukefukuda
closed
6 years ago
2
Fix send to avoid deadlock without inputs does not requires grad
#214
levelfour
closed
6 years ago
1
Check contiguousness of outgoing arrays
#213
levelfour
closed
6 years ago
1
Removed v1 models
#212
keisukefukuda
closed
6 years ago
0
Print warning if inappropriate `start_method` of multiprocessing is used
#211
keisukefukuda
opened
6 years ago
0
Guidance on MVAPICH vs OpenMPI
#210
andremoeller
closed
6 years ago
1
Running chainermn scripts without mpirun / mpiexec
#209
andremoeller
closed
6 years ago
8
Implementation choice of scatter_dataset function
#208
Guriido
closed
6 years ago
2
[WIP] fix deadlock in unit tests
#207
keisukefukuda
closed
6 years ago
0
We don't need `models_v1` in ImageNet examples now
#206
iwiwi
closed
6 years ago
0
[WIP] Fix broken n_step_rnn with Chainer's master
#205
keisukefukuda
closed
6 years ago
2
Cannot use other start method for multiprocessing
#204
Guriido
opened
6 years ago
11
Port Chainer#4191 or use Chainer's BN implementation
#203
kuenishi
opened
6 years ago
2
Delete files related to cython
#202
shu65
closed
6 years ago
0
Fix bugs in DoubleBufferingOptimizer and PureNcclCommunicator
#201
shu65
closed
6 years ago
1
Bump library versions, esp. Chainer to 3.3
#200
kuenishi
closed
6 years ago
0