Multiple GPU issue - Githubissues

hnhuang commented 7 years ago

Hi,

My model runs on a single GPU, but it fails on multiple GPUs. Here is my code:

    x_train, y_train = batch_reader.get_batch()
    gpu_list = ["gpu(0)", "gpu(1)", "gpu(2)", "gpu(3)"]
    model_dist.compile(loss=losses.dist_loss_cls(C.max_radius), optimizer=optimizer, context=gpu_list)
    model_dist.fit(x_train, y_train, batch_size=20, nb_epoch=num_epochs, callbacks=[checkpoint_fixed_name])

The error I got was:

    RuntimeError: simple_bind error. Arguments: input_1: (5, 1L, 32L, 32L, 32L)
    [13:36:31] src/storage/storage.cc:59: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: invalid device ordinal

Would anyone please help me? Thanks.

sandeep-krishnamurthy commented 7 years ago

The issue seems to be that you don't have that many GPUs. Could you run the "nvidia-smi" command in a terminal and report whether it shows 4 GPUs?

hnhuang commented 7 years ago

I do have 4 GPUs.

sandeep-krishnamurthy commented 7 years ago

I tried the ResNet50 example here - https://github.com/dmlc/keras/blob/master/examples/cifar10_resnet50.py with multiple GPUs and things seem to work fine. Can you please share more details on your setup: the MXNet version, any CUDA-specific environment variables that are set, and the code you are using?
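One thing worth ruling out, given the "invalid device ordinal" message: nvidia-smi lists every physical GPU, but a process can only address the ones that CUDA_VISIBLE_DEVICES allows. Below is a rough sketch of that check, not anything taken from MXNet or Keras. The helper names (count_listed_gpus, usable_gpu_count, check_contexts) are made up for illustration, and it assumes CUDA_VISIBLE_DEVICES holds plain integer indices rather than UUIDs.

```python
import os
import shutil
import subprocess


def count_listed_gpus(listing):
    """Count lines of `nvidia-smi -L` output that describe a GPU."""
    return sum(1 for line in listing.splitlines() if line.strip().startswith("GPU "))


def usable_gpu_count(total, visible=None):
    """Estimate how many GPUs a process can address.

    `visible` mimics CUDA_VISIBLE_DEVICES (integer indices only, an assumption);
    None means the variable is unset, so every listed GPU is usable.
    """
    if visible is None:
        return total
    count = 0
    for entry in visible.split(","):
        entry = entry.strip()
        # CUDA stops reading the list at the first invalid index.
        if not entry.isdigit() or int(entry) >= total:
            break
        count += 1
    return count


def check_contexts(requested, total, visible=None):
    """Return the requested device ordinals that the process could not address."""
    usable = usable_gpu_count(total, visible)
    return [i for i in requested if i >= usable]


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout
        total = count_listed_gpus(out)
        bad = check_contexts(range(4), total, os.environ.get("CUDA_VISIBLE_DEVICES"))
        print("GPUs listed:", total, "| invalid ordinals for gpu(0)..gpu(3):", bad)
```

If the last list is non-empty when this is run in the same shell and environment as the training script, then some of gpu(0)..gpu(3) are not addressable from that process even though the machine has 4 GPUs.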

dmlc / keras

Multiple GPU issue #76