experiencor closed this issue 7 years ago
@experiencor Did you run .compile? Could you post the printout of the model? Run this:
from keras_exp.multigpu import print_mgpu_modelsummary
# your code above
print_mgpu_modelsummary(mgpu_model)
I wasn't able to build the model from the code you posted alone. One function is missing, and I can't tell where several of the parameters come from.
Parameters Unspecified:
FRAME_H, FRAME_W, latent_dim, epsilon_std, per_w, kbd_w, lv1_w, lv2_w, lv3_w, lv4_w, lv5_w
Function Not Defined:
make_vgg16
The other parameters/layers I figured out:
from keras.models import Model
from keras.layers import (
    Input, Lambda, Dense, Reshape, Deconv2D, BatchNormalization, LeakyReLU,
    Activation, Flatten)
import keras.backend as K
import tensorflow as tf
Ideally, if you can share a working, slimmed-down single-GPU example, I'll debug it and try to run it with multiple GPUs. If there's proprietary code you don't want to share, then please at least post the output of print_mgpu_modelsummary(mgpu_model) so I can see the layers and dimensions.
Maybe there's a bug in how I'm slicing and concatenating with multi-inputs, but I need more info.
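To make the slicing and concatenating step concrete, here is a rough sketch using plain Python lists. It is not the keras_exp implementation: slice_batch, concat_outputs and tower are hypothetical names made up for illustration, and the sketch assumes the batch size divides evenly by the number of GPUs.

```python
# Hypothetical helpers for illustration only; these are not part of keras_exp.

def slice_batch(inputs, n_gpus):
    """Split every input of a multi-input batch into n_gpus equal shards.

    inputs: list of per-input batches (each a list of samples, same length).
    Returns one entry per GPU; each entry is a list holding that GPU's
    shard of every input, in the original input order.
    """
    batch_size = len(inputs[0])
    if any(len(x) != batch_size for x in inputs):
        raise ValueError("all inputs must share the same batch size")
    if batch_size % n_gpus != 0:
        raise ValueError("batch size must be divisible by the number of GPUs")
    shard = batch_size // n_gpus
    return [[x[g * shard:(g + 1) * shard] for x in inputs]
            for g in range(n_gpus)]


def concat_outputs(per_gpu_outputs):
    """Concatenate per-GPU outputs back into one batch, preserving order."""
    merged = []
    for out in per_gpu_outputs:
        merged.extend(out)
    return merged


def tower(shards):
    # Stand-in for one model replica: adds its two inputs sample by sample.
    a, b = shards
    return [x + y for x, y in zip(a, b)]


# Two inputs with a global batch size of 16, split across 2 GPUs.
first_input = list(range(16))
second_input = [10 * i for i in range(16)]
towers = slice_batch([first_input, second_input], n_gpus=2)
merged = concat_outputs([tower(t) for t in towers])
```

Every input must be sliced with the same boundaries, so that sample i of one input still lines up with sample i of the others inside each replica. After concatenation the merged output has the full batch size again.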
This is due to a bug in my model construction code.
Hi @avolkov1
I encountered the following error when training a model with multiple inputs and custom loss:
InvalidArgumentError: Incompatible shapes: [8] vs. [16] [[Node: tower_1/model_4/lambda_3/add_6 = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:1"](tower_1/model_4/lambda_3/add_5, tower_1/model_4/lambda_3/mul_12)]] [[Node: loss/mul/_1069 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_9575_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
The BATCH SIZE is 16. The sliced batch size is 8 for each of the 2 GPUs. The code of the model is:
This is the code to make the multi-GPU model:
I encountered this error when I ran mgpu_model.fit_generator. Can you give me some pointers on how to fix this problem? Thanks in advance.
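A mismatch of [8] vs. [16] inside a Lambda layer is what one would expect if some tensor in that layer is built with the global batch size while the layer itself only receives a per-GPU slice. The model code is not shown above, so this is a guess at the pattern rather than a diagnosis. The sketch below reproduces the shapes with plain Python lists; add_rows, tower_hardcoded and tower_dynamic are hypothetical names, not Keras or keras_exp functions.

```python
def add_rows(a, b):
    """Element-wise add that rejects mismatched batch dimensions,
    mimicking how a tensor Add op fails on incompatible shapes."""
    if len(a) != len(b):
        raise ValueError(
            "Incompatible shapes: [%d] vs. [%d]" % (len(a), len(b)))
    return [x + y for x, y in zip(a, b)]


BATCH_SIZE = 16  # global batch size, as stated in the report
N_GPUS = 2


def tower_hardcoded(x):
    # Bug pattern (assumed): the extra term is sized from the global
    # BATCH_SIZE, but x is only the slice given to one GPU.
    noise = [0.0] * BATCH_SIZE
    return add_rows(x, noise)


def tower_dynamic(x):
    # Fix pattern: size the extra term from the incoming slice instead.
    noise = [0.0] * len(x)
    return add_rows(x, noise)


shard = [1.0] * (BATCH_SIZE // N_GPUS)  # 8 samples per GPU

try:
    tower_hardcoded(shard)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

fixed = tower_dynamic(shard)
```

In Keras terms, the equivalent of tower_dynamic would be to derive the batch dimension from the layer's input (for example with K.shape(x)[0]) instead of a module-level constant, so that each replica sees a size that matches its own slice.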