thomelane closed 6 years ago
@thomelane thanks for this issue!
Conclusion: map_fn will not help here, and the lack of it is not the root cause. The root cause is the logical_and operator, which Keras does not have. (Simple custom metrics defined only with Keras backend (K) operations are usually cross-backend compatible, but not this one.)

Following is my investigation. The original code is from https://www.kaggle.com/shaojiaxin/u-net-with-simple-resnet-blocks-v2-new-loss/notebook
In there, the author used a custom metric:
def get_iou_vector(A, B):
    batch_size = A.shape[0]
    metric = []
    for batch in range(batch_size):
        t, p = A[batch]>0, B[batch]>0
        # if np.count_nonzero(t) == 0 and np.count_nonzero(p) > 0:
        #     metric.append(0)
        #     continue
        # if np.count_nonzero(t) >= 1 and np.count_nonzero(p) == 0:
        #     metric.append(0)
        #     continue
        # if np.count_nonzero(t) == 0 and np.count_nonzero(p) == 0:
        #     metric.append(1)
        #     continue
        intersection = np.logical_and(t, p)
        union = np.logical_or(t, p)
        iou = (np.sum(intersection > 0) + 1e-10 )/ (np.sum(union > 0) + 1e-10)
        thresholds = np.arange(0.5, 1, 0.05)
        s = []
        for thresh in thresholds:
            s.append(iou > thresh)
        metric.append(np.mean(s))
    return np.mean(metric)

def my_iou_metric(label, pred):
    return tf.py_func(get_iou_vector, [label, pred>0.5], tf.float64)

def my_iou_metric_2(label, pred):
    return tf.py_func(get_iou_vector, [label, pred >0], tf.float64)
Note that he used tf.py_func to wrap get_iou_vector. What tf.py_func does is convert a Python function into a TensorFlow operator: get_iou_vector takes NumPy arrays as input and returns a NumPy value, but after wrapping with tf.py_func it takes Tensors and returns Tensors. Keras metrics only accept and return Tensors. Normally, a simple custom metric defined with Keras backend (K) operations takes Tensors and returns Tensors, so it is cross-backend compatible.

Because there is no K.logical_and or K.logical_or in Keras (not with any backend), he has to use NumPy operations in get_iou_vector and wrap the function with tf.py_func so that its inputs and outputs become Tensors.
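To make the metric concrete, here is a minimal NumPy sketch of the per-image score that get_iou_vector computes before averaging over the batch. The helper name iou_score and the toy masks are made up for illustration; the thresholds and the 1e-10 smoothing term are taken from the notebook code above.

```python
import numpy as np

def iou_score(t, p):
    """Per-image score: fraction of IoU thresholds that the prediction clears.

    Hypothetical helper for illustration; it mirrors one iteration of the
    loop in get_iou_vector.
    """
    t, p = t > 0, p > 0
    intersection = np.logical_and(t, p)
    union = np.logical_or(t, p)
    # Smoothing term keeps the ratio defined when both masks are empty.
    iou = (np.sum(intersection) + 1e-10) / (np.sum(union) + 1e-10)
    thresholds = np.arange(0.5, 1, 0.05)  # 0.50, 0.55, ..., 0.95
    return float(np.mean(iou > thresholds))

truth = np.array([[1, 1], [0, 0]])
pred = np.array([[1, 1], [1, 0]])  # 2 pixels overlap, 3 pixels in the union

score = iou_score(truth, pred)  # IoU is about 0.667, which clears 4 of 10 thresholds
print(score)
```

With an IoU of roughly 0.667, only the thresholds 0.50, 0.55, 0.60 and 0.65 are cleared, so the score is 0.4. Two empty masks score 1.0 because of the smoothing term.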
However, this won't work with MXNet, because MXNet does not have something like mx.py_func. Even if it did, it would not work, because MXNet only takes and returns NDArrays or Symbols. Keras works with Tensors, and TensorFlow tensors (what py_func returns) are natively supported, whereas the MXNet backend has to wrap everything in a KerasSymbol class.
Also, K.map_fn will not work here because it takes and returns Tensors, while we need numpy.logical_and in the metric function.
Solution
Given that, there is still a solution, because MXNet does have logical_and and logical_or operators. We just need to create a new custom operator that is equivalent to get_iou_vector and does everything with MXNet symbols instead of NumPy.
In keras/backends/mxnet_backend.py, add the following operator:
@keras_mxnet_symbol
def get_iou_vector_mx(A, B):
    """
    t = K.greater(A, 0)
    p = K.greater(B, 0)
    intersection = logical_and(t, p)
    union = logical_or(t, p)
    iou = (np.sum(K.greater(intersection, 0)) + 1e-10) / (np.sum(union > 0) + 1e-10)
    """
    def step(data, _):
        # data[0] and data[1] are the label and prediction for one sample
        zero = mx.sym.zeros((1))
        t = mx.sym.broadcast_greater(data[0], zero)
        p = mx.sym.broadcast_greater(data[1], zero)
        intersection = mx.sym.broadcast_logical_and(t, p)
        union = mx.sym.broadcast_logical_or(t, p)
        iou = (mx.sym.sum(mx.sym.broadcast_greater(intersection, zero)) + mx.sym.full((1), 1e-10))/ \
              (mx.sym.sum(mx.sym.broadcast_greater(union, zero)) + mx.sym.full((1), 1e-10))
        thresholds = mx.sym.arange(0.5, 1 , 0.05)
        # fraction of thresholds that this sample's IoU exceeds
        return mx.sym.mean(mx.sym.broadcast_greater(iou, thresholds)), _

    # apply step to each (label, pred) pair along the batch axis
    data = [A.symbol, B.symbol]
    output, _ = mx.sym.contrib.foreach(step, data, [])
    return KerasSymbol(mx.sym.mean(output))
and use it in training like:
def my_iou_metric_mx(label, pred):
    return K.get_iou_vector_mx(label, pred > 0.5)
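One way to sanity-check an operator like this is to compare its output against a plain NumPy reference that uses the same broadcast formulation: one IoU per sample, compared against all thresholds at once instead of in a Python loop. The sketch below is only such a reference; the name iou_metric_reference and the toy batch are assumptions for illustration, and it does not call MXNet.

```python
import numpy as np

def iou_metric_reference(labels, preds):
    """NumPy reference for the batched metric (hypothetical helper).

    Mirrors the symbolic step function: per-sample IoU, a broadcast
    comparison against the thresholds, then a mean over both axes.
    """
    batch = labels.shape[0]
    t = labels.reshape(batch, -1) > 0
    p = preds.reshape(batch, -1) > 0
    intersection = np.logical_and(t, p).sum(axis=1)
    union = np.logical_or(t, p).sum(axis=1)
    iou = (intersection + 1e-10) / (union + 1e-10)   # shape (batch,)
    thresholds = np.arange(0.5, 1, 0.05)             # shape (10,)
    hits = iou[:, None] > thresholds[None, :]        # shape (batch, 10)
    return float(hits.mean(axis=1).mean())

# Toy batch: sample 0 is a perfect match, sample 1 has no overlap at all.
labels = np.array([[[1, 1], [0, 0]],
                   [[1, 0], [0, 0]]])
preds = np.array([[[1, 1], [0, 0]],
                  [[0, 1], [0, 0]]])

reference = iou_metric_reference(labels, preds)
print(reference)
```

The perfect match clears every threshold and the disjoint pair clears none, so the batch mean is 0.5. Feeding the same arrays through the backend metric should give a value close to this reference.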
Now you can compile with the custom metric and train:
model1.compile(loss="binary_crossentropy", optimizer=c, metrics=[my_iou_metric_mx])
model1.summary()

reduce_lr = ReduceLROnPlateau(monitor='my_iou_metric_mx', mode='max', factor=0.5, patience=5, min_lr=0.0001, verbose=1)
epochs = 50
batch_size = 32
history = model1.fit(x_train, y_train,
                     validation_data=[x_valid, y_valid],
                     epochs=epochs,
                     batch_size=batch_size,
                     callbacks=[reduce_lr],
                     verbose=2)
I was able to train some epochs:
Train on 6400 samples, validate on 800 samples
Epoch 1/50
/usr/local/lib/python2.7/dist-packages/mxnet/module/bucketing_module.py:408: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (1.0 vs. 0.03125). Is this intended?
  force_init=force_init)
 - 739s - loss: 0.5648 - my_iou_metric_mx: 0.3057 - val_loss: 0.5721 - val_my_iou_metric_mx: 0.3900
Epoch 2/50
 - 721s - loss: 0.5586 - my_iou_metric_mx: 0.3873 - val_loss: 0.5660 - val_my_iou_metric_mx: 0.3900
Epoch 3/50
 - 721s - loss: 0.5585 - my_iou_metric_mx: 0.3730 - val_loss: 0.5883 - val_my_iou_metric_mx: 0.3900
Epoch 4/50
 - 721s - loss: 0.5577 - my_iou_metric_mx: 0.3906 - val_loss: 0.5764 - val_my_iou_metric_mx: 0.3900
Epoch 5/50
 - 723s - loss: 0.5583 - my_iou_metric_mx: 0.3900 - val_loss: 0.9355 - val_my_iou_metric_mx: 0.3900
Thanks, @roywei's solution worked!
Custom metric for IOU is implemented using Keras backend functions, but the map_fn operator is not implemented for the MXNet backend. See https://discuss.mxnet.io/t/keras-mxnet-custommetric/1952 for more details.