awslabs / keras-apache-mxnet

[DEPRECATED] Amazon Deep Learning's Keras with Apache MXNet support
https://github.com/awslabs/keras-apache-mxnet/wiki

NotImplementedError: MXNet Backend: map_fn operator is not supported yet. #191

Closed thomelane closed 6 years ago

thomelane commented 6 years ago

A custom metric for IOU is implemented using Keras backend functions, but the map_fn operator is not implemented for the MXNet backend.

s = K.map_fn(lambda thresh: iou > thresh, thresholds)

See https://discuss.mxnet.io/t/keras-mxnet-custommetric/1952 for more details.
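
For context, here is a minimal sketch of a backend-only metric in this style (illustrative only; iou_over_thresholds, y_true, and y_pred are placeholder names, not code from the thread). On the TF backend it runs; on the MXNet backend the K.map_fn call is what raises the NotImplementedError above:

import numpy as np
from keras import backend as K

def iou_over_thresholds(y_true, y_pred):
    # Binarize targets and predictions
    t = K.cast(K.greater(y_true, 0.0), 'float32')
    p = K.cast(K.greater(y_pred, 0.5), 'float32')
    intersection = K.sum(t * p)
    union = K.sum(t) + K.sum(p) - intersection
    iou = (intersection + K.epsilon()) / (union + K.epsilon())
    thresholds = K.constant(np.arange(0.5, 1.0, 0.05), dtype='float32')
    # Works on the TF backend; raises NotImplementedError on the MXNet backend
    s = K.map_fn(lambda thresh: K.cast(K.greater(iou, thresh), 'float32'),
                 thresholds)
    return K.mean(s)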

roywei commented 6 years ago

@thomelane thanks for this issue!

Conclusion:

  1. map_fn will not help here, and the lack of it is not the root cause.
  2. The original custom metric function only works with the TF backend, not with any other backend. There is no way to write a cross-backend compatible custom metric that works for all backends by simply switching the backend in the config file, because it requires a logical_and operator that Keras does not expose. (Simple custom metrics defined only with K. operations are usually cross-backend compatible, but not this one.)
  3. We need a custom operator in the MXNet backend to support this.

Following is my investigation. The original code is from https://www.kaggle.com/shaojiaxin/u-net-with-simple-resnet-blocks-v2-new-loss/notebook

There, the author used a custom metric:

import numpy as np
import tensorflow as tf

def get_iou_vector(A, B):
    # A and B are NumPy arrays of shape (batch_size, ...)
    batch_size = A.shape[0]
    metric = []
    for batch in range(batch_size):
        t, p = A[batch] > 0, B[batch] > 0
#         if np.count_nonzero(t) == 0 and np.count_nonzero(p) > 0:
#             metric.append(0)
#             continue
#         if np.count_nonzero(t) >= 1 and np.count_nonzero(p) == 0:
#             metric.append(0)
#             continue
#         if np.count_nonzero(t) == 0 and np.count_nonzero(p) == 0:
#             metric.append(1)
#             continue

        intersection = np.logical_and(t, p)
        union = np.logical_or(t, p)
        iou = (np.sum(intersection > 0) + 1e-10) / (np.sum(union > 0) + 1e-10)
        # Average the IOU "hits" over thresholds 0.5, 0.55, ..., 0.95
        thresholds = np.arange(0.5, 1, 0.05)
        s = []
        for thresh in thresholds:
            s.append(iou > thresh)
        metric.append(np.mean(s))

    return np.mean(metric)

def my_iou_metric(label, pred):
    return tf.py_func(get_iou_vector, [label, pred > 0.5], tf.float64)

def my_iou_metric_2(label, pred):
    return tf.py_func(get_iou_vector, [label, pred > 0], tf.float64)

Note that he used tf.py_func to wrap get_iou_vector. tf.py_func converts a Python function into a TensorFlow operator: get_iou_vector takes NumPy arrays and returns NumPy arrays, but after wrapping with tf.py_func it takes Tensors and returns Tensors. Keras metrics only accept and return Tensors. Normally, a simple custom metric defined with K. operations already takes and returns Tensors, so it is cross-backend compatible.
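
As a toy illustration of that wrapping behaviour (using the TF 1.x graph/session API current at the time; np_double is just a made-up example function):

import numpy as np
import tensorflow as tf

def np_double(x):
    # Plain NumPy in, NumPy out
    return (x * 2).astype(np.float64)

x = tf.constant([1.0, 2.0, 3.0], dtype=tf.float64)
# After wrapping, the op takes a Tensor and returns a Tensor
y = tf.py_func(np_double, [x], tf.float64)

with tf.Session() as sess:
    print(sess.run(y))  # [2. 4. 6.]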

Because there is no K.logical_and or K.logical_or in Keras (with any backend), he had to use NumPy operations inside get_iou_vector and wrap it with tf.py_func so that the inputs and outputs become Tensors.

However, this won't work with MXNet, because MXNet does not have an equivalent of tf.py_func. Even if it did, it still wouldn't work, because MXNet only takes and returns NDArrays or Symbols: where Keras expects Tensors, TensorFlow tensors (what py_func returns) are natively supported, while MXNet has to wrap everything in a KerasSymbol class.

Also, K.map_fn will not help here because it takes and returns Tensors, while we need numpy.logical_and inside the metric function.

Solution: Given that, there is still a way forward, because MXNet does have logical_and and logical_or operators. We just need to create a new custom operator equivalent to get_iou_vector and do everything with MXNet symbols instead of NumPy.

In keras/backend/mxnet_backend.py, add the following operator:

@keras_mxnet_symbol
def get_iou_vector_mx(A, B):
    """Symbolic equivalent of get_iou_vector, roughly:

    t = K.greater(A, 0)
    p = K.greater(B, 0)
    intersection = logical_and(t, p)
    union = logical_or(t, p)
    iou = (np.sum(intersection > 0) + 1e-10) / (np.sum(union > 0) + 1e-10)
    """
    def step(data, _):
        # data[0] and data[1] are slices of A and B along the batch axis,
        # provided by mx.sym.contrib.foreach
        zero = mx.sym.zeros((1,))
        t = mx.sym.broadcast_greater(data[0], zero)
        p = mx.sym.broadcast_greater(data[1], zero)
        intersection = mx.sym.broadcast_logical_and(t, p)
        union = mx.sym.broadcast_logical_or(t, p)
        iou = (mx.sym.sum(mx.sym.broadcast_greater(intersection, zero)) + mx.sym.full((1,), 1e-10)) / \
              (mx.sym.sum(mx.sym.broadcast_greater(union, zero)) + mx.sym.full((1,), 1e-10))
        # Average the IOU "hits" over thresholds 0.5, 0.55, ..., 0.95
        thresholds = mx.sym.arange(0.5, 1, 0.05)
        return mx.sym.mean(mx.sym.broadcast_greater(iou, thresholds)), _

    # foreach loops `step` over the batch dimension of both inputs
    data = [A.symbol, B.symbol]
    output, _ = mx.sym.contrib.foreach(step, data, [])
    return KerasSymbol(mx.sym.mean(output))

and use it in training like:

def my_iou_metric_mx(label, pred):
    return K.get_iou_vector_mx(label, pred > 0.5)

Now you can compile with the custom metric and train:

model1.compile(loss="binary_crossentropy", optimizer=c, metrics=[my_iou_metric_mx])
model1.summary()
reduce_lr = ReduceLROnPlateau(monitor='my_iou_metric_mx', mode='max', factor=0.5, patience=5, min_lr=0.0001, verbose=1)

epochs = 50
batch_size = 32
history = model1.fit(x_train, y_train,
                    validation_data=[x_valid, y_valid],
                    epochs=epochs,
                    batch_size=batch_size,
                    callbacks=[reduce_lr],
                    verbose=2)

I was able to train for a few epochs:

Train on 6400 samples, validate on 800 samples
Epoch 1/50
/usr/local/lib/python2.7/dist-packages/mxnet/module/bucketing_module.py:408: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (1.0 vs. 0.03125). Is this intended?
  force_init=force_init)

 - 739s - loss: 0.5648 - my_iou_metric_mx: 0.3057 - val_loss: 0.5721 - val_my_iou_metric_mx: 0.3900
Epoch 2/50
 - 721s - loss: 0.5586 - my_iou_metric_mx: 0.3873 - val_loss: 0.5660 - val_my_iou_metric_mx: 0.3900
Epoch 3/50
 - 721s - loss: 0.5585 - my_iou_metric_mx: 0.3730 - val_loss: 0.5883 - val_my_iou_metric_mx: 0.3900
Epoch 4/50
 - 721s - loss: 0.5577 - my_iou_metric_mx: 0.3906 - val_loss: 0.5764 - val_my_iou_metric_mx: 0.3900
Epoch 5/50
 - 723s - loss: 0.5583 - my_iou_metric_mx: 0.3900 - val_loss: 0.9355 - val_my_iou_metric_mx: 0.3900
Cpruce commented 6 years ago

Thanks @roywei, the solution worked!