awslabs / keras-apache-mxnet

[DEPRECATED] Amazon Deep Learning's Keras with Apache MXNet support
https://github.com/awslabs/keras-apache-mxnet/wiki

[Bug] Keras misinterprets the output of the bias initialization method in a Dense layer as a scalar instead of a vector #271

Open maybeLee opened 2 years ago

maybeLee commented 2 years ago

I am using keras-mxnet==2.2.4.3 along with mxnet==1.5.1 (the issue also reproduces with mxnet==1.8.0). When I use the following lines to build a simple model containing one dense layer:

import os
os.environ["KERAS_BACKEND"] = "mxnet"  # must be set before keras is imported
import numpy as np
import keras
from keras import layers

img_input = layers.Input(shape=(10,))
x = layers.Dense(units=1, bias_initializer='RandomUniform')(img_input)
model = keras.models.Model(img_input, x)

Keras fails immediately with the following error:

Traceback (most recent call last):
  File "try.py", line 29, in <module>
    x = layers.Dense(units=1, bias_initializer='RandomUniform')(img_input)
  File "keras/engine/base_layer.py", line 470, in __call__
    output = self.call(inputs, **kwargs)
  File "keras/layers/core.py", line 893, in call
    output = K.bias_add(output, self.bias, data_format='channels_last')
  File "keras/backend/mxnet_backend.py", line 95, in func_wrapper
    train_symbol = func(*args, **kwargs)
  File "keras/backend/mxnet_backend.py", line 3989, in bias_add
    % (len(bias_shape), x_dim))
ValueError: MXNet Backend: Unexpected bias dimensions 0, expect to be 1 or 2 dimensions

If we change the number of output units to a value other than 1, the problem does not appear. I then checked the source code in detail and found that Keras misinterprets the output of sym = mx.sym.random.uniform(shape=shape, low=minval, high=maxval, dtype=dtype) as a scalar instead of a vector. (screenshot of the relevant code in keras/backend/mxnet_backend.py)

As shown in the figure above, before reaching line 601 the result ret is still a vector, but Keras apparently forgets to assign the _is_vector attribute to it. Since the shape of ret is (1,), Keras then treats ret as a scalar instead of a vector, and as a result the bias ends up with a 0-dimensional shape: (screenshot of the resulting bias shape)

Keras then raises the ValueError at line 3988 of mxnet_backend.py.
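A minimal sketch of one possible fix, assuming the code around line 601 of keras/backend/mxnet_backend.py looks roughly like the snippet quoted above (the exact surrounding context is an assumption on my side): mark the wrapped symbol as a vector whenever a 1-D shape was requested, so a (1,)-shaped bias is not later treated as a scalar.

sym = mx.sym.random.uniform(shape=shape, low=minval, high=maxval, dtype=dtype)
ret = KerasSymbol(sym)
# If the caller asked for a 1-D shape, remember that this is a vector even
# when its length is 1, so bias_add does not see a 0-dimensional bias later.
if isinstance(shape, (list, tuple)) and len(shape) == 1:
    ret._is_vector = True
return ret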

I also tested this with other random bias initializers, including RandomNormal and TruncatedNormal; all three lead to the same crash. However, when I test with constant bias initializers such as Ones and Zeros, the problem does not occur, because Keras does not go through MXNet's symbolic computation in that case and the type of the bias is not misinterpreted. (screenshot of the constant-initializer code path)

As shown above, when the bias is set with a constant initializer, it is not wrapped as a KerasSymbol, so the branch at lines 254-255 is not hit and the problem does not occur.
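Until a fix lands, here is a workaround sketch that relies on this observation (the -0.05/0.05 range mirrors the RandomUniform defaults; everything else follows the repro above): build the layer with a constant bias initializer so the symbolic initialization path is skipped, then overwrite the bias with the random values you actually want via set_weights().

import os
os.environ["KERAS_BACKEND"] = "mxnet"  # select the MXNet backend before importing keras
import numpy as np
import keras
from keras import layers

img_input = layers.Input(shape=(10,))
dense = layers.Dense(units=1, bias_initializer='zeros')  # constant init avoids the crash
x = dense(img_input)
model = keras.models.Model(img_input, x)

# Replace the zero bias with the random-uniform values we actually wanted.
kernel, bias = dense.get_weights()
bias = np.random.uniform(low=-0.05, high=0.05, size=bias.shape).astype(bias.dtype)
dense.set_weights([kernel, bias])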

Based on this finding, could you add an additional guard around the bias initialization path so that a (1,)-shaped vector is not misinterpreted as a scalar?
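For illustration, a hedged sketch of what such a guard could look like inside bias_add() in keras/backend/mxnet_backend.py, just before the dimension check that raises at line 3988 (the surrounding variable names are assumptions):

bias_shape = int_shape(bias)
if len(bias_shape) == 0:
    # A units=1 Dense layer produced a (1,) bias whose _is_vector flag was
    # lost; restore the 1-D view instead of rejecting it as a scalar.
    bias = reshape(bias, (1,))
    bias_shape = (1,)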