WeiTang114 / MVCNN-TensorFlow

A Multi-View CNN (MVCNN) implementation with TensorFlow.
MIT License

Class predicted always 0 #14

Open Sypotu opened 7 years ago

Sypotu commented 7 years ago

Hi,

I am training the network initialized with alexnet_imagenet.npy, using the rendered views provided by @WeiTang114 (https://drive.google.com/open?id=0B4v2jR3WsindMUE3N2xiLVpyLW8). The only thing I changed is reducing the batch size from 16 to 8, because my GPU doesn't have enough memory.

But quite quickly (after 300-400 steps) the network gets stuck classifying all inputs as class 0, with an accuracy corresponding to random guessing. Could this be due to the reduced batch size, or is there another reason?

Thank you for your help!

rlczddl commented 7 years ago

I also set batch_size to 8 because of my PC, and I hit the same problem. Have you found the cause?

rlczddl commented 7 years ago

I also tried setting batch_size back to 16.

rlczddl commented 7 years ago

It's not the batch size: I trained on another, better PC and the prediction is still always 0.

rlczddl commented 7 years ago

Maybe the cause is that his fc8 layer uses a ReLU.

Sypotu commented 7 years ago

Yes, I removed the ReLU after fc8 and it's working fine now.
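For context, here is a minimal pure-Python sketch (my own illustration, not code from the repo) of why applying a ReLU to the final fc8 layer, whose outputs are the logits fed to softmax cross-entropy, can collapse the classifier:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical raw fc8 outputs for 4 classes.
logits = [-2.0, -0.5, 1.3, -3.1]

# What ReLU(fc8) would produce: every negative logit is clipped to 0.
clipped = [max(0.0, x) for x in logits]

print("raw probs:    ", softmax(logits))
print("clipped probs:", softmax(clipped))
# After the ReLU, all negative logits become identical (0), so those
# classes get identical probabilities. Worse, the ReLU's gradient is
# exactly zero for negative inputs, so the loss can no longer push a
# wrong class's score down -- training stalls on a constant prediction.
```

Removing the ReLU lets fc8 emit unbounded (including negative) logits, which is what softmax cross-entropy expects.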

rlczddl commented 7 years ago

Have you made any other changes? I removed the ReLU layer after fc8 and set substract_mean to true because I use a different dataset, but the predictions are still always the same (not only 0, but other numbers too).

Sypotu commented 7 years ago

Are they the same from the very first step, or do they slowly converge to the same value?

rlczddl commented 7 years ago

[Three screenshots of the training log.]

(In the first picture, the first list [122, 18, 69, 100, 38, ...] is the real labels of the validation data, and the second line [116, 116, 116, 116, ...] is the predictions on the validation data. The predictions on the test data look the same.) This is my training process. As you can see, the predictions on the validation data are always the same value, although that value differs between validation runs.
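A quick way to quantify this kind of collapse from a list of predictions is to measure what fraction of them belong to the single most common class. This is my own hypothetical helper, not part of the repo:

```python
from collections import Counter

def majority_fraction(preds):
    """Fraction of predictions equal to the most frequently predicted class."""
    if not preds:
        return 0.0
    (_, count), = Counter(preds).most_common(1)
    return count / len(preds)

# A fully collapsed classifier, like the [116, 116, 116, ...] output above:
print(majority_fraction([116] * 32))        # 1.0

# A healthy classifier's value sits much closer to 1 / num_classes:
print(majority_fraction([1, 2, 3, 1, 2, 4]))
```

A value that stays near 1.0 across validation runs is a strong hint that the logits are being squashed (e.g. by a ReLU on fc8) rather than that the data is mislabeled.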

Sypotu commented 7 years ago

How did you change the network to remove the ReLU? I added this:

def _fc_norelu(name, in_, outsize, reuse=False):
    with tf.variable_scope(name, reuse=reuse) as scope:
        # Move everything into depth so we can perform a single matrix multiply.

        insize = in_.get_shape().as_list()[-1]
        weights = _variable_with_weight_decay('weights', shape=[insize, outsize], wd=0.004)
        biases = _variable_on_cpu('biases', [outsize], tf.constant_initializer(0.0))
        fc = tf.matmul(in_, weights) + biases

        _activation_summary(fc)

    print name, fc.get_shape().as_list()
    return fc

Then I replaced fc8 = _fc("fc8", fc7, n_classes) with fc8 = _fc_norelu('fc8', fc7, n_classes).

rlczddl commented 7 years ago

[Screenshot of the modified fc8 code.]

I just modified fc8 directly in the inference_multiview function of model.py; it should work like yours. I will try your version.