Vladkryvoruchko / PSPNet-Keras-tensorflow

TensorFlow implementation of the original paper: https://github.com/hszhao/PSPNet
MIT License

About your result output #2

Closed wcy940418 closed 6 years ago

wcy940418 commented 7 years ago

Hello, thank you for your excellent work. I am also interested in implementing PSPNet in TF. I have not tested your code yet, but regarding your output, I found some post-processing code in the original MATLAB code.

function score = ms_caffe_process(input_data,net)
    score = net.forward({input_data});
    score = score{1};
    score_flip = net.forward({input_data(end:-1:1,:,:)});
    score_flip = score_flip{1};
    score = score + score_flip(end:-1:1,:,:);

    score = permute(score, [2 1 3]);
    score = exp(score);
    score = bsxfun(@rdivide, score, sum(score, 3));
end
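
For comparison, my reading of that snippet in Python terms (a rough sketch, not code from this repo; model stands for a loaded Keras network and input_data for one preprocessed H x W x C image):

import numpy as np

def flip_averaged_predict(model, input_data):
    # Run the network on the image and on its horizontally flipped copy,
    # sum the two score maps (flipping the second one back), then apply
    # a softmax over the class axis, as in the exp / sum step above.
    score = model.predict(np.expand_dims(input_data, axis=0))[0]
    flipped = input_data[:, ::-1, :]
    score_flip = model.predict(np.expand_dims(flipped, axis=0))[0]
    score = score + score_flip[:, ::-1, :]
    score = np.exp(score - score.max(axis=-1, keepdims=True))
    return score / score.sum(axis=-1, keepdims=True)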

It seems that they feed each image in twice, once in its original orientation and once flipped. Do you know why they did this?

Thank you

Vladkryvoruchko commented 7 years ago

Hi @wcy940418. I haven't found any implementation details about this processing in the original paper. As I don't have any experience with MATLAB, from the code it seems the image is not actually flipped, but that the order of the RGB channels is changed, possibly to BGR... and then changed back in the score addition. In my pycaffe code I didn't do that "flipping" and it actually works pretty well.

Will return to working with repo soon.

wcy940418 commented 7 years ago

Thank you @Vladkryvoruchko, I see. I will try to figure out the missing detail in the original network.

Vladkryvoruchko commented 7 years ago

@wcy940418 Thank you. Keep me posted if you make progress :)

wcy940418 commented 7 years ago

@Vladkryvoruchko How did you modify the caffe-tensorflow code to export the BN layer parameters? Also, I tested prediction on an image with pycaffe, and it works fine, as you said.

wcy940418 commented 7 years ago

Also, I changed some lines in your code and now get somewhat better results, but they are still not perfect. Could you tell me the order in which you convert scale, offset, mean, and variance from the caffemodel?

Vladkryvoruchko commented 7 years ago

@wcy940418 I will now upload the original caffe-tensorflow code to this repo and then commit my modifications. EDIT: I'll try to take another look at the converter code later; the converter has been uploaded to this repo.

wcy940418 commented 7 years ago

Hi @Vladkryvoruchko, I found the snippet in your caffe-tensorflow code that exports the BN params. In kaffe/transformers.py line 258:

# Prescale the stats
scaling_factor = 1.0 / scale if scale != 0 else 0
mean *= scaling_factor
variance *= scaling_factor

I am not sure the mean and variance need to be prescaled. I think the four params of the BN layer have the same meaning in both Caffe and TensorFlow: scale * (data - mean) / variance + offset

Vladkryvoruchko commented 7 years ago

@wcy940418 Yes, but this snippet is from the original converter and should work just fine. If we look at the tutorial on converting DilatedNet (link), BN is supposedly converted the same way, and it works fine there.
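
For context, my understanding of why the prescale exists (a sketch assuming the standard Caffe BatchNorm layer, which stores its running statistics multiplied by an accumulated moving-average factor; PSPNet's custom BN layer may be laid out differently):

import numpy as np

def unpack_caffe_batchnorm(blobs):
    # blobs: list of numpy arrays taken from the layer,
    # e.g. [b.data for b in net.params[layer_name]]
    # blobs[0] ~ mean * factor, blobs[1] ~ variance * factor, blobs[2] ~ factor
    factor = blobs[2].flatten()[0]
    scaling_factor = 1.0 / factor if factor != 0 else 0.0
    mean = blobs[0].flatten() * scaling_factor
    variance = blobs[1].flatten() * scaling_factor
    return mean, variance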

wcy940418 commented 7 years ago

Okay, I see. Look here, in the Keras BatchNormalization layer definition:

        if self.scale:
            self.gamma = self.add_weight(shape=shape,
                                         name='gamma',
                                         initializer=self.gamma_initializer,
                                         regularizer=self.gamma_regularizer,
                                         constraint=self.gamma_constraint)
        else:
            self.gamma = None
        if self.center:
            self.beta = self.add_weight(shape=shape,
                                        name='beta',
                                        initializer=self.beta_initializer,
                                        regularizer=self.beta_regularizer,
                                        constraint=self.beta_constraint)
        else:
            self.beta = None
        self.moving_mean = self.add_weight(
            shape=shape,
            name='moving_mean',
            initializer=self.moving_mean_initializer,
            trainable=False)
        self.moving_variance = self.add_weight(
            shape=shape,
            name='moving_variance',
            initializer=self.moving_variance_initializer,
            trainable=False)
        self.built = True

The order of the parameters in the BN layer is scale, offset, mean, variance. But in your code:

model.get_layer(layer.name).set_weights([mean, variance, scale, offset])

the assignment order is mean, variance, scale, offset. But even if I change the assignment order, the result is not meaningful, so I still think the ported parameters either have wrong names or go through the unnecessary pre-processing I mentioned before. Do you have any idea about this?
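
If the Keras ordering above (gamma, beta, moving_mean, moving_variance) is correct, I would expect the assignment to look like this instead (just a sketch with the same variable names):

# Keras set_weights follows the add_weight order in build():
# [gamma (scale), beta (offset), moving_mean, moving_variance]
model.get_layer(layer.name).set_weights([scale, offset, mean, variance])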

wcy940418 commented 7 years ago

And I found another piece of wrong code. Here, in layers_builder.py line 177:

    for i in range(3): #for i in range(2): old wrong code 

Level 3 should have sub-levels 3_1 through 3_4 according to the original network.

wcy940418 commented 7 years ago

I got an even better result after finding that a

prev_layer = Activation('relu')(prev_layer)

is missing at the top of

def residual_short(prev_layer, level, pad=1, lvl=1, sub_lvl=1, modify_stride=False):

Also, the epsilon of the BN layer has to be 1e-5, which differs from the Keras default of 1e-3. Anyway, the result is much better than before, but there are still many mispredictions. I think it comes down to the differences between the Keras and Caffe BN layers. You can have a look at the example in my fork.
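
To make the two fixes concrete, something along these lines (a rough sketch, not the exact code from my fork; bn_layer is an illustrative helper name):

from keras.layers import Activation, BatchNormalization

def bn_layer(name=None):
    # Caffe's BN uses eps = 1e-5, while Keras defaults to 1e-3.
    return BatchNormalization(epsilon=1e-5, name=name)

def residual_short(prev_layer, level, pad=1, lvl=1, sub_lvl=1, modify_stride=False):
    # The activation that was missing at the top of the block:
    prev_layer = Activation('relu')(prev_layer)
    # ... rest of the residual block as in layers_builder.py ...
    return prev_layer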

Vladkryvoruchko commented 7 years ago

@wcy940418 Wow, impressive. I will test it now. EDIT: Oh, I remembered one thing. Earlier I was thinking about a couple of ideas from the pycaffe version of the code.

The first one: 1) Implement the channel swap and mean color subtraction:

SCALE_SIZE = 473
MEAN_R = 123.68
MEAN_G = 116.779
MEAN_B = 103.939
CHANNEL_SWAP = [2,1,0]
PROCESSOR = caffe.Classifier(MODEL_PATH, PRETRAINED_FILE_PATH, 
                image_dims=[SCALE_SIZE,SCALE_SIZE], 
                raw_scale=255.0, 
                channel_swap=CHANNEL_SWAP, 
                mean=np.array([MEAN_R, MEAN_G, MEAN_B], 
                dtype='f4'))
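
In the Keras port, an equivalent pre-processing step could look roughly like this (just a sketch; it assumes the image comes in as RGB with values in 0-255, and the BGR ordering follows from channel_swap=[2,1,0] above):

import numpy as np

MEAN_RGB = np.array([123.68, 116.779, 103.939], dtype='float32')

def preprocess(img_rgb):
    # Subtract the per-channel mean color, then swap RGB -> BGR.
    img = img_rgb.astype('float32') - MEAN_RGB
    return img[:, :, ::-1]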

2) And the other one, which I actually didn't understand, is:

def forward(self, img):
    input_img = [caffe.io.load_image(img)]
    #Net processing
    predictions = self.PROCESSOR.predict(input_img, not 'store_true')
    return predictions[0]

See caffe/blob/master/python/caffe/classifier.py, line 47:

def predict(self, inputs, oversample=True):
    """
    Predict classification probabilities of inputs.
    Parameters
    ----------
    inputs : iterable of (H x W x K) input ndarrays.
    oversample : boolean
        average predictions across center, corners, and mirrors
        when True (default). Center-only prediction when False.
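
As far as I can tell, not 'store_true' just evaluates to False (any non-empty string is truthy in Python), so oversampling is disabled and only the center prediction is used:

# Any non-empty string is truthy, so this is just a roundabout False:
oversample = not 'store_true'   # -> False, i.e. center-only prediction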

Also, regarding the line

input_img = [caffe.io.load_image(img)]

I ran some experiments earlier, and my predictions were different if I substituted caffe.io.load_image with simply opening the image with PIL and converting it to an np.array with dtype='float32'; the results were worse, similar to the output you are getting right now.
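
My guess at where the difference comes from (a sketch; 'example.jpg' is only a placeholder path):

import numpy as np
from PIL import Image
import caffe

# caffe.io.load_image returns a float RGB image scaled to [0, 1];
# caffe.Classifier then applies raw_scale=255 and the channel swap itself.
img_caffe = caffe.io.load_image('example.jpg')

# PIL gives uint8 values in [0, 255], so converting it directly to float32
# means raw_scale=255 is applied on top of already-scaled values and the
# pixel range ends up very different.
img_pil = np.array(Image.open('example.jpg'), dtype='float32')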

wcy940418 commented 6 years ago

@Vladkryvoruchko But I used Pillow (PIL) to read the image in my pycaffe code, and the result is just as good as with the original code.

Vladkryvoruchko commented 6 years ago

@wcy940418 Oh, I missed such small details, now fixed by your modifications. Could you please make a pull request to this repo? :)