jmtatsch opened this issue 7 years ago
Is scipy's misc.imresize(img, self.input_shape) really exactly the same as Matlab's imresize(img, [new_rows new_cols], 'bilinear')? And how does tf.image.resize compare to the Caffe Interp layer?
You could save the intermediate activations from Keras and compare them with the Caffe ones. There are a few lines for this in utils.py.
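Something along these lines should work for dumping a single layer's output (a rough sketch, not the exact utils.py code; the layer name below is only an example):

```python
import numpy as np
from keras.models import Model

def get_activation(model, layer_name, img_batch):
    """Return one layer's output for a preprocessed batch of shape (1, H, W, 3)."""
    probe = Model(inputs=model.input, outputs=model.get_layer(layer_name).output)
    return probe.predict(img_batch)

# Example usage (layer name is illustrative):
# keras_act = get_activation(pspnet.model, "conv1_1_3x3_s2", batch)
# np.save("conv1_1_keras.npy", keras_act)
# Then compare against the corresponding Caffe blob (NCHW -> NHWC):
# assert np.allclose(keras_act, caffe_blob.transpose(0, 2, 3, 1), atol=1e-4)
```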
I forget, but they may have used a sliding window for evaluation, as opposed to our rescaling, which distorts the aspect ratio.
Looking into this further, there are minor differences between the Python and Matlab image resizing depending on the input data type. See e.g. https://stackoverflow.com/questions/26812289/matlab-vs-c-vs-opencv-imresize Matlab performs anti-aliasing by default, as does Zhao's original code.
I did some experiments, and even for a toy example nothing matches the Matlab default (matlab_imresize_AA below):
original:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

scipy_imresize:
[[ 4  5]
 [10 11]]

scipy_imresize_float:
[[ 3.57142878  5.14285707]
 [ 9.8571434  11.4285717 ]]

scipy_zoom:
[[ 0  3]
 [12 15]]

OpenCV:
[[ 3  5]
 [11 13]]

scikit_resize:
[[ 3  5]
 [11 13]]

matlab_imresize_AA:
[[ 4  5]
 [11 12]]

matlab_imresize:
[[ 3  5]
 [11 13]]
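For reference, the Python side of this comparison can be reproduced roughly as follows (a sketch; exact numbers may vary with library versions, and scipy.misc.imresize has since been removed from SciPy):

```python
import numpy as np
import cv2
from scipy.ndimage import zoom
from skimage.transform import resize as sk_resize

img = np.arange(16, dtype=np.uint8).reshape(4, 4)

# Bilinear downsampling to 2x2 with three common Python options.
print("OpenCV:\n", cv2.resize(img, (2, 2), interpolation=cv2.INTER_LINEAR))
print("scipy zoom:\n", zoom(img, 0.5, order=1))
print("scikit-image:\n", sk_resize(img, (2, 2), order=1,
                                   preserve_range=True, anti_aliasing=False))
# With anti-aliasing disabled, none of these applies Matlab's default
# anti-aliasing filter, which is why matlab_imresize_AA differs.
```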
@hujh14 You mentioned somewhere that you already checked the activations. Do you remember up to which point?
@hujh14 Unfortunately, I cannot run the original code on an 8 GB GTX 1080, not even with batch size 1, due to insufficient memory. Did you manage to compile the original code with cuDNN support? Or do you have 12 GB cards?
@jmtatsch I have the same card as yours, and it works well; it takes about 3.5 GB of memory.
@Vladkryvoruchko Are you sure it works with the Cityscapes model? It is much larger than the other two... Which CUDA and cuDNN versions?
Doing a sliced evaluation now, much better detail!
Will do a smarter overlap treatment, then evaluate again.
@jmtatsch Oh... I was talking about ADE :)
Without flipped evaluation
classes IoU nIoU
--------------------------------
road : 0.981 nan
sidewalk : 0.849 nan
building : 0.922 nan
wall : 0.572 nan
fence : 0.624 nan
pole : 0.589 nan
traffic light : 0.690 nan
traffic sign : 0.777 nan
vegetation : 0.919 nan
terrain : 0.641 nan
sky : 0.943 nan
person : 0.808 0.627
rider : 0.617 0.472
car : 0.949 0.857
truck : 0.766 0.477
bus : 0.859 0.639
train : 0.794 0.569
motorcycle : 0.653 0.456
bicycle : 0.768 0.578
--------------------------------
Score Average : 0.775 0.584
--------------------------------
categories IoU nIoU
--------------------------------
flat : 0.986 nan
nature : 0.921 nan
object : 0.670 nan
sky : 0.943 nan
construction : 0.925 nan
human : 0.822 0.656
vehicle : 0.936 0.834
--------------------------------
Score Average : 0.886 0.745
--------------------------------
Just 2.2 % missing :)
Okay, with flipped evaluation:
classes IoU nIoU
--------------------------------
road : 0.982 nan
sidewalk : 0.853 nan
building : 0.924 nan
wall : 0.582 nan
fence : 0.634 nan
pole : 0.593 nan
traffic light : 0.692 nan
traffic sign : 0.780 nan
vegetation : 0.920 nan
terrain : 0.644 nan
sky : 0.945 nan
person : 0.811 0.630
rider : 0.618 0.478
car : 0.951 0.860
truck : 0.795 0.480
bus : 0.870 0.638
train : 0.799 0.591
motorcycle : 0.656 0.457
bicycle : 0.771 0.580
--------------------------------
Score Average : 0.780 0.589
--------------------------------
categories IoU nIoU
--------------------------------
flat : 0.986 nan
nature : 0.922 nan
object : 0.674 nan
sky : 0.945 nan
construction : 0.927 nan
human : 0.825 0.659
vehicle : 0.937 0.836
--------------------------------
Score Average : 0.888 0.747
--------------------------------
Still 1.7% missing. Will look into multi-scale evaluation.
Adding multi-scale evaluation on top actually made the results considerably worse. Here is a funky gif with scales = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]; the last frame is the aggregated one.
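The multi-scale averaging was roughly like this (a sketch, not the exact code; it assumes OpenCV for resizing and a predict_fn that returns per-pixel class probabilities):

```python
import numpy as np
import cv2

def predict_multiscale(img, predict_fn, n_classes=19,
                       scales=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75)):
    """Average class probabilities predicted at several scales (sketch)."""
    h, w = img.shape[:2]
    acc = np.zeros((h, w, n_classes), dtype=np.float32)
    for s in scales:
        scaled = cv2.resize(img, (int(w * s), int(h * s)),
                            interpolation=cv2.INTER_LINEAR)
        probs = predict_fn(scaled)  # probabilities at the scaled resolution
        acc += cv2.resize(probs, (w, h), interpolation=cv2.INTER_LINEAR)
    return acc / len(scales)
```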
@jmtatsch, would you mind explaining what you mean by sliced prediction? I am working on the same problem :(
@leinxx By sliced prediction I mean cutting the image into 4x2 overlapping 713x713 slices, forwarding them through the network and reassembling the 2048x1024 predictions from them.
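Roughly like this (a sketch, not the exact code; predict_fn is assumed to return per-pixel class probabilities for one 713x713 tile):

```python
import numpy as np

def predict_sliced(img, predict_fn, crop=713, n_classes=19, rows=2, cols=4):
    """Predict an image as overlapping crop-sized tiles and average the overlaps."""
    h, w = img.shape[:2]
    probs = np.zeros((h, w, n_classes), dtype=np.float32)
    counts = np.zeros((h, w, 1), dtype=np.float32)
    for y in np.linspace(0, h - crop, rows).astype(int):
        for x in np.linspace(0, w - crop, cols).astype(int):
            tile = img[y:y + crop, x:x + crop]
            probs[y:y + crop, x:x + crop] += predict_fn(tile)  # (crop, crop, n_classes)
            counts[y:y + crop, x:x + crop] += 1.0
    return probs / counts  # averaged probabilities; argmax gives the final labels
```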
Please let us know if you fix further issues and get closer to the published results...
@jmtatsch Thanks a lot. I have learned much from your replication. Would you explain why sliced prediction improves the result so much? And what is flipped evaluation (do you mean flipping the image during training for data augmentation)? Could you provide more details about your training settings, e.g. batch size, epochs, etc.?
@wtliao Unfortunately, I have not (yet) trained these weights myself. Sliced/sliding prediction has so much more detail because the weights were trained on 713x713 crops of the full-resolution image, and those 713x713 crops are used for prediction instead of a downsampled 512x256 image that is then upsampled back to full resolution. Flipped evaluation means predicting on the image and on a flipped (mirrored) copy of it at the same time and averaging the results.
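As a sketch (assuming the flip is a left-right mirror, which is the usual choice for this kind of test-time augmentation):

```python
import numpy as np

def predict_flipped(img, predict_fn):
    """Average the prediction of the image and the un-mirrored prediction of its mirror."""
    probs = predict_fn(img)
    probs_mirrored = predict_fn(np.fliplr(img))
    return 0.5 * (probs + np.fliplr(probs_mirrored))
```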
@jmtatsch Thanks for these posts! They were really helpful! :+1:
Hi @jmtatsch, I found that in your code the kernel size and stride of the pyramid pooling module are set to (10xlevel, 10xlevel). That is the right size for VOC and ADE20K with input size (473, 473). However, when using input size (713, 713), as in the Cityscapes case, the sizes obtained from (10xlevel, 10xlevel) are not identical to the original code. I believe this is the main reason for the last 1.7% performance drop ^^
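In other words, the pooling kernel/stride should scale with the feature map size. A rough sketch of the relation (assuming the usual 1/8 output stride and the bin sizes 1, 2, 3, 6 from the paper):

```python
def pyramid_pool_sizes(input_size, bins=(1, 2, 3, 6), output_stride=8):
    """Kernel/stride size for each pyramid pooling level, derived from the input size."""
    feature_size = (input_size - 1) // output_stride + 1  # 473 -> 60, 713 -> 90
    return {b: feature_size // b for b in bins}

print(pyramid_pool_sizes(473))  # {1: 60, 2: 30, 3: 20, 6: 10}
print(pyramid_pool_sizes(713))  # {1: 90, 2: 45, 3: 30, 6: 15}
```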
@scenarios Good catch, I will fix this in #30 and reevaluate.
Hi,
what results do you get after the latest changes? Are they similar to the paper, or still worse?
@jmtatsch Did you train PSPNet on any dataset?
No, but some training code was merged recently, so someone should have.
Hi,
thanks for your excellent work. Could you please tell me how you get the evaluation results? Are you using the scripts in the cityscapes repo?
Thank you!
Hello,
I would also like to know how to get the evaluation results. I tried using the cityscapes scripts, but I get this error when using the seg_read images as my input:
Traceback (most recent call last):
  File "evalPixelLevelSemanticLabeling.py", line 696, in <module>
    main()
  File "evalPixelLevelSemanticLabeling.py", line 690, in main
    evaluateImgLists(predictionImgList, groundTruthImgList, args)
  File "evalPixelLevelSemanticLabeling.py", line 478, in evaluateImgLists
    nbPixels += evaluatePair(predictionImgFileName, groundTruthImgFileName, confMatrix, instStats, perImageStats, args)
  File "evalPixelLevelSemanticLabeling.py", line 605, in evaluatePair
    confMatrix[gt_id][pred_id] += c
IndexError: index 34 is out of bounds for axis 0 with size 34
Although the converted weights produce plausible predictions, they are not yet up to the published results of the PSPNet paper.
Current results on the Cityscapes validation set:
Accuracy of the published code on several validation/testing sets according to the author:
So we are still missing 79.70 - 62.60 = 17.10% IoU.
Does anyone have an idea where we lose that accuracy?