jmtatsch opened this issue 7 years ago
Is scipy's misc.imresize(img, self.input_shape) really exactly the same as Matlab's imresize(img, [new_rows new_cols], 'bilinear')? And how does tf.image.resize compare to the Caffe Interp layer?
You could save the intermediate activations from Keras and compare them with the Caffe ones. There are a few lines for this in utils.py.
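Something along these lines should work for dumping a single layer's output (a rough sketch, not the exact utils.py code; the layer name below is only an example):

```python
import numpy as np
from keras.models import Model

def get_activation(model, layer_name, img_batch):
    """Return one layer's output for a preprocessed batch of shape (1, H, W, 3)."""
    probe = Model(inputs=model.input, outputs=model.get_layer(layer_name).output)
    return probe.predict(img_batch)

# Example usage (layer name is illustrative):
# keras_act = get_activation(pspnet.model, "conv1_1_3x3_s2", batch)
# np.save("conv1_1_keras.npy", keras_act)
# Then compare against the corresponding Caffe blob (NCHW -> NHWC):
# assert np.allclose(keras_act, caffe_blob.transpose(0, 2, 3, 1), atol=1e-4)
```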
I forget, but they may have used a sliding window for evaluation, as opposed to our rescaling, which distorts the aspect ratio.
Looking into this further, there are minor differences between the Python and Matlab image resizing depending on the input data type. See e.g. https://stackoverflow.com/questions/26812289/matlab-vs-c-vs-opencv-imresize Matlab performs anti-aliasing by default, as does Zhao's original code.
I did some experiments, and even for a toy example nothing matches the Matlab default (matlab_imresize_AA below):
original:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

scipy_imresize:
[[ 4  5]
 [10 11]]

scipy_imresize_float:
[[ 3.57142878  5.14285707]
 [ 9.8571434  11.4285717 ]]

scipy_zoom:
[[ 0  3]
 [12 15]]

OpenCV:
[[ 3  5]
 [11 13]]

scikit_resize:
[[ 3  5]
 [11 13]]

matlab_imresize_AA:
[[ 4  5]
 [11 12]]

matlab_imresize:
[[ 3  5]
 [11 13]]
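For reference, the Python side of this comparison can be reproduced roughly as follows (a sketch; exact numbers may vary with library versions, and scipy.misc.imresize has since been removed from SciPy):

```python
import numpy as np
import cv2
from scipy.ndimage import zoom
from skimage.transform import resize as sk_resize

img = np.arange(16, dtype=np.uint8).reshape(4, 4)

# Bilinear downsampling to 2x2 with three common Python options.
print("OpenCV:\n", cv2.resize(img, (2, 2), interpolation=cv2.INTER_LINEAR))
print("scipy zoom:\n", zoom(img, 0.5, order=1))
print("scikit-image:\n", sk_resize(img, (2, 2), order=1,
                                   preserve_range=True, anti_aliasing=False))
# With anti-aliasing disabled, none of these applies Matlab's default
# anti-aliasing filter, which is why matlab_imresize_AA differs.
```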
@hujh14 You mentioned somewhere that you already checked the activations. Do you remember up to which point?
@hujh14 Unfortunately, I cannot run the original code on an 8 GB GTX 1080, not even with batch size 1, due to insufficient memory. Did you manage to compile the original code with cuDNN support? Or do you have 12 GB cards?
@jmtatsch I have the same card as yours, and it works well; it takes about 3.5 GB of memory.
@Vladkryvoruchko Are you sure it works with the Cityscapes model? It is much larger than the other two... Which CUDA and cuDNN versions?
Doing a sliced evaluation now, much better detail!
Will do a smarter overlap treatment, then evaluate again.
@jmtatsch Oh... I was talking about ADE :)
Without flipped evaluation
classes IoU nIoU
--------------------------------
road : 0.981 nan
sidewalk : 0.849 nan
building : 0.922 nan
wall : 0.572 nan
fence : 0.624 nan
pole : 0.589 nan
traffic light : 0.690 nan
traffic sign : 0.777 nan
vegetation : 0.919 nan
terrain : 0.641 nan
sky : 0.943 nan
person : 0.808 0.627
rider : 0.617 0.472
car : 0.949 0.857
truck : 0.766 0.477
bus : 0.859 0.639
train : 0.794 0.569
motorcycle : 0.653 0.456
bicycle : 0.768 0.578
--------------------------------
Score Average : 0.775 0.584
--------------------------------
categories IoU nIoU
--------------------------------
flat : 0.986 nan
nature : 0.921 nan
object : 0.670 nan
sky : 0.943 nan
construction : 0.925 nan
human : 0.822 0.656
vehicle : 0.936 0.834
--------------------------------
Score Average : 0.886 0.745
--------------------------------
Just 2.2 % missing :)
Okay, with flipped evaluation:
classes IoU nIoU
--------------------------------
road : 0.982 nan
sidewalk : 0.853 nan
building : 0.924 nan
wall : 0.582 nan
fence : 0.634 nan
pole : 0.593 nan
traffic light : 0.692 nan
traffic sign : 0.780 nan
vegetation : 0.920 nan
terrain : 0.644 nan
sky : 0.945 nan
person : 0.811 0.630
rider : 0.618 0.478
car : 0.951 0.860
truck : 0.795 0.480
bus : 0.870 0.638
train : 0.799 0.591
motorcycle : 0.656 0.457
bicycle : 0.771 0.580
--------------------------------
Score Average : 0.780 0.589
--------------------------------
categories IoU nIoU
--------------------------------
flat : 0.986 nan
nature : 0.922 nan
object : 0.674 nan
sky : 0.945 nan
construction : 0.927 nan
human : 0.825 0.659
vehicle : 0.937 0.836
--------------------------------
Score Average : 0.888 0.747
--------------------------------
Still 1.7% missing. Will look into multi-scale evaluation.
Adding multi-scale evaluation on top actually made the results considerably worse. Here is a funky gif with scales = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]; the last frame is the aggregated one.
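The multi-scale averaging was roughly like this (a sketch, not the exact code; it assumes OpenCV for resizing and a predict_fn that returns per-pixel class probabilities):

```python
import numpy as np
import cv2

def predict_multiscale(img, predict_fn, n_classes=19,
                       scales=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75)):
    """Average class probabilities predicted at several scales (sketch)."""
    h, w = img.shape[:2]
    acc = np.zeros((h, w, n_classes), dtype=np.float32)
    for s in scales:
        scaled = cv2.resize(img, (int(w * s), int(h * s)),
                            interpolation=cv2.INTER_LINEAR)
        probs = predict_fn(scaled)  # probabilities at the scaled resolution
        acc += cv2.resize(probs, (w, h), interpolation=cv2.INTER_LINEAR)
    return acc / len(scales)
```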
@jmtatsch, would you mind explaining what you mean by sliced prediction? I am working on the same problem :(
@leinxx By sliced prediction I mean cutting the image into 4x2 overlapping 713x713 slices, forwarding them through the network and reassembling the 2048x1024 predictions from them.
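Roughly like this (a sketch, not the exact code; predict_fn is assumed to return per-pixel class probabilities for one 713x713 tile):

```python
import numpy as np

def predict_sliced(img, predict_fn, crop=713, n_classes=19, rows=2, cols=4):
    """Predict an image as overlapping crop-sized tiles and average the overlaps."""
    h, w = img.shape[:2]
    probs = np.zeros((h, w, n_classes), dtype=np.float32)
    counts = np.zeros((h, w, 1), dtype=np.float32)
    for y in np.linspace(0, h - crop, rows).astype(int):
        for x in np.linspace(0, w - crop, cols).astype(int):
            tile = img[y:y + crop, x:x + crop]
            probs[y:y + crop, x:x + crop] += predict_fn(tile)  # (crop, crop, n_classes)
            counts[y:y + crop, x:x + crop] += 1.0
    return probs / counts  # averaged probabilities; argmax gives the final labels
```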
Please let us know if you fix further issues and get closer to the published results...
@jmtatsch Thanks a lot. I have learned much from your replication. Would you explain why sliced prediction improves the result so much? And what is flipped evaluation (do you mean flipping the image during training for data augmentation)? Could you provide more details about your training settings, e.g. batch size, epochs, etc.?
@wtliao Unfortunately, I have not (yet) trained these weights myself. Sliced/sliding prediction has so much more detail because the weights were trained on 713x713 crops of the full-resolution image, and those 713x713 crops are used for prediction instead of a downsampled 512x256 image that is then upsampled back to full resolution. Flipped evaluation means predicting on the image and on a flipped (mirrored) copy of it at the same time and averaging the results.
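As a sketch (assuming the flip is a left-right mirror, which is the usual choice for this kind of test-time augmentation):

```python
import numpy as np

def predict_flipped(img, predict_fn):
    """Average the prediction of the image and the un-mirrored prediction of its mirror."""
    probs = predict_fn(img)
    probs_mirrored = predict_fn(np.fliplr(img))
    return 0.5 * (probs + np.fliplr(probs_mirrored))
```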
@jmtatsch Thanks for these posts! They were really helpful! :+1:
Hi @jmtatsch, I found that in your code the kernel size and stride of the pyramid pooling module are set to (10xlevel, 10xlevel). That is the right size for VOC and ADE20K with input size (473, 473). However, when using input size (713, 713), as in the Cityscapes case, the sizes obtained from (10xlevel, 10xlevel) are not identical to the original code. I believe this is the main reason for the last 1.7% performance drop ^^
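In other words, the pooling kernel/stride should scale with the feature map size. A rough sketch of the relation (assuming the usual 1/8 output stride and the bin sizes 1, 2, 3, 6 from the paper):

```python
def pyramid_pool_sizes(input_size, bins=(1, 2, 3, 6), output_stride=8):
    """Kernel/stride size for each pyramid pooling level, derived from the input size."""
    feature_size = (input_size - 1) // output_stride + 1  # 473 -> 60, 713 -> 90
    return {b: feature_size // b for b in bins}

print(pyramid_pool_sizes(473))  # {1: 60, 2: 30, 3: 20, 6: 10}
print(pyramid_pool_sizes(713))  # {1: 90, 2: 45, 3: 30, 6: 15}
```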
@scenarios Good catch, I will fix this in #30 and reevaluate.
Hi,
what results do you get after the latest changes? Are they similar to the paper, or still worse?
@jmtatsch Did you train PSPNet on any dataset?
No, but some training code was merged recently, so someone should have.
Hi,
thanks for your excellent work. Could you please tell me how you get the evaluation results? Are you using the scripts in the cityscapes repo?
Thank you!
Hello,
I would also like to know how to get the evaluation results. I tried using the cityscapes scripts, but I get this error when using the seg_read images as my input:
Traceback (most recent call last):
  File "evalPixelLevelSemanticLabeling.py", line 696, in <module>
    main()
  File "evalPixelLevelSemanticLabeling.py", line 690, in main
    evaluateImgLists(predictionImgList, groundTruthImgList, args)
  File "evalPixelLevelSemanticLabeling.py", line 478, in evaluateImgLists
    nbPixels += evaluatePair(predictionImgFileName, groundTruthImgFileName, confMatrix, instStats, perImageStats, args)
  File "evalPixelLevelSemanticLabeling.py", line 605, in evaluatePair
    confMatrix[gt_id][pred_id] += c
IndexError: index 34 is out of bounds for axis 0 with size 34
Although the converted weights produce plausible predictions, they are not yet up to the published results of the PSPNet paper.
Current results on the Cityscapes validation set:
Accuracy of the published code on several validation/testing sets according to the author:
So we are still missing 79.70 - 62.60 = 17.10% IoU.
Does anyone have an idea where we lose that accuracy?