GeorgeSeif / Semantic-Segmentation-Suite

Semantic Segmentation Suite in TensorFlow. Implement, train, and test new Semantic Segmentation models easily!

Reproducibility of CamVid results #11

Closed jeffreylutz closed 6 years ago

jeffreylutz commented 6 years ago

George,

I'm having problems reproducing the results for training on CamVid. I am trying the following with no luck. I attempted to predict after training and confirmed that the predictions are incorrect as well.

TRAINING RESULTS:
Validation precision = 0.49989
Validation recall = 0.512134
Validation F1 = 0.50587
Validation IoU = 0.01776

TRAIN:
python main.py --mode train --dataset CamVid --model PSPNet-Res50 --batch_size 100000 --num_epoch 300

PREDICT:
python main.py --mode predict --dataset CamVid --model PSPNet-Res50 --image trash/in.png

GeorgeSeif commented 6 years ago

Hi Jeffrey,

Why is your batch size so huge?

jeffreylutz commented 6 years ago

George,

I realized that you had specified some details for the results you reported, so I tried following them. Can you provide the exact command-line switches you used to perform a clean training run? Here is what I could glean from the README on GitHub:

python main.py --mode train --dataset CamVid --model FC-DenseNet103 --batch_size 1 --num_epochs 300

The precision, recall and IoU scores are really bad when I run with this. I must be doing something wrong. Here are the details of the scores:

Validation precision = 0.5023829499904845
Validation recall = 0.5001694883134198
Validation F1 score = 0.5012717223594603
Validation IoU score = 0.15455210261206462

Jeff


GeorgeSeif commented 6 years ago

Yes those were the exact settings.

Hmmm, did you train for the full 300 epochs? And how does the accuracy look? If the accuracy is good, then it could be that the precision, recall, and IoU calculations are just wrong. How do the images look?

One big thing to note is that the accuracy I have in the README Results section was obtained with an older research version of CamVid that had 12 classes. I haven't retrained fully on this one yet; I will once I get a chance. Something could have gone wrong when I made the transition to the new dataset, though it seemed to be training just fine when I ran it for a few epochs.

jeffreylutz commented 6 years ago

George,

So, the CamVid images currently in the git repo have dimensions of 960 x 720 and 32 total classes. Apparently, the original dataset you used had different image dimensions. Also, the number of classes went from 11 to 32 with the current CamVid.

In an effort to fix this, I would like to reproduce your results. Can you provide me a link to the original CamVid dataset? I thought to look in the Git history, but there is no previous version with a different CamVid dataset.

Location of original CamVid dataset?

Jeff


GeorgeSeif commented 6 years ago

Great observation. The original CamVid had dimensions of 360x480 (which I cropped to 352x480 so the height divides evenly by the downsampling in the networks) and only had 12 classes. So you're correct.

It should be in the git history because I had previously pushed it up here. But here's a link anyways: https://github.com/alexgkendall/SegNet-Tutorial/tree/master/CamVid

I'll look into this and train it once I have a chance, plus upload a pretrained model. I'm just using my GPU for other things now.

What was the validation accuracy like? And can you upload some of the images here so I can take a look? How many epochs did you train for?

Thanks

jeffreylutz commented 6 years ago

So, I was able to get back to a previous commit where training works again: commit 27b704e. I'm going to compare it with HEAD to see what else, besides the image dimensions, accounts for HEAD not working. I even resized the images in HEAD, and it still failed to train (it was not able to even start).

Once I have HEAD healthy again, I'm going to set up a CI loop for the most basic training approach.

I'll keep you posted.

Jeff


Spritea commented 6 years ago

Hi George, I'm following this issue and trying to reproduce the results for training on CamVid. I'm using the version at commit 27b704e, just like Jeff, and the same command as follows:

python main.py --mode train --dataset CamVid --model FC-DenseNet103 --batch_size 1 --num_epochs 300

However, every time my training reaches epoch 51, the program crashes as shown in this screenshot: [screenshot from 2018-03-13 09-24-54]. Any ideas would be welcome, thanks!

GeorgeSeif commented 6 years ago

Hi @Spritea

Hmmm, interesting. I've never even seen such an error. How many times has it happened?

Spritea commented 6 years ago

George,

Uhh, twice. I'll run some more trials to see what the problem is; I guess it may be related to my environment. Besides, I switched to the HEAD version on my own dataset, and it works correctly. Your code is really clear! Thanks anyway~

GeorgeSeif commented 6 years ago

Could be. I'm actually running a few tests right now to add some new features. I'll see if something like that comes up, though it never did before!

GeorgeSeif commented 6 years ago

Hi there,

This issue has been resolved in the latest commit. There were two changes that really fixed things:

-- First of all, we should not do mean image subtraction before the pretrained ResNet. I removed that line from all networks that did it, and it substantially improved the final prediction results (a sketch of this change follows the list below).

-- I fixed up the computations of precision, recall, and F1 score to use Scikit Learn's implementation.
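
For context on the first point, here is a minimal sketch of the kind of preprocessing change involved; this is an illustration rather than the repository's exact code, and the function name, channel-mean values, and [0, 1] scaling are assumptions:

```python
import numpy as np

# Commonly used ImageNet RGB channel means (illustrative assumption).
IMAGENET_MEAN = np.array([123.68, 116.78, 103.94], dtype=np.float32)

def preprocess(image):
    """Prepare a uint8 RGB image for the network input."""
    image = image.astype(np.float32)
    # The fix: do NOT subtract the mean image before the pretrained
    # ResNet frontend, i.e. a line like the following was removed:
    # image -= IMAGENET_MEAN
    return image / 255.0  # assumed plain [0, 1] scaling instead
```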

Now, one major thing to note: in Scikit Learn, one can select among different ways of computing precision, recall, and F1 score. They are:

micro --> Calculate metrics globally by counting the total true positives, false negatives and false positives.

macro --> Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

weighted --> Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.

You can now select which averaging method you want via a function argument.
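
As a concrete illustration, here is a minimal sketch of selecting the averaging method with scikit-learn's precision_recall_fscore_support; the label maps are random stand-ins for real ground truth and predictions:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Random stand-ins for ground-truth and predicted per-pixel label maps.
rng = np.random.default_rng(0)
gt_labels = rng.integers(0, 12, size=(352, 480))    # 12 classes, CamVid-sized
pred_labels = rng.integers(0, 12, size=(352, 480))

# Flatten so that every pixel counts as one sample.
y_true, y_pred = gt_labels.ravel(), pred_labels.ravel()

for average in ("micro", "macro", "weighted"):
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average=average)
    print(f"{average:>8}: precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```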

Now, for the mean IoU there is something related to the above: most papers actually use the weighted mean IoU, along with the weighted precision, recall, and F1 score. For example, check out the mean IoU on MIT's Scene Parsing Benchmark repo:

https://github.com/CSAILVision/sceneparsing

The top unweighted mean IoU is 0.3490, whereas the weighted mean IoU is 0.6108, which is very similar to the papers.

This repository currently computes the unweighted mean IoU. I have just tested all of the networks and found that they do produce good results, similar to their papers. Perhaps I will add options for how the mean IoU is computed, like Scikit Learn does for the scores above.
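
To make the weighted/unweighted distinction concrete, here is a small sketch (my own illustration, not this repository's implementation) that derives per-class IoU from a confusion matrix and averages it both ways:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def mean_iou(y_true, y_pred, num_classes, weighted=False):
    cm = confusion_matrix(y_true, y_pred, labels=np.arange(num_classes))
    tp = np.diag(cm).astype(np.float64)     # pixels labelled correctly per class
    fp = cm.sum(axis=0) - tp                # predicted as class c, actually not c
    fn = cm.sum(axis=1) - tp                # actually class c, predicted otherwise
    iou = tp / np.maximum(tp + fp + fn, 1)  # per-class IoU (0 for absent classes)
    if weighted:
        # Weight each class by its support (number of true pixels), so large
        # classes like Road or Sky dominate the average.
        return np.average(iou, weights=cm.sum(axis=1))
    return iou.mean()                       # plain unweighted mean
```

On a heavily imbalanced dataset like CamVid, the weighted version is pulled towards the IoU of the large classes, which is why it tends to come out much higher, as in the benchmark numbers above.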

Closing this issue as it has been resolved.

Cheers!

nooriahmed commented 5 years ago

How would I increase the average mean IoU? Any productive recipe, please?

Running test image 168 / 168
Average test accuracy = 0.7722670918419248
Average per class test accuracies =

Animal = 0.982143
Archway = 0.910714
Bicyclist = 0.795735
Bridge = 0.982143
Building = 0.877125
Car = 0.779497
CartLuggagePram = 0.702381
Child = 0.937878
Column_Pole = 0.340315
Fence = 0.819580
LaneMkgsDriv = 0.414625
LaneMkgsNonDriv = 0.970238
Misc_Text = 0.516968
MotorcycleScooter = 0.976190
OtherMoving = 0.829898
ParkingBlock = 0.885432
Pedestrian = 0.590493
Road = 0.912155
RoadShoulder = 0.949243
Sidewalk = 0.792470
SignSymbol = 0.601385
Sky = 0.931321
SUVPickupTruck = 0.751057
TrafficCone = 0.994048
TrafficLight = 0.692748
Train = 1.000000
Tree = 0.809915
Truck_Bus = 0.884559
Tunnel = 1.000000
VegetationMisc = 0.823739
Void = 0.332217
Wall = 0.691263

Average precision = 0.8014890549886131
Average recall = 0.7722670918419248
Average F1 score = 0.7564816960037837
Average mean IoU score = 0.39641312733923195
Average run time = 0.08028133000646319
