cfzd / Ultra-Fast-Lane-Detection

Ultra Fast Structure-aware Deep Lane Detection (ECCV 2020)
MIT License

Experimenting with EfficientNet: small val loss but trash predictions #91

Closed: normandra closed this 4 years ago

normandra commented 4 years ago

Hi,

I've recently been experimenting with different feature extractors, in this case MobileNetV3 and EfficientNet-B2. The results with MobileNetV3 were acceptable, but in the case of EfficientNet I came across a weird phenomenon.

First, I changed the following things:

The dense head has been replaced by 4 branches, each containing 3 FC layers of width 256, 256, and 128. When using the segmentation branch I adopted a NASNet-like structure, essentially pooling from the last layer and from an earlier block where the stride is 16. Also, in EfficientNet, instead of the convolution with 8 filters I used a convolution with 32 filters.

In both cases I am not using the structural losses.
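For concreteness, here is a minimal PyTorch sketch of a 4-branch head of this shape (dimensions, names, and the final output layer are illustrative, not my exact code):

```python
import torch
import torch.nn as nn

class MultiBranchHead(nn.Module):
    """Four parallel branches, each a 256-256-128 FC stack (illustrative)."""
    def __init__(self, in_features, out_features, num_branches=4):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Linear(in_features, 256), nn.ReLU(inplace=True),
                nn.Linear(256, 256), nn.ReLU(inplace=True),
                nn.Linear(256, 128), nn.ReLU(inplace=True),
                nn.Linear(128, out_features),
            )
            for _ in range(num_branches)
        ])

    def forward(self, x):
        # x: pooled backbone features of shape (N, in_features)
        return [branch(x) for branch in self.branches]
```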

In the case of MobileNet the experiment worked and I was able to get a functional model. However, with EfficientNet, although the loss was a lot smaller, the predictions seem to be just random. Any thoughts on this?

[Screenshots: small loss, weird predictions]

Thanks again for the paper and repository.

cfzd commented 4 years ago

@normandra This is really weird, especially when MobileNet could work but EfficientNet couldn't. Could you please visualize the results on the training set? If the results on the training set are bad, the optimization might be wrong since your loss curve looks good. If the results are good, it might be the problem of overfitting. In our work, we use https://github.com/cfzd/Ultra-Fast-Lane-Detection/blob/054941161b55fa8edc92fa35fd1a515d6cf57636/data/dataloader.py#L23-L27 to augment the data. It should be able to solve the problem of overfitting.
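Roughly, those lines apply the same random rotation and spatial offsets to the image and its lane label together. A simplified sketch of that kind of simultaneous transform (the class name is illustrative, not the exact code at the link):

```python
import random
from PIL import Image

class RandomRotateBoth:
    """Rotate the image and its lane label by the same random angle (illustrative)."""
    def __init__(self, max_angle=6):
        self.max_angle = max_angle

    def __call__(self, img: Image.Image, label: Image.Image):
        angle = random.uniform(-self.max_angle, self.max_angle)
        # Bilinear for the image, nearest for the label so class ids stay intact.
        return (img.rotate(angle, resample=Image.BILINEAR),
                label.rotate(angle, resample=Image.NEAREST))
```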

normandra commented 4 years ago

@cfzd

Here is one example from the training images:

[Screenshot: prediction on a training image]

The prediction also seems to be random-ish. I initially thought it was due to overfitting, but the small loss was also observed on the val dataset. And as you can see, I've implemented the data augmentation techniques mentioned in the paper for both the MobileNet and EfficientNet training.

Here is another kicker: I tried training without the segmentation branch, and although it did not perform better than MobileNetV3, the predictions seem better than with the segmentation branch, and the loss values are more in tune with what I was expecting given the results.

[Screenshots: loss and prediction visualization without the segmentation branch]

To me it seems that something is wrong with my loss function: low values do not necessarily represent good predictions. That, or the implementation I took it from is faulty; I took it from a git repo with minimal testing. But then why did the no-seg variant work better? And why did MobileNetV3 train okay?

I'll do more testing and update my findings here. Any suggestions appreciated :)

cfzd commented 4 years ago

@normandra In fact, I now really suspect that the loss curve is wrong. The value of the loss curve is nearly zero, which is not likely to happen. As you mentioned, you can get normal results without the segmentation branch. I think you can check the code of these losses, especially where they are created and used.
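One quick way to check a loss implementation is to feed it near-perfect logits and random logits and compare the two values. A minimal sketch, assuming a classification-style head (the helper and its dimensions are illustrative):

```python
import torch
import torch.nn.functional as F

def sanity_check_loss(loss_fn, num_classes=101, shape=(4, 18, 4)):
    """Compare a loss on near-perfect vs. random predictions (illustrative)."""
    target = torch.randint(num_classes, shape)
    # Near-perfect logits: a large value on the target class.
    perfect = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float() * 20.0
    noise = torch.randn(shape[0], num_classes, shape[1], shape[2])
    # A healthy loss should be much smaller on the near-perfect prediction.
    print(f"near-perfect: {loss_fn(perfect, target).item():.4f}  "
          f"random: {loss_fn(noise, target).item():.4f}")

sanity_check_loss(F.cross_entropy)
```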

normandra commented 4 years ago

@cfzd

Yes, thanks for your tip. I found out that the loss function was definitely wrong: it was close to zero even on the faulty image. I've now reimplemented the losses and will restart my experiments. My guess is that EfficientNet + seg was good enough to exploit the faulty loss function, whereas MobileNet and EfficientNet alone weren't, so that's very interesting. I'll update again once the results are done.

normandra commented 4 years ago

@cfzd

As a sanity check I tested my loss against the one implemented in this repo. Is it correct that even on a perfect match the SoftmaxFocalLoss implemented here is not exactly zero?

cfzd commented 4 years ago

@normandra It cannot be exactly zero. For the cross-entropy H(p, q) between a target distribution p and a prediction q, and the KL divergence KL(p ‖ q), we have:

H(p, q) = KL(p ‖ q) + H(p)

in which H(p), the entropy of the target distribution, is a constant that does not depend on the prediction.

So even a perfect match might not make the loss exactly zero.
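A minimal PyTorch check of this identity with a soft target distribution (the numbers are illustrative):

```python
import torch
import torch.nn.functional as F

# A soft target distribution p, and logits whose softmax reproduces it exactly.
p = torch.tensor([[0.1, 0.8, 0.1]])
log_q = F.log_softmax(p.log(), dim=1)        # q == p, a "perfect match"

cross_entropy = -(p * log_q).sum(dim=1)                  # H(p, q)
kl = F.kl_div(log_q, p, reduction="none").sum(dim=1)     # KL(p || q), here 0
entropy = -(p * p.log()).sum(dim=1)                      # H(p), the constant

print(cross_entropy.item())    # ~0.639, not zero despite the perfect match
print((kl + entropy).item())   # same value: H(p, q) = KL(p || q) + H(p)
```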

GodsonDsilva commented 4 years ago

@cfzd I am planning to run this on Android and iOS devices with a reduced input shape, e.g. 250x450 rather than 288x800, to reduce inference timings. Will this affect the accuracy, and is it straightforward to do?

@normandra Can you share your code for using different feature extractors like MobileNetV3 and EfficientNet-B2, if possible?

normandra commented 4 years ago

@GodsonDsilva Unfortunately I am currently unable to publish or share my code for EfficientNet / MobileNetV3. Just as a hint, though: I have done exactly what you are planning to do. I suggest you take a look at OpenPilot's driving model, as my implementation basically combines this paper's problem reformulation with their architecture.

serilee27 commented 4 years ago

@normandra I'm trying to train using MobileNetV3 features, but I got only a 0.6 F1 score, which is worse than ResNet-18 on CULane. The parameters and losses were the default code. Could you share your F1 score and any hints? Thank you

4x K80 GPUs, 50 epochs. [Screenshots: training results]

normandra commented 4 years ago

@serilee27 Are you using MobileNet small or large? In my experiments MobileNetV3-small does perform worse than the ResNet-18 variant, which is also the case in their respective ImageNet benchmarks. I'll try to get back to you with the F1 score.

serilee27 commented 4 years ago

@normandra Mine was the MobileNetV3-small version. You are right: the ImageNet accuracy of MobileNetV3-small is lower than ResNet-18's. EfficientNet-B0 is probably slower than MobileNetV3-large, isn't it? I look forward to your F1 score.