@lucastabelini I think you can adjust the learning rate to tune the performance. Also, you can try single-GPU training. In my experience, single-GPU training is much slower, but you can get better performance. I don't know the reason, but I think it is related to the BN layers.
I trained the model using a single GPU. Are the parameters in configs/culane.py not the same ones used in the paper?
@lucastabelini Yes, it is different. The published code is a re-implementation of the original project. The biggest difference is that the original code uses DataParallel and the published version changes to DistributedDataParallel, so the settings are slightly different.
It should be noted that if you use single-GPU training, the batch size should be enlarged to about 4x the default, according to the linear scaling rule, since the default learning rate is tuned for 4-GPU training.
Reading the "Implementation details" paragraph in the paper I found the following settings to be different:
- optimizer -> Adam instead of SGD
- lr_scheduler -> cosine decay instead of "multi"
- learning_rate -> 4e-4 instead of 1e-3
Are there any more settings that I should set differently from the default ones? Could you provide the configs to obtain the values reported in the paper for CULane and TuSimple on a single GPU? I'm going to use the results I'm able to reach with this code in a paper I am working on.
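For reference, here is a minimal sketch of what those paper-style settings could look like in a config such as configs/culane.py, assuming the repo's plain-Python config format; the variable names (optimizer, scheduler, learning_rate) are assumptions and may differ in the actual file:

```python
# Hypothetical overrides to match the paper's "Implementation details".
# Variable names are assumptions; check the actual configs/culane.py.
optimizer = 'Adam'       # paper: Adam instead of the default SGD
scheduler = 'cos'        # paper: cosine decay instead of 'multi' step decay
learning_rate = 4e-4     # paper: 4e-4 instead of the default 1e-3
```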
Since I can't fit a batch of 32x4 images on a single GPU, I scaled the learning rate (0.1 to 0.025) and am now training a model with this setup.
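As a sanity check, here is a tiny Python sketch of that linear-scaling arithmetic, using the numbers from this thread (32 images per GPU, 4 GPUs, default learning rate 0.1; treat these as the thread's values rather than guaranteed repo defaults):

```python
# Linear scaling rule: keep learning_rate / total_batch_size roughly constant.
# Numbers taken from this thread: 4 GPUs x 32 images, default lr 0.1.
default_lr = 0.1
default_total_batch = 32 * 4          # effective batch size of 4-GPU training

single_gpu_batch = 32                 # what fits on one GPU
scaled_lr = default_lr * single_gpu_batch / default_total_batch
print(scaled_lr)                      # 0.025, i.e. 1/4 of the default
```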
@lucastabelini Yes, scaling the learning rate to 1/4 should also work.
The original DataParallel version uses Adam for all settings, as reported in the paper. You can try the Adam optimizer and cosine decay scheduler on CULane, but I recall that in the re-implemented DistributedDataParallel version, i.e. the published version, SGD produced better results. Some users even produced better results than ours (#11).
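For anyone who wants to try that combination outside the repo's config system, a generic PyTorch sketch of Adam with cosine decay might look like the following; the model, step count, and dummy loss are placeholders for illustration, not the repo's actual trainer:

```python
import torch

# Generic sketch: Adam optimizer + cosine-decay schedule, as in the paper's setup.
# The model, total_steps, and dummy loss below are placeholders for illustration.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)
total_steps = 100
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)

for step in range(total_steps):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).sum()  # dummy forward pass
    loss.backward()
    optimizer.step()
    scheduler.step()                        # anneals lr along a cosine curve
```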
Scaling the learning rate did not help much: the F1 improved from 70.5% to 71%. I also trained a model on TuSimple using ResNet-34 as the backbone to get the metrics I could not find in the paper (FP and FN), and the accuracy is pretty close to the one reported in the paper.
@lucastabelini Would you share your config file for the CULane data, e.g. culane.py? I cannot get the correct results when training.
@longzeyilang I changed almost nothing in the default config file, only the paths and the backbone (because I wanted to train with ResNet-34 instead of 18).
@lucastabelini Have you changed the image resolution for training? I cropped the CULane images (1280×590) to train, but the model did not detect any lanes at all. Also, why are the shp and sim loss weights set to 0?
@longzeyilang No, I did not change the resolution.
I get the same issue. The focal loss seems to be oscillating around 2.5; what could be the reason for that? I set the loss coefficients to 1, as mentioned in your paper.
I trained a model on CULane, using ResNet-34 as the backbone, and the model's F1 on the test set was only 70.5%. The only modification I made was changing the backbone line in the config file to backbone = '34'. Are there any other changes necessary to achieve the 72.3% F1 reported in the paper?