@lucastabelini I think you can adjust the learning rate to tune the performance. Also, you can try single-GPU training. In my experience, single-GPU training is much slower, but you can get better performance. I don't know the reason, but I think it is related to the BN layers.
I trained the model using a single GPU. Are the parameters in configs/culane.py not the same ones used in the paper?
@lucastabelini Yes, it is different. The published code is a re-implementation of the original project. The biggest difference is that the original code uses DataParallel and the published version changes to DistributedDataParallel, so the settings are slightly different.
It should be noted that if you use single-GPU training, the batch size should be enlarged to about 4x the default, according to the linear scaling rule, since the default learning rate is tuned for 4-GPU training.
Reading the "Implementation details" paragraph in the paper I found the following settings to be different:
- optimizer -> Adam instead of SGD
- lr_scheduler -> cosine decay instead of "multi"
- learning_rate -> 4e-4 instead of 1e-3
Are there any more settings that I should set differently from the default ones? Could you provide the configs to obtain the values reported in the paper for CULane and TuSimple on a single GPU? I'm going to use the results I'm able to reach with this code in a paper I am working on.
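For reference, here is a minimal sketch of what those paper-style settings could look like in a config such as configs/culane.py, assuming the repo's plain-Python config format; the variable names (optimizer, scheduler, learning_rate) are assumptions and may differ in the actual file:

```python
# Hypothetical overrides to match the paper's "Implementation details".
# Variable names are assumptions; check the actual configs/culane.py.
optimizer = 'Adam'       # paper: Adam instead of the default SGD
scheduler = 'cos'        # paper: cosine decay instead of 'multi' step decay
learning_rate = 4e-4     # paper: 4e-4 instead of the default 1e-3
```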
Since I can't fit a batch of 32x4 images on a single GPU, I scaled the learning rate (0.1 to 0.025) and am now training a model with this setup.
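As a sanity check, here is a tiny Python sketch of that linear-scaling arithmetic, using the numbers from this thread (32 images per GPU, 4 GPUs, default learning rate 0.1; treat these as the thread's values rather than guaranteed repo defaults):

```python
# Linear scaling rule: keep learning_rate / total_batch_size roughly constant.
# Numbers taken from this thread: 4 GPUs x 32 images, default lr 0.1.
default_lr = 0.1
default_total_batch = 32 * 4          # effective batch size of 4-GPU training

single_gpu_batch = 32                 # what fits on one GPU
scaled_lr = default_lr * single_gpu_batch / default_total_batch
print(scaled_lr)                      # 0.025, i.e. 1/4 of the default
```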
@lucastabelini Yes, scaling the learning rate to 1/4 should also work.
The original DataParallel version uses Adam for all settings, as reported in the paper. You can try the Adam optimizer and cosine decay scheduler on CULane, but I recall that in the re-implemented DistributedDataParallel version, i.e. the published version, SGD produced better results. Some users even produced better results than ours (#11).
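For anyone who wants to try that combination outside the repo's config system, a generic PyTorch sketch of Adam with cosine decay might look like the following; the model, step count, and dummy loss are placeholders for illustration, not the repo's actual trainer:

```python
import torch

# Generic sketch: Adam optimizer + cosine-decay schedule, as in the paper's setup.
# The model, total_steps, and dummy loss below are placeholders for illustration.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)
total_steps = 100
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)

for step in range(total_steps):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).sum()  # dummy forward pass
    loss.backward()
    optimizer.step()
    scheduler.step()                        # anneals lr along a cosine curve
```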
Scaling the learning rate did not help much: the F1 improved from 70.5% to 71%. I also trained a model on TuSimple using ResNet-34 as the backbone to get the metrics I could not find in the paper (FP and FN), and the accuracy is pretty close to the one reported in the paper.
@lucastabelini Would you share your config file for the CULane data, e.g. culane.py? I cannot get the correct results when training.
@longzeyilang I changed almost nothing in the default config file, only the paths and the backbone (because I wanted to train with ResNet-34 instead of 18).
@lucastabelini Have you changed the image resolution for training? I cropped the CULane images (1280×590) to train, but the model did not detect any lanes at all. Also, why are the shp and sim loss weights set to 0?
@longzeyilang No, I did not change the resolution.
I get the same issue. The focal loss seems to be oscillating around 2.5; what could be the reason for that? I set the loss coefficients to 1, as mentioned in your paper.
I trained a model on CULane, using ResNet-34 as the backbone, and the model's F1 on the test set was only 70.5%. The only modification I made was changing the backbone line in the config file to backbone = '34'. Are there any other changes necessary to achieve the 72.3% F1 reported in the paper?