size of the input layer and training models

tyluckyma commented 7 years ago

Hi @ai-tor

How is everything? I have recently been using deep learning to predict steering angles, and I noticed your VPilot. It seems you implemented most of the NVIDIA AlexNet. However, I think 66x200 is too small for the input layer; I wonder how the network can identify traffic lights at that resolution. So may I ask:

1. Have you tested the RMSE of your NVIDIA architecture?
2. I notice comma.ai is using 160x320 in its RNN, and komanda (one of the top-3 teams in Udacity Challenge #2) is using 480x640. Have you tried adjusting the input layer and comparing the results?
3. How about other well-known models such as VGG, ResNet, etc.? It seems most people just fine-tune parameters starting from the pre-trained models. What do you think of these approaches?

aitorzip commented 7 years ago

Hi @tyluckyma

It's fine, thanks :)

So this is the thing: I used NVIDIA's architecture to predict steering angles for another project and it worked perfectly. So far I have not tested it driving in traffic, with pedestrians, other vehicles and traffic signs. As you say, 200x66 may be too small for that purpose, but we should also take into account that the network has no pooling layers, so there is no major information loss (except for the convolutional strides). If I notice that the model has high bias when trying to predict throttle and braking, I will move to a larger input.
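To make the point about strides concrete, here is a minimal sketch that tracks how the feature map shrinks through the convolutional stack for the three input sizes mentioned in this thread. The layer list (three 5x5 convolutions with stride 2, then two 3x3 convolutions with stride 1, no padding, no pooling) is taken from NVIDIA's published end-to-end driving network and is only assumed to match what VPilot uses; `CONV_LAYERS`, `conv_out` and `feature_map_sizes` are names made up for this example.

```python
# Assumed layer list: (kernel, stride) pairs from NVIDIA's published
# end-to-end driving network. Not read from the VPilot source.
CONV_LAYERS = [(5, 2), (5, 2), (5, 2), (3, 1), (3, 1)]


def conv_out(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


def feature_map_sizes(height, width, layers=CONV_LAYERS):
    """Return the (height, width) of the feature map after each layer."""
    sizes = [(height, width)]
    for kernel, stride in layers:
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
        sizes.append((height, width))
    return sizes


if __name__ == "__main__":
    # Input sizes mentioned in this thread.
    for h, w in [(66, 200), (160, 320), (480, 640)]:
        print((h, w), "->", feature_map_sizes(h, w))
```

Under these assumptions a 66x200 input ends up as a 1x18 feature map after the last convolution, so the spatial detail available to the fully connected layers is limited even without pooling. Larger inputs keep more spatial positions at the cost of more computation.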

1. I did, but only for steering, not yet for throttle and brake (first results after Christmas). I have to say that RMSE or MSE is not a good measure of network performance for steering angle prediction. I remember that networks with higher MSE (both in training and validation) sometimes behaved better than networks with lower error. Possibly the models end up in different local minima.
2. Yes. In fact, to predict throttle and braking accurately you need an RNN, since you need to infer the speed and possibly the acceleration of the other vehicles, and of your own.
3. I prefer to keep it simple at the beginning. VGG and ResNet are quite big CNNs; I think they would be too much, and I want my network to run in real time. A smaller model like AlexNet + RNN should probably work. So far I haven't tried bigger models because of my laptop's specs, but I am building a deep learning computer that should be ready by next week, and there I will be able to test larger models.
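For reference, a minimal sketch of the RMSE discussed in the first answer, with a toy example of how a lower error can go with worse driving. The angle values are invented purely for illustration, and the helper name `rmse` is hypothetical. The example shows one possible mechanism (a mostly straight road rewarding a predictor that never turns), which is not necessarily the local-minima effect described above.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences of angles."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Toy ground truth (made-up values): mostly straight driving with two turns.
actual = [0.0] * 8 + [0.5, -0.5]

# Model A never steers: it misses both turns entirely.
always_straight = [0.0] * len(actual)

# Model B follows every turn but jitters by +/-0.25 around the true angle.
jittery = [a + (0.25 if i % 2 == 0 else -0.25) for i, a in enumerate(actual)]

if __name__ == "__main__":
    print("always straight:", rmse(always_straight, actual))
    print("jittery tracker:", rmse(jittery, actual))
```

In this toy setup the model that never steers scores the lower RMSE even though it would leave the road at the first turn, which is one reason to test the network in closed-loop driving and not rely on the error alone.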

So, as I progress with the task you should see updates to the model. The first thing I will do is add an LSTM to the current model and check the bias. If it is high, I will increase the input layer, which my brand-new GTX 1080 should be able to handle efficiently :)

If even this is not enough I will try bigger architectures, but not as big as VGG, I think.

tyluckyma commented 7 years ago

Haha, why don't you go for a Pascal Titan X?

aitorzip commented 7 years ago

I dream about it, but not enough budget :)
