zfhall opened this issue 6 years ago
A bit more info... both my stream and my .conf file have the frame rate set to 5 fps, yet the output.mov is 20 fps. Does this even matter, or will the correct commands still be assigned to the correct frames in the .npz file?
First of all, apologies for the multiple posts; I've been incrementally solving my own issue.
OK, so I found the issue: save_streaming_video_data.py saves the frames into a video file at 20 fps. My stream was set to 5 fps, hence the "sped up" output video.
This brings me to a new question: what is the optimal fps to use? @RyanZotti, your ffmpeg command sets the stream to 5 fps:
ffserver -f /etc/ff.conf_original & ffmpeg -v quiet -r 5 -s 320x240 -f video4linux2 -i /dev/video0 http://localhost/webcam.ffm
(the -r flag sets the frame rate)
However, your output video is saved at 20 fps (see below). Which one is it? Does it matter?
In save_streaming_video_data.py:
out = cv2.VideoWriter('output.mov', fourcc, 20.0, (320, 240))
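In case it helps, here is roughly how I'd keep the two rates in sync. This is just a sketch, not the repo's code; the STREAM_FPS constant and the mp4v codec are my own placeholders.

```python
import cv2

# Sketch only: pull the frame rate out into one constant so the ffmpeg -r value
# and the cv2.VideoWriter fps can't drift apart.
STREAM_FPS = 5.0  # assumed value; must match the -r flag passed to ffmpeg

fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # codec choice is an assumption
out = cv2.VideoWriter('output.mov', fourcc, STREAM_FPS, (320, 240))
```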
@zfhall that's a great catch. I forgot that I had changed the frame rate in the ffmpeg command but not in OpenCV. That was from a long time ago when I was dealing with performance/latency issues on the Pi, and I forgot to mention it in the docs.
Anyway, it shouldn't matter, because I record the timestamp of each frame outside of ffmpeg and OpenCV. I use the timestamps to assign commands/labels to each frame for the training data of the machine learning models. As long as I can assign the correct commands to each frame, everything works out in the end.
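To sketch the idea (this isn't my actual code, just the general technique): for each frame timestamp, look up the most recent command issued at or before it.

```python
import bisect

def label_frames(frame_timestamps, command_log):
    """Assign each frame the most recent command issued at or before its timestamp.

    frame_timestamps: sorted list of per-frame capture times (seconds).
    command_log: sorted list of (timestamp, command) tuples from the driver input.
    """
    command_times = [t for t, _ in command_log]
    labels = []
    for ts in frame_timestamps:
        i = bisect.bisect_right(command_times, ts) - 1  # last command at or before ts
        labels.append(command_log[i][1] if i >= 0 else None)
    return labels
```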
@RyanZotti thanks for your reply. I had a feeling that it wasn't supposed to be mismatched like that.
OK, it's good to know that it doesn't matter anyway. It makes sense now that you mention it. Were you using 20 fps then? I may try 10 fps to avoid any problems with having too many similar frames.
cheers!
@zfhall Yeah, my OpenCV code uses 20 fps.
It's also worth mentioning that, depending on the complexity of the machine learning model you train, the self-driving functionality might only be able to process 2-5 fps anyway. Bigger models process frames more slowly. It takes about 200-400 milliseconds to process a frame with my bigger models.
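If you want to check what your own model can sustain, a rough timing loop is enough. This is just a sketch, not code from this repo; predict_fn is a placeholder for whatever prediction call you end up using.

```python
import time

def measure_frame_latency(predict_fn, frames):
    """Time one prediction call per frame and report the sustainable frame rate."""
    latencies = []
    for frame in frames:
        start = time.perf_counter()
        predict_fn(frame)  # placeholder: e.g. a TensorFlow session/model call
        latencies.append(time.perf_counter() - start)
    avg = sum(latencies) / len(latencies)
    print("avg latency: {:.0f} ms (~{:.1f} fps sustainable)".format(avg * 1000, 1.0 / avg))
```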
@RyanZotti that's interesting, are those figures from running on the Pi or the laptop? If I have time I may play around with different frame rates and sizes. I'm tempted to go for 160x120 from the start to save on training time.
@zfhall Those figures are from my laptop. The Pi is significantly slower, something like 5-10x, so it's much faster to stream predictions to the Pi from a laptop or other remote server. Real self-driving cars basically have an on-board GPU.
Implementing max pooling in your convolutional neural net would probably have a similar effect to explicitly reducing the frame size, so I never bothered shrinking the frames, but you can try. Also, I highly recommend you train on a GPU, which will train about 8-14x faster than a CPU. AWS has a lot of good GPU options for cheap.
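To make the pooling point concrete, here's a toy sketch (in TensorFlow/Keras for brevity, which isn't exactly what this repo uses; the layer sizes and the 3-way output are made up). Each 2x2 max pool halves both spatial dimensions, so one pool after the first conv layer already has the network working at the equivalent of 160x120 internally.

```python
import tensorflow as tf

# Toy sketch; filter counts and the 3-way command output are illustrative.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(240, 320, 3)),                         # 320x240 RGB frames
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(2),                             # 320x240 -> 160x120
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(2),                             # 160x120 -> 80x60
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3, activation='softmax'),              # e.g. left / straight / right
])
model.summary()
```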
@RyanZotti Thought so, I would have been impressed if the Pi could do that! Thanks for the CNN tip. However, I plan on testing a single-layer NN, a DNN, and a CNN for comparison purposes. It looks like you have already done this, but I'd like to give it a go myself. So perhaps the smaller frame sizes will help with the others.
In regards to training on a GPU, my laptop has an Nvidia GTX 860M. It's not great, but I'm hoping I can use it and still see improvements over the CPU. I'll let you know how it goes.
@RyanZotti Sir, like you, I used an RC car. I built an RC transmitter and receiver myself and got the car ready to drive. My Pi is also ready with the code and everything; just training the model is left. But building the RC car myself cost me a lot of time. Could you please provide me with the training dataset that you made for your car? Collecting it is really time consuming, and I have little time left, @RyanZotti sir. I need to submit my college project. Please help me by providing the dataset, sir! I shall be highly obliged!
@prateekralhan The dataset is huge, about 80-100 GB last time I checked (multiple hours of training data), so it's not something I can easily send you. I have it hosted in S3, but AWS charges me per GB for downloads that come from outside of AWS, and if I opened up my dataset to the whole world it would probably get downloaded a lot and end up costing me a lot of money. Also, things like continuity of camera orientation matter between training and deployment, so it wouldn't be a completely clean dataset for you if your car is built differently.
If your school project just requires the machine learning, without a physical car, then I recommend downloading the Udacity dataset. I don’t have a copy of the link, but you can probably find it somewhere on Google. Keep in mind that the machine learning part is the hardest part. Building the car is comparatively easy.
@RyanZotti Sir, thank you for taking the time to hear out my concern! I checked the Udacity dataset, but isn't it meant for real cars in the real world, as in on the streets, with other cars and so on? But here, just like yours, I made a simple track using white A4 sheets... so how can it be used here, sir?
@prateekralhan It can't be used for the RC car you built, but it's enough to be a standalone machine learning project, something you could complete on time. If your project is due in less than six weeks, you likely won't have enough time to complete the RC car project with RC car training data. The machine learning is the hardest part and takes at least a few months to master.
@RyanZotti Sir, regarding the ML part, it's all done! Earlier I wasn't comfortable with it, but later on I did the Coursera course on ML by Andrew Ng, and I successfully learnt it and passed its assignments too. So the code is ready for the car; just the training dataset was the issue, due to the shortage of time...
@prateekralhan If you make a track that is a circuit, then you can collect a large amount of training data in a relatively short amount of time. And maybe buy a spare battery for your car so you can just swap it out and continue collecting data. I have managed to collect around 18,000 frames (30 minutes of driving) in 50 minutes before, and that's at 10 fps, so if you're streaming at 20 fps it will be double! Good luck.
@zfhall... Sir, for the 18,000 snaps you took, did you write code to take snapshots of the live video that is being streamed, or did you take them manually?
@prateekralhan the frames are taken from an MJPEG stream. OpenCV is used to save the stream into an .mov file. I'm using a lightly modified version of @RyanZotti's code, but the technique is the same. If you read the readme and study the code for this repo, it should make sense.
In particular, look at save_streaming_video_data.py.
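The general technique is roughly this. It's a bare-bones sketch, not the actual save_streaming_video_data.py; the stream URL, fps, and codec are placeholders you'd swap for your own.

```python
import cv2

STREAM_URL = 'http://<pi-ip-address>/webcam.mjpeg'  # placeholder URL for the ffserver stream

cap = cv2.VideoCapture(STREAM_URL)                  # OpenCV can read MJPEG over HTTP
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('output.mov', fourcc, 5.0, (320, 240))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:                                     # stream ended or dropped
        break
    out.write(frame)                                # frame must already be 320x240

cap.release()
out.release()
```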
@RyanZotti I was wondering how many frames were in your final dataset, and what kind of results you got when training single-layer, deep, and conv nets? I am working with around 100,000 frames and getting 65% and 69% training accuracy for the single-layer and deep nets respectively.
Currently I am in the middle of training a deep conv net so we'll see how that goes.
EDIT: I got the following results for the deep conv net: training accuracy: 0.89, validation accuracy: 0.90
First, @zfhall , very well done. That's not sarcasm, that's a genuine compliment. Of the hundreds of people following this repo, I think you are the only person to have actually gotten this far. This is a massive, complicated, and very time consuming project that requires a tremendous amount of dedication. Getting this far is a significant accomplishment.
Now, to answer your question: I have exactly 131,532 frames, or about 2 hours of driving.
A while back when I gave the PyData YouTube presentation I mentioned that my best performing model was a neural network with a single hidden layer. If I recall correctly, the accuracy of that model was around 68%, so you're getting results very similar to what I had at the time. Unfortunately, that level of accuracy can lead to rather embarrassing results (driving off the road, crashing into things, etc) when deployed in an unfamiliar environment. On the other hand, it's one of the faster models (not that it matters, really, if it sucks).
Fast forward about a year after the presentation: I gathered a lot more data, in particular on a circular track so that I could drive continuously and get more data for less effort. This probably resulted in a lot of similar frames, which could make the error seem lower (if similar-looking frames appear in both training and validation data sets). At the same time, I also tried some very deep models. My best deep model had 7 batch-normalized convolutional layers and had an accuracy of 94% on the circular track data. This very deep model had horrible latency (something like 700-800 ms per frame), but it did a much better job of staying on the track (at least when I reduced the forward motor speed to cope with the response latency). You'll have to let me know how yours turns out. I'm very interested.
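For anyone following along, here's a rough sketch of what a stack of batch-normalized convolutional layers looks like (in TensorFlow/Keras for brevity; the filter counts, pooling placement, and 3-way output are illustrative, not my exact architecture):

```python
import tensorflow as tf

def bn_conv_block(x, filters):
    """Conv -> BatchNorm -> ReLU; one 'batch-normalized convolutional layer'."""
    x = tf.keras.layers.Conv2D(filters, 3, padding='same', use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)

inputs = tf.keras.Input(shape=(240, 320, 3))                   # 320x240 RGB frames
x = inputs
for i, filters in enumerate([16, 16, 32, 32, 64, 64, 128]):    # 7 blocks
    x = bn_conv_block(x, filters)
    if i % 2 == 1:
        x = tf.keras.layers.MaxPooling2D(2)(x)                 # downsample every other block
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(3, activation='softmax')(x)    # e.g. left / straight / right
model = tf.keras.Model(inputs, outputs)
model.summary()
```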
Thanks @RyanZotti, that's very kind of you! I have been lucky enough to have lots of time to spend on this project and have somehow got this far despite my limited knowledge. It's been quite the journey and I feel I have learned a lot from the project. I definitely would have struggled without your code, and I was inspired by your PyData presentation, so thank you again!
It's good to see that your dataset isn't too much bigger than mine and that you got similar results with the simple neural nets. I too collected my data on a circular track, so I could be suffering from the similar-frames problem. The only difference is that I streamed at 10 fps, so perhaps there is some reduction there.
I have literally just finished training a slightly tweaked version of your model with 7 batch-normalised convolutional layers and achieved 93.5% for both training and validation accuracy! Hopefully I'll get the chance to test it in the real world tonight, so I'll let you know how it goes.
Hello @RyanZotti
After collecting training data I noticed that when I watch the output.mov file after a session, the video is greatly sped up and thus shorter than it should be. Is this intended, or is it a fault on my end? I'm guessing it's not supposed to be this way, as the timestamps do not match up with the video. Any ideas?
Z