AutoMecUA / AutoMec-AD

Autonomous RC car with the help of ROS Noetic and ML.
GNU General Public License v3.0

Explore usage of vision transformers #195

Open manuelgitgomes opened 1 year ago

manuelgitgomes commented 1 year ago

Useful links:

andrefdre commented 1 year ago

I researched vision transformers and implemented the code from this tutorial: https://medium.com/mlearning-ai/vision-transformers-from-scratch-pytorch-a-step-by-step-guide-96c3313c2e0c. However, the results were poor because the model overfit heavily. One of the papers I read states that this kind of network requires huge amounts of data, even more than CNNs. Our current dataset has 10k images, which clearly isn't sufficient. Should I try a larger dataset, say 100k images, or is this network overkill for what we are trying to do?
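To put the data requirement in perspective, here is a rough sketch that counts the trainable parameters of a standard vision transformer and compares that number with the dataset size. The defaults are an assumption: they follow the ViT-Base/16 configuration (224x224 RGB input, 16x16 patches, 12 blocks, width 768), not necessarily the model from the tutorial or the one used in this project. The function name and `num_outputs` are hypothetical and only for illustration.

```python
def vit_param_count(image_size=224, patch_size=16, channels=3,
                    dim=768, depth=12, mlp_dim=3072, num_outputs=1000):
    """Count trainable parameters of a standard ViT encoder with a linear head.

    Assumed defaults correspond to ViT-Base/16; adjust them to match the
    model actually being trained.
    """
    num_patches = (image_size // patch_size) ** 2

    # Linear projection of flattened patches (weights + bias).
    patch_embed = patch_size * patch_size * channels * dim + dim
    # Learnable class token and position embeddings (patches + class token).
    cls_token = dim
    pos_embed = (num_patches + 1) * dim

    # One encoder block: Q, K, V and output projections (weights + biases),
    # a two-layer MLP, and two layer norms (scale + shift each).
    # The number of attention heads does not change the count.
    attention = 4 * dim * dim + 4 * dim
    mlp = 2 * dim * mlp_dim + mlp_dim + dim
    layer_norms = 2 * 2 * dim
    block = attention + mlp + layer_norms

    # Final layer norm and linear output head.
    final_norm = 2 * dim
    head = dim * num_outputs + num_outputs

    return patch_embed + cls_token + pos_embed + depth * block + final_norm + head


params = vit_param_count()
dataset_size = 10_000  # size of the current dataset mentioned above
print(f"{params:,} parameters, {params / dataset_size:,.0f} per training image")
```

With these assumed defaults the count is roughly 86.6 million parameters, several thousand per image in a 10k dataset. Passing smaller values for `dim`, `depth`, and `mlp_dim` shows how much a reduced model would narrow that gap.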