Training data - Githubissues

jankoc commented 3 weeks ago

Is the training data available anywhere? And how accurate would you expect this method to be with more training data?

jankoc commented 3 weeks ago

Sorry, I found the Google Drive link in the readme. I'm still wondering about robustness, though, and what kind of accuracy one would expect on real-life data. I'm thinking of applying this to relatively high-quality data (e.g. from https://loggator.com/recent_events) to extract the course and automatically show, for example, the longest leg with the map rotated along the line of the leg.

KiK0S commented 3 weeks ago

Hi @jankoc, you can reach me at kostya.amelichev@gmail.com and request the training data if needed. Currently most of the data is labeled by hand with the labelme software, and I mostly used urban race maps from one specific organiser, as they were easy to obtain. Also check the readme for the link to Google Drive; it might just work.

This repo is mostly my experiments. It is less about good prediction quality at each step of the process and more about the overall pipeline for urban race analysis. If you are interested in extracting the course info (controls, start, connections), the pipeline is this:

1. Course layer detection: it separates all the "pink" info from the map (triangles, circles and lines). This is a relatively simple problem for a computer, as it is basically a color mask. I apply a U-Net on top to filter out other pink info where possible (e.g. dangerous territories) and to improve overall quality.
2. On top of that, you need to find the triangles and circles. There is no machine learning here whatsoever, just some OpenCV stuff with kernels and polygon approximations.
3. To find the course configuration, you can use a TSP solver with custom scoring based on whether two circles seem to be connected by a line or not. This step is not tunable either. quadtree.py is the file for that.
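To make the last step more concrete, here is a minimal sketch of the idea, not the code in quadtree.py. It assumes the course layer is already a binary mask (`mask[y][x]` is true on "pink" pixels) and that the start and control centers are known as `(x, y)` points. The names `line_support` and `best_course` are made up for this example, and the exhaustive search stands in for a real TSP solver, so it is only usable for a handful of controls.

```python
from itertools import permutations


def line_support(mask, a, b, samples=50):
    """Fraction of points sampled on segment a-b that land on course pixels."""
    hits = 0
    for i in range(samples + 1):
        t = i / samples
        x = round(a[0] + t * (b[0] - a[0]))
        y = round(a[1] + t * (b[1] - a[1]))
        if 0 <= y < len(mask) and 0 <= x < len(mask[0]) and mask[y][x]:
            hits += 1
    return hits / (samples + 1)


def best_course(mask, start, controls):
    """Try every visiting order and keep the one whose legs are best supported."""
    best_order, best_score = None, float("-inf")
    for order in permutations(controls):
        route = [start, *order]
        score = sum(line_support(mask, p, q) for p, q in zip(route, route[1:]))
        if score > best_score:
            best_order, best_score = list(order), score
    return best_order, best_score


# Toy 11x11 mask with three drawn legs:
# (0, 0) -> (10, 0) -> (10, 10) -> (0, 10)
mask = [[False] * 11 for _ in range(11)]
for i in range(11):
    mask[0][i] = True    # top edge
    mask[i][10] = True   # right edge
    mask[10][i] = True   # bottom edge

order, score = best_course(mask, (0, 0), [(0, 10), (10, 0), (10, 10)])
```

On real maps the connection lines stop at the circle edges and can be broken by other symbols, so the score would likely need a tolerance band around the segment rather than single-pixel sampling.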

As you can see, there is not a lot of ML involved, so I am not sure that feeding in more data will boost the quality. With good hyperparameters I would expect an accuracy of 80%+ for circle detection, and a bit less for triangle detection, as it takes more hyperparameters. I would expect most quality issues to come from scoring whether two controls are adjacent, but it might work fine for forest maps, where the configuration is usually cleaner. Please note this repo doesn't have any number detection, so for now you can't really work with butterflies (e.g. 1/4/7) or work out the direction of a leg without the context of the controls permutation.

Also, one of the issues I faced was bringing maps to a similar scale factor. It is extremely important for the circle and triangle detectors to know the approximate size of a shape. I currently do max pooling for that (e.g. making every picture 1500x1500 or less), but I haven't experimented enough on more heterogeneous datasets. It would probably need some adjustments as well.
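As a rough illustration of that size normalization, the sketch below picks the smallest integer pooling factor that brings the longer side to 1500 pixels or less and then applies max pooling over a 2D grid of values. It is written in plain Python for clarity and is an assumption about how such a step could look, not the repo's implementation; `pool_factor` and `max_pool` are hypothetical names.

```python
from math import ceil


def pool_factor(height, width, limit=1500):
    """Smallest integer k such that ceil(max(height, width) / k) <= limit."""
    return max(1, ceil(max(height, width) / limit))


def max_pool(image, k):
    """Max pooling with k x k windows; edge windows may be smaller."""
    h, w = len(image), len(image[0])
    return [
        [
            max(
                image[y][x]
                for y in range(r, min(r + k, h))
                for x in range(c, min(c + k, w))
            )
            for c in range(0, w, k)
        ]
        for r in range(0, h, k)
    ]


# A 3000x2000 scan would be pooled with factor 2 down to 1500x1000.
k = pool_factor(3000, 2000)
small = max_pool([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 2)
```

Note that this only bounds the image size. Two maps printed at different scales can still end up with different circle diameters in pixels, which may be one reason more heterogeneous datasets would need further adjustments.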

KiK0S / maps-analyzer

Training data #4