dbvideostriketeam / wubloader

MIT License

Add ability to OCR the bus odometer #356

Closed: ekimekim closed 10 months ago

ekimekim commented 10 months ago

This is a big one and it's a mess, sorry. I'll try to clean it up a bit before merging.

This adds a component bus_analyzer. It watches segment directories and runs an analysis against each segment file. Right now the only thing that analysis does is OCR the current bus odometer.
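As a rough illustration of the watch-and-analyze loop described above (this is not the actual bus_analyzer code), a minimal polling sketch might look like the following. The names `find_new_segments`, `scan_once` and `analyze_segment` are hypothetical, and the flat directory layout is an assumption made to keep the example short.

```python
import os
from typing import Callable, Dict, List, Optional, Set


def find_new_segments(segment_dir: str, seen: Set[str]) -> List[str]:
    """Return paths of regular files in segment_dir not yet in `seen`, in name order."""
    new_paths = []
    for name in sorted(os.listdir(segment_dir)):
        path = os.path.join(segment_dir, name)
        if os.path.isfile(path) and path not in seen:
            new_paths.append(path)
    return new_paths


def scan_once(
    segment_dir: str,
    seen: Set[str],
    analyze_segment: Callable[[str], Optional[int]],
) -> Dict[str, Optional[int]]:
    """Run the analysis once on every unseen segment.

    Returns a mapping of segment path to odometer reading, where None
    stands for "could not read the odometer". `seen` is updated in place
    so a later call skips segments that were already analyzed.
    """
    results = {}
    for path in find_new_segments(segment_dir, seen):
        results[path] = analyze_segment(path)
        seen.add(path)
    return results
```

A real watcher would call something like `scan_once` repeatedly (or react to filesystem events) and write each result to the database rather than returning it.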

It places this result (or NULL if it couldn't read the odo) into a database table bus_data. This table stores an odometer reading for each segment, along with a timestamp.

Finally, thrimshim has a method added that can read this table to fetch the latest reading. I put this in thrimshim because restreamer doesn't have database access.
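To make the table and the "latest reading" lookup concrete, here is a small sketch. It uses `sqlite3` only so the example is self-contained. The column names, the `latest_odometer` helper, and the choice to skip NULL readings when picking the latest one are all assumptions, not the schema or query from this PR.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: the real bus_data columns may differ.
SCHEMA = """
CREATE TABLE bus_data (
    segment   TEXT PRIMARY KEY,
    timestamp TEXT NOT NULL,  -- ISO 8601, so string order matches time order
    odometer  INTEGER         -- NULL when the OCR could not read the odometer
)
"""


def latest_odometer(conn: sqlite3.Connection) -> Optional[int]:
    """Return the most recent non-NULL odometer reading, or None if there is none."""
    row = conn.execute(
        "SELECT odometer FROM bus_data "
        "WHERE odometer IS NOT NULL "
        "ORDER BY timestamp DESC LIMIT 1"
    ).fetchone()
    return row[0] if row else None
```

Whether an endpoint should skip unreadable segments or report the newest row even when it is NULL is a design choice; skipping them is assumed here because a consumer driving automation most likely wants the last known value.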

We intend to use this endpoint on the VST website backend to drive some automation around "next point" etc.

An important point to note is that this OCR only works reliably on a high resolution BusCam, not on the small buscam in the corner of the main stream. We assume this is available under the channel name buscam (though this is of course configurable).

The OCR is very basic right now and only tries to read the first 4 digits, not the 1/10th mile digit which is a different shape and moves in 8 animation frames instead of ticking over instantly.

The OCR is based on the following process:

Future work:

ekimekim commented 10 months ago

By way of example, here is a test frame:

[image: test-0113 25]

We grab its odometer:

[image: odo]

Within the odometer, we find each digit and normalize it:

[images: test-0113 25-digit0, test-0113 25-digit1, test-0113 25-digit2, test-0113 25-digit3, test-0113 25-digit4]

Then we compare each one to the prototypes. Here is debug output for comparing digit 3:

Digit = 3 with score 0.07662805074830303
0: 0.6749983397240747
1: 0.49500557305278614
2: 0.7607713328448434
3: 0.899052645935728
4: 0.589575909300537
5: 0.7584262193861606
6: 0.7154477278085176
7: 0.5766948239415489
8: 0.7621688962241706
9: 0.822424595187425

We have picked the value 3 because it had the highest similarity (0.899), with 9 being the runner up (0.822) for a final score of 0.899 - 0.822 = 0.077. This is actually quite a low score! 3s and 9s are very similar in this font. We are saved by the other digits all being a more sure bet, which brings up our overall score:

Digit = 0 with score 0.12030949885125786
Digit = 1 with score 0.22004621713938277
Digit = 1 with score 0.2500588058898393
Digit = 3 with score 0.07662805074830303
test-frames/test-0113.25.png: 0113 with score 0.16676064315719574
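The scoring described above can be sketched in a few lines. The per-digit score (best similarity minus runner-up) is taken from the explanation in this comment. The overall score being a plain mean of the digit scores is an inference from the numbers in the output, not something the text states. The function names `pick_digit` and `overall_score` are hypothetical.

```python
from typing import Dict, List, Tuple


def pick_digit(similarities: Dict[int, float]) -> Tuple[int, float]:
    """Return (best digit, score), where score = best similarity - runner-up similarity."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_sim), (_, runner_up_sim) = ranked[0], ranked[1]
    return best, best_sim - runner_up_sim


def overall_score(digit_scores: List[float]) -> float:
    """Assumed aggregation: arithmetic mean of the per-digit scores."""
    return sum(digit_scores) / len(digit_scores)


# Similarities copied from the debug output for the last digit.
SIMS = {
    0: 0.6749983397240747, 1: 0.49500557305278614, 2: 0.7607713328448434,
    3: 0.899052645935728, 4: 0.589575909300537, 5: 0.7584262193861606,
    6: 0.7154477278085176, 7: 0.5766948239415489, 8: 0.7621688962241706,
    9: 0.822424595187425,
}
digit, score = pick_digit(SIMS)  # digit 3, score of about 0.0766

# Per-digit scores copied from the output for the frame read as 0113.
FRAME_SCORES = [
    0.12030949885125786, 0.22004621713938277,
    0.2500588058898393, 0.07662805074830303,
]
frame_score = overall_score(FRAME_SCORES)  # about 0.1668
```

Because the score is a margin rather than an absolute similarity, a digit can match its prototype well and still score poorly when a second prototype is nearly as close, which is exactly what happens with 3 and 9 here.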

Our threshold is currently 0.1, because an all-black frame scores a 0.7, as 1s aren't very different from all-black. You can see why this is a crappy scoring system, but it works well enough for now.
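For illustration, applying the threshold could look like the sketch below. The helper name `reading_or_none` is hypothetical, and treating a below-threshold score as "no reading" (the NULL case in bus_data) with a strict less-than comparison is an assumption about how the two pieces connect.

```python
from typing import Optional

THRESHOLD = 0.1  # the value quoted above


def reading_or_none(digits: str, score: float, threshold: float = THRESHOLD) -> Optional[int]:
    """Return the odometer reading as an int, or None if the score is below threshold."""
    if score < threshold:
        return None
    return int(digits)
```

With the example frame, `reading_or_none("0113", 0.16676064315719574)` returns `113`, since the overall score clears 0.1.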

chrusher commented 10 months ago

This is a good implementation of the chosen solution. Where I feel there is room for improvement is in deciding whether a match is good or not.