add "auto calibration" proof of concept

Brett824 commented 5 years ago

Hey!

I've been working on a similar project over at https://github.com/Brett824/NESTetrisCapture and just stumbled upon your repo. I've learned a lot looking at your code but one frustration was the manual calibration - I've implemented "auto calibrating" in my implementation using ORB feature detection and it's done a pretty good job for me so far. This isn't a complete implementation, but it's able to go from a broad capture region:

pre

To the following cropped + highlighted region:

post

It doesn't line up perfectly because my own template PNG is slightly misaligned for your coords, and I'm not 100% sure what's going on with the stats text. It's just a POC I wanted to throw up to see if you're interested in any collaboration.

alex-ong commented 5 years ago

Yeah sounds like a good idea; I'm busy at CTAC this weekend but will have a look later. Does it work or do i have to do things?

What is the intended workflow? I know that calibrating is annoying but you literally only have to do it once. I'm guessing since you are OCR'ing lots of different streamers it would be annoying, however I think for the normal use case (this is used for your own stream and just capturing OBS), a GUI with rectangle select + zoom is more than sufficient since you only calibrate once. My own stream for example has this layout; a gui is far more useful: https://imgur.com/a/zeXvo2x

I'm all for super smart auto calibration.

Unrelated but have you got benchmarks of your opencv2 performance? I originally used tesseract (took >2s for just 6 digits so i skipped that). It looks like you can get 30fps (or is that just the very bad inefficient renderer)

Brett824 commented 5 years ago

This doesn't work quite yet - didn't know how open you were to new stuff (or if you were just working on this for the personal challenge of it). I'm happy to clean it up and make it a bit more production ready.

My use case has been OCRing a ton of different streamers and sources of footage to pull aggregate data to analyze, so I needed to have 100% hands-off region detection so I could automate it. For a streamer's single use, I'd imagine it'd be more useful just to speed up the process/augment a GUI - it'll usually get it 99% right, but then you'd slightly modify it from there.

Unrelated but have you got benchmarks of your opencv2 performance? I originally used tesseract (took >2s for just 6 digits so i skipped that). It looks like you can get 30fps (or is that just the very bad inefficient renderer)

When reading from a pre-recorded video, I've been processing at around 200-300 FPS now that I've got a multithreaded video buffer to remove reading the video's frames as a bottleneck. The clip in my readme was actually bottlenecked by really inefficient pygame code. Using a quick hack I did of your Win32GUI capture code, I'm getting my full process at 90-100FPS (mostly bottlenecked by image capture).

Caveat: I only recently added code to capture piece stats on the left - and my method so far sucks at it. The red-on-black just doesn't create as clean of a binary image to diff, so I need to do more refining to get it 100% accurate. For score/level/lines and for the tetris board, I would describe it as 100% accurate except for the most dire low quality youtube footage I've found.

alex-ong commented 5 years ago

Wow you type fast!

When reading from a pre-recorded video, I've been processing at around 200-300 FPS now that I've got a multithreaded video buffer to remove reading the video's frames as a bottleneck.

Nice, very fast. Yeah the bottleneck with mss() is that it captures at exactly 60fps so the best way to use it is to mss() then subImage which is... slow. Win32GUI is fast if you capture small subregions but slow once you get a big enough block. I'd imagine its slower than MSS once you have to capture the field. Since your not doing anything realtime (for your main usecase, youtube videos) its moot anyway.

Caveat: I only recently added code to capture piece stats on the left - and my method so far sucks at it. The red-on-black just doesn't create as clean of a binary image to diff,

I had the same problem. The way i solved it is to press printscreen on your OBS so that your source images match. Problem is you can't do this for other people's streams. This is a problem with all composite based capture cards (emulators are obviously fine), since red is one of those colours that sucks really bad.

Once I realised that OCR is multiple magnitudes slower than just scanning the field and detecting piece based on that i abandoned it because scanning the field is just better, and works for stream layouts where you don't display stats, e.g. CTWC. There's no way that doing OCR on 21 digits, maybe 8 digits with optimization and just looking at last digit) could be faster than checking 8~ pixels colours against black and then a lookup table)

One problem you've probably faced since you're scanning the field is de-interlacing. I'm guessing that you probably pre-process interlaced inputs with something like yadif-2x to fix it though.

Brett824 commented 5 years ago

My problem with scanning the field for stuff like that has been mostly footage/steam quality problems (not typical of your use case) - streamers drop so many frames and 30FPS footage + level 19 just misses so many frames. I wonder if there's an optimal hybrid approach for a mix of speed and reliability for edge cases like missed frames.

One problem you've probably faced since you're scanning the field is de-interlacing. I'm guessing that you probably pre-process interlaced inputs with something like yadif-2x to fix it though.

Yeah - I haven't automated this yet because I've just started to be picky about my sources and most people have started using their own deinterlacing. My workflow for large batch processing has been something like using youtube-dl to download large amounts of footage (twitch archives, youtube vids, etc), and then if I need to clean it up I'll just run it through ffmpeg with any filters I need to clean it up (mostly yadif for deinterlacing)

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.010    0.010    8.924    8.924 read_digits.py:1(<module>)
    10001    0.162    0.000    8.804    0.001 read_digits.py:56(extract_digits)
    60006    6.042    0.000    7.790    0.000 read_digits.py:28(extract_digit)
   120044    1.775    0.000    1.775    0.000 {resize}
    60024    0.076    0.000    0.779    0.000 convenience.py:65(resize)
   640000    0.134    0.000    0.579    0.000 numeric.py:380(count_nonzero)
   640000    0.444    0.000    0.444    0.000 {numpy.core.multiarray.count_nonzero}
        7    0.029    0.004    0.148    0.021 __init__.py:1(<module>)
        1    0.006    0.006    0.095    0.095 __init__.py:106(<module>)
        1    0.001    0.001    0.076    0.076 add_newdocs.py:10(<module>)
        1    0.001    0.001    0.060    0.060 type_check.py:3(<module>)
    60086    0.058    0.000    0.058    0.000 {min}
    10015    0.051    0.000    0.051    0.000 {cvtColor}

here's some cprofile output of my digit OCR on a preloaded 6 digit score image 10k times. It's able to do it fast but it's not perfect for trying to capture all the digits on a typical tetris screen as fast as I want. I haven't actually experimented with threading the OCR tasks though.