alex-ong / NESTrisOCR

OCR for statistics in NESTris

Single Thread #19

Closed: alex-ong closed this 4 years ago

alex-ong commented 4 years ago

Trying to move everything to a single thread.

Current performance (multithread), 0.002 to 0.004 s:
- [x] score
- [x] lines
- [x] level
- [x] preview
- [x] fieldstats

Current performance (singlethread), about 0.008 s:
- [x] score
- [x] lines
- [x] preview
- [x] fieldstats
- [x] field

The target for singlethread is 0.001 s. I think I can get there by queuing number reads so they only happen once per piece, and by reducing the number of digits read. Multithread runs all tasks simultaneously, whereas singlethread can set up meaningful dependencies between tasks that cut out a lot of work, for example scanning only 1 digit of lines.
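As a rough illustration of the per-piece idea, here is a minimal sketch. It assumes a spawn detector and a last-digit reader already exist (`read_last_digit` and `LinesTracker` are hypothetical names, not NESTrisOCR code) and that fewer than 10 lines are cleared between two reads, which holds if the counter is read once per piece, since one piece clears at most 4. Under that assumption the full lines count can be recovered from its last digit alone.

```python
from typing import Callable


def infer_lines(prev_lines: int, last_digit: int) -> int:
    """Reconstruct the full lines count from its last digit.

    Assumes fewer than 10 lines were cleared since prev_lines was read,
    so the increase is unambiguous modulo 10.
    """
    delta = (last_digit - prev_lines % 10) % 10
    return prev_lines + delta


class LinesTracker:
    """Reads the lines counter only when a new piece spawns (hypothetical helper)."""

    def __init__(self, initial_lines: int = 0) -> None:
        self.lines = initial_lines

    def on_frame(self, piece_spawned: bool, read_last_digit: Callable[[], int]) -> int:
        # Skip OCR entirely on frames where no new piece appeared.
        if piece_spawned:
            self.lines = infer_lines(self.lines, read_last_digit())
        return self.lines


tracker = LinesTracker(initial_lines=38)
tracker.on_frame(False, lambda: 8)  # no spawn: the digit is never read
tracker.on_frame(True, lambda: 2)   # spawn after a tetris: 38 -> 42
print(tracker.lines)  # 42
```

The dependency (spawn first, then digits) is what a single thread can exploit cheaply; with independent threads each reader would keep scanning every frame.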

alex-ong commented 4 years ago

I've got a preliminary version working. It's in master as main2.py.

Usual runtime is 0.003 to 0.004 s. It's not feature complete yet (the field is scanned, but the field comparisons needed for certain features aren't done). It runs about 1 ms faster with DIRECT_CAPTURE than with WINDOW_N_SLICE.

Since it outperforms multithread, it would probably be ideal to port everything over to single thread eventually, once it's feature complete.

alex-ong commented 4 years ago

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    831/1    0.003    0.000   48.448   48.448 {built-in method builtins.exec}
        1    0.000    0.000   48.448   48.448 main2.py:1(<module>)
        1    0.089    0.089   47.629   47.629 main2.py:33(main)
    23176   33.651    0.001   33.651    0.001 {built-in method time.sleep}
     2905    0.018    0.000   12.785    0.004 FullStateOCR.py:63(update)
     2616    0.221    0.000   11.483    0.004 FullStateOCR.py:92(update_ingame)
    14031    0.016    0.000    9.858    0.001 OCRHelpers.py:44(get_sub_image)
    14031    0.020    0.000    9.842    0.001 Win32UICapture.py:90(ImageCapture)
    14031    0.131    0.000    9.822    0.001 Win32UICapture.py:62(ImageCapture)
     2615    0.010    0.000    6.158    0.002 OCRHelpers.py:67(scan_field)
    14031    6.093    0.000    6.093    0.000 {method 'BitBlt' of 'PyCDC' objects}
     3527    0.013    0.000    4.977    0.001 OCRHelpers.py:50(scan_text)
     2905    0.010    0.000    4.070    0.001 OCRHelpers.py:64(scan_lines)
     2616    0.012    0.000    3.662    0.001 FullStateOCR.py:186(get_lines_cleared)
     3526    0.026    0.000    2.157    0.001 DigitOCR.py:116(scoreImage)
     2615    0.005    0.000    1.307    0.000 OCRHelpers.py:77(scan_spawn)
      289    0.002    0.000    1.284    0.004 FullStateOCR.py:70(update_menu)
     3526    0.034    0.000    1.212    0.000 DigitOCR.py:87(convertImg)
     5816    0.987    0.000    0.987    0.000 {built-in method builtins.print}
    11725    0.916    0.000    0.974    0.000 {built-in method numpy.array}
     4714    0.394    0.000    0.917    0.000 DigitOCR.py:55(getDigit)
    14030    0.038    0.000    0.879    0.000 Image.py:2552(frombuffer)
     3686    0.004    0.000    0.856    0.000 _asarray.py:16(asarray)
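Python's built-in profiler produces tables in this format, sorted here by cumulative time. One way to get one (not necessarily how this one was made) is `python -m cProfile -s cumulative main2.py`. The sketch below does the same programmatically, with a hypothetical `fake_frame()` standing in for the OCR work.

```python
import cProfile
import io
import pstats
import time


def fake_frame() -> int:
    # Hypothetical stand-in for one frame of OCR work.
    return sum(i * i for i in range(10_000))


def main(frames: int = 20) -> None:
    for _ in range(frames):
        fake_frame()
        time.sleep(0.001)  # stand-in for idle time between frames


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time, as in the table above, and keep the top 20 rows.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(20)
report = out.getvalue()
print(report)
```

In the table above, time.sleep accounts for 33.651 s of the 48.448 s total, so the percall column is a better guide to per-frame cost than the totals.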

ImageCapture (i.e. capturing from the screen into a texture) is the most expensive step. It could be optimized a bit more (at the moment it captures all 3 digits of lines and all 6 digits of score, but only processes the last one and the last two respectively), but capturing the field covers a much larger surface area (200 blocks vs 3 blocks), so it's something of a moot point.
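To put numbers on that, the per-call costs can be read straight off the profile above. The figures below are copied from the table; nothing else is assumed.

```python
# Figures copied from the cProfile table above.
update_calls = 2905        # FullStateOCR.py:63(update)
capture_calls = 14031      # Win32UICapture.py:62(ImageCapture)
capture_cumtime = 9.822    # seconds, including BitBlt
bitblt_tottime = 6.093     # seconds spent inside PyCDC.BitBlt

per_capture_ms = capture_cumtime / capture_calls * 1000
per_bitblt_ms = bitblt_tottime / capture_calls * 1000
captures_per_update = capture_calls / update_calls
capture_ms_per_update = capture_cumtime / update_calls * 1000

print(f"{per_capture_ms:.2f} ms per ImageCapture call")
print(f"{per_bitblt_ms:.2f} ms of that in BitBlt ({bitblt_tottime / capture_cumtime:.0%})")
print(f"{captures_per_update:.1f} captures per update, {capture_ms_per_update:.1f} ms")
```

That is roughly 3.4 ms of capture per update() call, against about 4.4 ms per update() call overall (12.785 s / 2905), so capture dominates the frame budget.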

Also, the per-call startup cost of Win32UICapture is high enough that it's faster to scan a 400x600 pixel area representing the field than to scan 200 individual pixels.
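Once the whole field is captured in one call, reading 200 blocks is just array indexing. A minimal NumPy sketch, assuming the capture has already been converted to an array covering exactly the 10x20 playfield and that a block counts as filled when its centre pixel is brighter than a threshold; the function names and the threshold of 64 are hypothetical, not NESTrisOCR's.

```python
import numpy as np


def sample_block_centres(field_img: np.ndarray, rows: int = 20, cols: int = 10) -> np.ndarray:
    """Pick the centre pixel of each block from one full-field capture."""
    h, w = field_img.shape[:2]
    ys = ((np.arange(rows) + 0.5) * h / rows).astype(int)
    xs = ((np.arange(cols) + 0.5) * w / cols).astype(int)
    # np.ix_ builds a (rows, cols) index grid, so all blocks are read in one gather.
    return field_img[np.ix_(ys, xs)]


def field_occupancy(field_img: np.ndarray, threshold: float = 64) -> np.ndarray:
    """Return a (20, 10) boolean grid: True where a block looks filled."""
    samples = sample_block_centres(field_img).astype(float)
    if samples.ndim == 3:  # colour image: average the channels
        samples = samples.mean(axis=-1)
    return samples > threshold


# Usage: a blank 400x600 (width x height) field with one filled block bottom-left.
field = np.zeros((600, 400, 3), dtype=np.uint8)
field[570:600, 0:40] = 255
grid = field_occupancy(field)
print(grid.shape, int(grid.sum()), bool(grid[19, 0]))  # (20, 10) 1 True
```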

alex-ong commented 4 years ago

Update: sub-millisecond, bois!

0.0009975433349609375
0.0010008811950683594
0.002001047134399414
0.0
0.0009996891021728516
0.0
0.001001119613647461
0.0

These are per-frame processing times in seconds, excluding NextFrame(), which blocks while waiting for OpenCV. The "speedup" came from downscaling the image as soon as OpenCV returns it, so the overall runtime is probably still the same. The previous test was slightly flawed because I didn't call NextFrame(), so it kept getting the same frame for a full second, though for measuring the speed of UpdateLoop() that probably doesn't matter.
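The readings above cluster at 0 and near multiples of 1 ms, which suggests a coarse clock; that is an inference from the numbers, not something stated in the thread. A minimal harness that times the blocking frame grab and the per-frame work separately with `time.perf_counter()` (Python's highest-resolution clock for short intervals) could look like this; `next_frame` and `update` are hypothetical callables standing in for NextFrame() and UpdateLoop().

```python
import time
from statistics import mean
from typing import Callable, Tuple


def time_frames(next_frame: Callable[[], None],
                update: Callable[[], None],
                frames: int = 100) -> Tuple[float, float]:
    """Return mean seconds spent in next_frame() and in update() per frame."""
    wait, work = [], []
    for _ in range(frames):
        t0 = time.perf_counter()
        next_frame()               # blocking grab, e.g. waiting on OpenCV
        t1 = time.perf_counter()
        update()                   # per-frame OCR work
        t2 = time.perf_counter()
        wait.append(t1 - t0)
        work.append(t2 - t1)
    return mean(wait), mean(work)


# Usage with stand-ins: a 2 ms blocking grab and a trivial update.
grab_s, work_s = time_frames(lambda: time.sleep(0.002), lambda: None, frames=10)
print(f"NextFrame: {grab_s * 1000:.2f} ms, update: {work_s * 1000:.3f} ms")
```

Reporting both numbers keeps the cost of the grab visible even when only the update is being optimized.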

The Win32 capture's NextFrame() takes 0 ms, since the capture happens in ImageCapture. OpenCV's capture happens in NextFrame(), and ImageCapture keeps referencing that same frame until NextFrame() is called again. So all the timings above are still valid, except that the perf table in the previous post vastly underestimates NextFrame()'s runtime.
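The difference between the two backends can be sketched as two small classes. This illustrates the behaviour described above and is not the repository's code: `grab_region` and `read_frame` are hypothetical callables for the platform capture, and regions are assumed to be `(x, y, w, h)` tuples.

```python
from typing import Callable, Tuple

import numpy as np

Region = Tuple[int, int, int, int]  # (x, y, width, height)


class LazyCapture:
    """Win32-style: next_frame() is free; every image_capture() grabs pixels now."""

    def __init__(self, grab_region: Callable[[Region], np.ndarray]) -> None:
        self.grab_region = grab_region

    def next_frame(self) -> None:
        pass  # nothing to do: the capture cost is paid per region

    def image_capture(self, region: Region) -> np.ndarray:
        return self.grab_region(region)


class CachedFrameCapture:
    """OpenCV-style: next_frame() reads a whole frame; image_capture() slices it."""

    def __init__(self, read_frame: Callable[[], np.ndarray]) -> None:
        self.read_frame = read_frame
        self.frame = None

    def next_frame(self) -> None:
        self.frame = self.read_frame()  # the blocking, expensive part

    def image_capture(self, region: Region) -> np.ndarray:
        x, y, w, h = region
        # Returns a view of the same frame until next_frame() is called again.
        return self.frame[y:y + h, x:x + w]


frames = iter([np.full((4, 4), 1), np.full((4, 4), 2)])
cap = CachedFrameCapture(lambda: next(frames))
cap.next_frame()
print(cap.image_capture((0, 0, 2, 2)).max())  # 1: still the first frame
```

In the cached model the cost lands in next_frame(), so a timer wrapped only around the per-frame processing will not see it.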

alex-ong commented 4 years ago

The multithread setting is currently ignored. All that's left is to remove every import of multithread and every reference to it, then remove it from the settings, and we're done.
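Finding the leftover imports can be automated with Python's `ast` module. A minimal sketch, assuming the module is literally named `multithread` (the comment above only says "import multithread"); it reports import statements only, so other references and the settings entry still need a manual look. `find_imports` and `scan_repo` are hypothetical helper names.

```python
import ast
from pathlib import Path
from typing import Dict, List


def find_imports(source: str, module: str = "multithread") -> List[int]:
    """Return sorted line numbers of statements that import `module`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == module for alias in node.names):
                hits.append(node.lineno)
        elif isinstance(node, ast.ImportFrom):
            top = (node.module or "").split(".")[0]
            if top == module or any(alias.name == module for alias in node.names):
                hits.append(node.lineno)
    return sorted(hits)


def scan_repo(root: str = ".", module: str = "multithread") -> Dict[str, List[int]]:
    """Map each .py file under root to the lines where it imports `module`."""
    results = {}
    for path in Path(root).rglob("*.py"):
        try:
            lines = find_imports(path.read_text(encoding="utf-8"), module)
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        if lines:
            results[str(path)] = lines
    return results
```

Running `scan_repo()` from the repository root should list every file that still imports the module; an empty result means the imports are gone.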

alex-ong commented 4 years ago

Multithread has been removed. Closing.