alex-ong / NESTrisOCR

OCR for statistics in NESTris

Refactor capturing API #32

Closed blakegong closed 4 years ago

blakegong commented 4 years ago

Still WIP. Creating PR for early 👀

Problems to be solved:

I've simplified the responsibility of Capture.get_frame() so it no longer crops anything and just returns the frame faithfully. A generic surface API like this should be kept as simple as possible. Saving a little RAM by cropping internally isn't really worth it, while maintaining that cropping logic isn't cheap with so many different capturing backends underneath. Let me know what you think.
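
Roughly the shape I'm aiming for (a minimal sketch only; `BaseCapture` and the exact return type are illustrative, not the actual names in this PR):

```python
from abc import ABC, abstractmethod


class BaseCapture(ABC):
    """One subclass per backend (file, OpenCV, Mac, win32, ...)."""

    @abstractmethod
    def get_frame(self):
        """Return (timestamp, full_frame) exactly as captured; no cropping here."""
        raise NotImplementedError
```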

Also, if you agree with the 👆 change, https://github.com/alex-ong/NESTrisOCR/blob/master/nestris_ocr/capturing/WindowAreasSlice.py could potentially be removed too, as it would not provide as much value as before.

Oh, and the biggest change that comes with this Capture change is the contract between main.py and strategy.py (hwnd -> (ts, frame)). Let me know if that makes sense to you. Effectively, Capture.get_frame() will only be called once per "tick". So no matter which capturing method you use, every piece of OCR processes the same "frame" within one "tick".
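
In other words, the tick loop would look something like this (rough sketch; `ocr_tasks` and `task.process` are placeholders, not the real strategy.py names):

```python
def tick(capture, ocr_tasks):
    ts, frame = capture.get_frame()  # exactly one capture per tick
    results = {}
    for task in ocr_tasks:
        # every OCR task works off the same timestamp/frame pair
        results[task.name] = task.process(ts, frame)
    return results
```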

The deinterlacer in OpenCV has been temporarily removed, to help myself focus on all the other things to fix first 😅 Will add it back later.

alex-ong commented 4 years ago

> I've simplified the responsibility of Capture.get_frame() so it no longer crops anything and just returns the frame faithfully.

Good, but I'd prefer it to return an imageslicer() object rather than just an image. The imageslicer() object can hold an image, and the caller calls slice(coords). Each capture method would return its own form of imageslicer: file/opencv/Mac returning what you'd expect, and win32 returning the file/opencv/Mac one for window_n_slice or a special one for direct_capture.

Direct_capture would actually break the one-capture-per-frame rule, but it would do so secretly.

We could also push timestamp into the imageslicer object?
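
Something along these lines (just a sketch of the idea; PIL-style crop boxes assumed, and the class/method names are up for grabs):

```python
class ImageSlicer:
    """Wraps one captured frame; callers take the crops they need."""

    def __init__(self, timestamp, image):
        self.timestamp = timestamp
        self.image = image

    def slice(self, coords):
        # coords = (left, top, right, bottom) in source-frame pixels
        return self.image.crop(coords)
```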

> Oh, and the biggest change that comes with this Capture change is the contract between main.py and strategy.py (hwnd -> (ts, frame)). Let me know if that makes sense to you. Effectively, Capture.get_frame() will only be called once per "tick".

Yes, I think pushing hwnd all the way into the capture method is great and correct. Calling get_frame once per tick, instead of nextframe once and then get_frame 10x, also makes sense for API simplicity.

> So no matter which capturing method you use, every piece of OCR processes the same "frame" within one "tick".

Right. However, we need to make sure that direct_capture still works for the win32 API, since it takes way too long if you actually capture the entire frame on 4K screens.
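
For direct_capture, the same interface could hide per-slice grabs so we never pull a full 4K frame (sketch only; `grab_window_region` is a hypothetical helper, not something that exists in the repo):

```python
class DirectCaptureSlicer:
    """Looks like an ImageSlicer, but secretly captures each slice on demand."""

    def __init__(self, timestamp, hwnd):
        self.timestamp = timestamp
        self.hwnd = hwnd

    def slice(self, coords):
        # Only the requested sub-rectangle of the window is ever captured.
        return grab_window_region(self.hwnd, coords)
```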