alex-ong / NESTrisOCR

OCR for statistics in NESTris

Add support for capture under OSX #1

Closed timotheeg closed 5 years ago

timotheeg commented 5 years ago

Add support for OSX.

I'm no OSX programming expert, and the PR doesn't include error detection or tests, but it worked on my macbook under OSX 10.14, so I figured I could just submit the PR first anyway. I assume it still works fine under Windows, but I don't have a Windows machine to test.

There's room for optimizations:

- in OSX the window ID doesn't change (although the position can), so we could skip the window search at each iteration
- I wonder if grabbing window areas via successive captureAndOCR() calls in one iteration could lead to inconsistent results

I also moved the pngs out of the main directory to declutter a bit.

I hope this unsolicited PR is OK with you. Let me know if you'd like me to change anything.

Cheers! 😄

alex-ong commented 5 years ago

Hi,

Thanks so much for the pull request, i will have a look at it and definitely merge it!

To answer your questions:

in OSX the window ID doesn't change (although the position can), so we could skip the window search at each iteration

Could you clarify? In Win10, if you change OBS's window position it's still fine, but if you change the position of the graphic within OBS (via scaling OBS or such) then you have to recalibrate. The windowID doesn't change in Win10 either - it rescans every iteration to handle the case of you restarting OBS.

I wonder if grabbing window areas via successive captureAndOCR() calls in one iteration could lead to inconsistent results

Yes and no. In Win10, grabbing the whole screen and then chopping bits out is really slow, since you are grabbing 1024x768 pixels (or more, depending on how big your OBS window is) vs 10 areas of roughly 100x60 pixels. That's like a million pixels vs only 50-60 thousand. This is dependent on the plugin of course, but the reason I do multiple calls, and even multithread the OCR portions, is to reduce latency to sub 5ms.

There are also flags to wait at each scan so it doesn't scan at 1000Hz and kill CPU cores. My original implementation was to grab the whole window and then split it into parts; it performed at under 15fps. Also, having a higher-resolution OBS window (via having, say, a 4K monitor) would make performance even worse when capturing the whole screen.

For reference, I have a 1440p monitor. I have also tried multiple screen capture APIs; some internally limit screenshotting to 60fps, so grabbing the 11 mini screenshots took 11/60 seconds. I've gone through around 3 screenshotting APIs and picked the fastest one for Windows.

Could you test a big scan + split vs multiple small scans, just on a single core? If the big scan is faster, then we should make the MacOS version default to a big scan and leave Windows as is. Right now the layout isn't very OO; this could be changed to make it work. I'm happy to do some legwork in this regard and get you to test.
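Something along these lines would do (a rough single-core sketch; PIL's ImageGrab is just a stand-in for whatever capture backend we end up with, and the boxes are made-up placeholders, not the real capture areas):

```python
import time
from PIL import ImageGrab  # stand-in capture backend; works on Windows and macOS

# Hypothetical capture areas: (left, top, right, bottom) boxes standing in for
# score, lines, level and the piece-stat counters.
AREAS = [(100 + 120 * i, 200, 200 + 120 * i, 260) for i in range(10)]
BIG_BOX = (100, 200, 1280, 260)  # bounding box covering all areas

def many_small_grabs():
    return [ImageGrab.grab(bbox=box) for box in AREAS]

def one_big_grab_then_crop():
    frame = ImageGrab.grab(bbox=BIG_BOX)
    left, top = BIG_BOX[0], BIG_BOX[1]
    return [frame.crop((l - left, t - top, r - left, b - top))
            for (l, t, r, b) in AREAS]

def avg_ms(fn, n=50):
    # Average wall-clock time per iteration, in milliseconds.
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n * 1000

print('many small grabs:   %.1f ms/iteration' % avg_ms(many_small_grabs))
print('big grab then crop: %.1f ms/iteration' % avg_ms(one_big_grab_then_crop))
```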

alex-ong commented 5 years ago

Just looked at the PR; it looks very straightforward. If I test it and it still works on Windows, I am happy to merge it :)

timotheeg commented 5 years ago

Hi there Alex! Thanks for doing a quick review of the PR! :)

in OSX the window ID doesn't change (although the position can), so we could skip the window search at each iteration

Could you clarify? In Win10, if you change OBS's window position it's still fine, but if you change the position of the graphic within OBS (via scaling OBS or such) then you have to recalibrate.

Understood! I was not referring to that.

The windowID doesn't change in Win10 either - it rescans every iteration to handle the case of you restarting OBS

Right, this is what I was referring to. I based my comment on assuming that restarting OBS (or another capture program) is not something that happens often during capture.

Based on that, I thought there could be one bootstrapping cycle over all windows to find the target window ID, and then each iteration thereafter could use that window ID to retrieve the window data directly. Retrieving the window info via a get_info_for_single_window(windowID) API (whatever it is on each platform - Quartz.CGWindowListCreateDescriptionFromArray([windowID]) on OSX) should indicate whether the window is still alive or has been closed. If closed, screencap could go back into the bootstrap process and monitor for the window to be opened again, until it gets the window ID again.
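Roughly something like this (a minimal sketch using PyObjC's Quartz bindings; find_window_id and the 'OBS' title hint are hypothetical placeholders, not the actual screencap.py code):

```python
import time
import Quartz

def find_window_id(title_hint):
    """Bootstrap scan: list all on-screen windows once and return the ID of
    the first one whose name contains title_hint (hypothetical helper)."""
    windows = Quartz.CGWindowListCopyWindowInfo(
        Quartz.kCGWindowListOptionOnScreenOnly, Quartz.kCGNullWindowID)
    for win in windows:
        name = win.get(Quartz.kCGWindowName) or ''
        if title_hint in name:
            return win[Quartz.kCGWindowNumber]
    return None

def window_still_alive(window_id):
    """Cheap per-iteration check that asks Quartz about just this one ID."""
    descriptions = Quartz.CGWindowListCreateDescriptionFromArray([window_id])
    return bool(descriptions)

# Capture loop: only fall back to the full bootstrap scan when the cached
# window ID stops resolving (e.g. the capture program was restarted).
window_id = None
while True:
    if window_id is None or not window_still_alive(window_id):
        window_id = find_window_id('OBS')  # 'OBS' is a placeholder title
        if window_id is None:
            time.sleep(0.1)  # keep polling until the window shows up again
            continue
    # ... capture the window by ID and run the OCR here ...
```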

I assume in most cases, the same windowID would just work through the entire recording session.

I hope that makes sense 😅. It's probably negligible savings in comparison to the screen capture and OCR, but I figured every bit can help. Currently in this PR, for both win32 and quartz, calling getWindows() in screencap.py calls the native API to get a list of windows, iterates over it once to filter and transform it into a usable list, and then iterates another time over it to find the target window by name.

I didn't add the window ID "persistence" this time, because I wanted the PR to make as few changes as possible to screencap.py, to show that the OSX addition would be compatible in a (hopefully) obvious manner :)

Yes and no.

Thank you very much for all that detailed information! That was very helpful for me to understand all the work you have done. It's very impressive to manage all the capture and OCR within 5ms!

Reading this, it sounds like working on an OBS plugin to manipulate the input video stream directly, rather than doing an OBS screencap which upscales the input several times, could give a big performance boost!

That brings me to some clarifications I should give you about my setup. I have actually not managed to find a capture device I'm happy with for OSX 😢. I bought like 4 ezcap devices, and they all had sound issues. I found another device that works, but OBS doesn't recognise its video input (sigh 😑). So what I had to do was play the video stream in a window of the capture software provided, and get OBS to do a window capture on it, and that worked very well!

Initially, although the window was small, the screencaps were huge. I figured that's because I have a retina display, which upscales everything to make a virtual "normal" resolution, and the Quartz API I used in this PR captures the scaled rendering. I then used RDM to set the resolution to the hardware-native res, so the input window became pixel-accurate to its real size, and capturing it was equivalent to working with the real input video stream (yay!). For reference, on my setup, the full Tetris interface is only 585x470. Taking the "areas of interest" into consideration would reduce that further still, so I was thinking I could capture the minimal area from the main window as one frame and get all the data from it in one go.

On my original comment: I have not actually done any tests to check whether the window changes in between grabs such that you'd get inconsistent data for a given frame. At the capture speed you mentioned, it is both unlikely and would auto-correct itself within one capture cycle. Still, I was thinking it'd be nice to have a process that guarantees frame-consistent data if possible 😅.

Could you test a big scan + split vs multiple small scans, just on a single core? If the big scan is faster then we should make the MacOS version default to a big scan and leave windows as is.

I'll try to give that a go. Not sure when I can next work on this, but I'll let you know :)

Right now the layout isn't very OO, this could be changed to make it work. I'm happy to do some legwork in this regard, and get you to test.

Roger that and thank you! No urgency on that front.

To tell you the truth, considering the PR works as it is for me, I thought I'd move on to the next bit of interest to me: recognition of the piece in the next-piece window!

If I can pull that off, I could use the same code to recognize the pieces as they appear at the top of the stage during gameplay. The reason for this weird ask is that I'd like to use NESTrisOCR on the DAS Trainer Rom Hack to OCR the DAS value and track my progress over time. I have already modified NESTrisOCR to read the DAS value (it was super easy - just add capture areas!), but unfortunately, since the piece stats are gone in DAS Trainer, it's very hard to know the piece count. I thought I could recompute the piece stats myself by monitoring the top of the stage when a piece appears, recognizing the piece, and doing the counting myself. I have documented the feature request in my fork for now (I didn't want to pollute your repo). See this issue for details.

Apologies for this super long reply. If you read all the way, thank you! :D

alex-ong commented 5 years ago

Unfortunately I don't have the IQ to do an OBS plugin. Also, this can be used with an emulator as well, in the case that you're not necessarily using OBS.

For capturing the sound, have you tried an RCA to 3.5mm adapter and then using the microphone port on your computer? I guess that means you lose your microphone; I use my line-in port since the audio desyncs through my capture card (gvusb2).

For recognizing the piece, use the top of the field. Simply check for the 7 piece patterns, else it's a null piece. Whenever you transition from null piece to a valid piece, increment the counter. This is so simple I should just implement it, since it's faster and more stable than OCR. The catch is that you'd better not drop any frames or detection will not work, which is a problem on level 29+. (My capture card drops one frame per second, which isn't a problem until gravity is one cell per frame.) I'd also need to add game start detection to reset the piece count. This is done in my other software by scanning for black/grey areas on the background to determine whether we are in game, and it also successfully doesn't reset on pause.
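In pseudocode it would look roughly like this (a sketch only: the spawn patterns below are my guess at the spawn orientations and would need verifying against a real capture, and the field cells are assumed to already be read off as 0/1):

```python
# Spawn-area occupancy patterns, as (row, col) offsets within the top two rows
# of the field. Illustrative only - the exact offsets need checking in-game.
SPAWN_PATTERNS = {
    'T': {(0, 0), (0, 1), (0, 2), (1, 1)},
    'J': {(0, 0), (0, 1), (0, 2), (1, 2)},
    'Z': {(0, 0), (0, 1), (1, 1), (1, 2)},
    'O': {(0, 0), (0, 1), (1, 0), (1, 1)},
    'S': {(0, 1), (0, 2), (1, 0), (1, 1)},
    'L': {(0, 0), (0, 1), (0, 2), (1, 0)},
    'I': {(0, 0), (0, 1), (0, 2), (0, 3)},
}

def classify_spawn(occupied_cells):
    """Return the piece whose spawn pattern matches, else None (null piece)."""
    for piece, pattern in SPAWN_PATTERNS.items():
        if occupied_cells == pattern:
            return piece
    return None

class PieceCounter:
    """Count pieces by watching null-to-piece transitions in the spawn area."""
    def __init__(self):
        self.previous = None
        self.counts = {piece: 0 for piece in SPAWN_PATTERNS}

    def update(self, occupied_cells):
        piece = classify_spawn(occupied_cells)
        if self.previous is None and piece is not None:
            self.counts[piece] += 1  # new piece just appeared at the top
        self.previous = piece
```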

For capturing the whole field for saving as a replay, you have to deal with the line clear animation, which is another hurdle. It's not trivial, but it's not too hard either (keep track of the active piece and the block count in the field; if the block count decreases, then you are in a line clear animation).
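A rough sketch of that block-count heuristic (assuming the field has already been read into a 0/1 grid; not actual NESTrisOCR code):

```python
class FieldWatcher:
    """Outside of a line clear, the filled-cell count only stays flat (piece
    falling) or grows by 4 (piece locked), so any decrease from one frame to
    the next means a line-clear animation is in progress."""
    def __init__(self):
        self.prev_count = 0

    def update(self, field):
        # field: 2D iterable of 0/1 cells read from the playfield
        count = sum(sum(row) for row in field)
        clearing = count < self.prev_count
        self.prev_count = count
        return clearing
```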

timotheeg commented 5 years ago

Unfortunately I don't have the IQ to do an OBS plugin. Also, this can be used with an emulator as well, in the case that you're not necessarily using OBS.

I'm sure you could do an OBS plugin if you put your mind to it! 😄 But that said, doing a window-based capture is indeed a great advantage. It is what made it possible for NESTrisOCR to work with my setup, and indeed it works for emulators too.

For capturing the sound, have you tried an RCA to 3.5mm adapter then using the microphone port on your computer?

My macbook doesn't have a jack port for microphones 😑. In the end I'm not too bothered anyway; the setup I have now is not ideal, but I'm satisfied enough. Thanks for the suggestion though!

Many thanks for all the advice on capturing the top of field, or even whole field! These again are great pieces of information 😃.

I should just implement it, since it's faster and more stable than OCR

This is an interesting project, so I think I'll give it a shot too, maybe doing some experiments this weekend!

This is done in my other software by scanning for black/grey areas on the background to determine whether we are in game, and it also successfully doesn't reset on pause

Is that other project also open source on github? 😅

In any case, I reckon this PR is a good first step for OSX support. If Windows still works, you could go ahead and merge it.

I still owe you the big-window-capture performance testing. When I get to it, and if the results are conclusive, I'll open a new PR to change the default capture method on OSX, without affecting how the default capture works on Windows.

Cheers!

alex-ong commented 5 years ago

mergeroonied.