Single threaded task-queue optimization

alex-ong commented 4 years ago

Simple single threaded task-queue.

needUpdateScore = False
needUpdateLines = False
needUpdateLevel = False
levelSealBroken = False
nextLevelTarget = None

Stage 1 - calculate needupdatescore

if UPDATE_PIECE_STATS:
    newPieceStats = updatePieceStats()
    if newPieceStats != oldPieceStats:
        needUpdateScore = True
else:
    needUpdateScore = True

Stage 2 - calculate needUpdateLines

if needUpdateScore:
    newScore = updateScore()
    needUpdateLines = newScore != oldScore

Stage 3 - calculate needUpdateLevel

if needUpdateLines:
    newLines = updateLines()
    if levelSealBroken and newLines > nextLevelTarget:
        newLevel += 1
        nextLevelTarget += 10
    else:
        needUpdateLevel = True

Stage 4 - calculate level. Only used until level seal is broken

if needUpdateLevel:
    newLevel = updateLevel()
    if newLevel != oldLevel:
        nextLevelTarget = int(newLines / 10) * 10
        levelSealBroken = True

alex-ong commented 4 years ago

@timotheeg Is there an easy way to modify task queue in place? i guess we could also design and implement an entire new class for it. That way it can support different implementations for piecestats (or no impl).

The cpu savings should be significant. We are on average completely removing the score/lines/level calculation, and level will never be OCR'ed again after transition.

Ofc this only works in single threaded, since in multi-threaded we just process everything simultaneously.

timotheeg commented 4 years ago

Hey Alex, sorry I just saw this (github notification get sent to my work email).

The idea looks good! I'll a longer comment tonight :)

timotheeg commented 4 years ago

Oooh, that's a nice idea!

Is there an easy way to modify task queue in place?

In its current implementation (a list), it's not very easy to manipulate the queue. With some simple changes to a dictionary to identify tasks by name or IDs, it should be a lot easier. A new class sounds good!

I have a dirty example of manipulating the queue! To support field scanning in das trainer, I needed to implement color interpolation, but that meant checking the field was dependent on having first read and OCRed the level. Since this order constraint could be resolved once at bootstrap, I went lazy. Instead of modifying the task queue to be a dictionary, I extracted the field task from the task queue, and made sure it'd be processed after everything else. Commit here (note: obviously this doesn't work when field scanning is running in an independent thread). Still, it would have been cleaner to manipulate tasks by name, rather than iterating the task queue.

One more note on this: I reckon than when a new piece is detected on the field, it might be beneficial to have the capture and OCR-ing of the values done in the next frame rather than current frame. 2 reasons I say this:

For capture systems which do not de-interlace (like my first capture device), when the score/lines/level values change, they can become unreadable for a frame (None or weird values). Since the values are guaranteed to be stable til the next piece, it can be more stable to delay by one frame.
In das trainer, the das data (current piece and current piece das value) are only updated one frame after the apparition of the piece in the field, so if you read on the same frame, you'll have the wrong data. Delaying by one frame for everything might make things more consistent.

alex-ong commented 4 years ago

Right, all those suggestions sound great; especially waiting one frame between next-piece on field and scanning numbers, since that will reduce load.

The class could be a simple state machine state 1 -> scan top of field for next piece state 2 -> wait 1f if necessary state 3 -> scan numbers, preview (can be slow) repeat.

Scanning top of field could easily occur at 120hz since it's such a small operation (nyquist theorem).

Integrating scanning the full field every frame would also need to be supported of course. This could be implmeented as a separate class (one class handles things that happen at 60hz, one thing handles things that update everytime a new piece spawns).

alex-ong commented 4 years ago

This is already implemented in basic form now.

alex-ong commented 4 years ago

This is implemented as FASTEST_STRATEGY. NAIVE is still there for an easier to follow implementation. FASTEST will eventually include error detection / full naive scan once in a while for error correction.

alex-ong / NESTrisOCR

Single threaded task-queue optimization #15