cathoderaydude / Babel

A simple translation tool for on-screen text using Google Cloud Vision and Translate. Intended for translating emulated videogames and similar things.

State Machine #37

hbloom1783 opened 3 years ago

hbloom1783 commented 3 years ago

Going to separate this into a couple of posts to try to keep it readable.

So the basic flow of the app as it stands is this:

  1. Ingest an image from someplace.
  2. Submit the ingested image for OCR.
  3. Add phrases (either via the autophraser or manually).
  4. Translate phrases.
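A minimal sketch of that flow, written as Python pseudocode. All of the names here (`Session`, `recognize`, `text_in`, `translate`) are invented for illustration, not the app's actual API:

```python
class Session:
    """Hypothetical container for the app's working data."""

    def __init__(self, ocr_backend, translator):
        self.ocr_backend = ocr_backend    # e.g. a Cloud Vision wrapper
        self.translator = translator      # e.g. a Cloud Translate wrapper
        self.image = None
        self.ocr_result = None
        self.phrases = []                 # phrase rectangles

    def ingest(self, image):
        # Step 1: take in a new image; any previous OCR result is now stale.
        self.image = image
        self.ocr_result = None

    def run_ocr(self):
        # Step 2: submit the ingested image for OCR.
        self.ocr_result = self.ocr_backend.recognize(self.image)

    def add_phrase(self, rect):
        # Step 3: add a phrase rectangle (autophraser or manual).
        self.phrases.append({"rect": rect, "text": None, "translation": None})

    def translate_phrases(self):
        # Step 4: read the OCR text under each rectangle and translate it.
        for phrase in self.phrases:
            phrase["text"] = self.ocr_result.text_in(phrase["rect"])
            phrase["translation"] = self.translator.translate(phrase["text"])
```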
hbloom1783 commented 3 years ago

Notes on each step:

Ingest

Ingestion methods include:

Ingesting an image means blowing away the OCR data. It doesn't necessarily mean blowing away the phrasing data, though it will naturally blank the phrases' text until OCR is back.
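Continuing the sketch above (same invented names), that invariant would look roughly like this: the OCR result goes away, the rectangles stay, and their text is blanked until the next OCR pass.

```python
def ingest(session, new_image):
    session.image = new_image
    session.ocr_result = None            # OCR data is blown away outright
    for phrase in session.phrases:       # phrase rects survive...
        phrase["text"] = None            # ...but their text is blank until OCR is back
        phrase["translation"] = None
```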

OCR

OCR methods include:

We may be able to implement image filters, which would have to be applied to the image before submitting for OCR.
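If filters do happen, one plausible place to hook them in is a small pipeline run just before the OCR request. The filter callables here are purely hypothetical:

```python
def run_ocr(session, filters=()):
    image = session.image
    for apply_filter in filters:
        image = apply_filter(image)      # e.g. threshold, upscale, invert...
    # The OCR backend sees the filtered image, never the original.
    session.ocr_result = session.ocr_backend.recognize(image)
```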

Phrasing

Phrasing methods include:

Phrase rectangles don't have any meaning before OCR is performed. If they're persisting from a previous OCR, they shouldn't be editable or displayed until OCR is complete again, at which point they need to re-load their underlying text and re-submit for translation.
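In terms of the earlier sketch, the post-OCR step for persisted rectangles might look like this (again, invented names, not the app's actual code):

```python
def on_ocr_complete(session):
    for phrase in session.phrases:
        phrase["editable"] = True
        phrase["visible"] = True
        # Re-load the underlying text from the fresh OCR result...
        phrase["text"] = session.ocr_result.text_in(phrase["rect"])
        # ...and re-submit it for translation.
        phrase["translation"] = session.translator.translate(phrase["text"])
```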

Translation

Translation is always automatic for each phrase rectangle. This process is nicely simple and low-level.
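One way to read "always automatic" is that giving a rectangle new text immediately triggers a translation request, with no separate translate step. A tiny sketch of that, using the same placeholder names:

```python
def set_phrase_text(session, phrase, text):
    phrase["text"] = text
    # No separate "translate" button: new text is sent for translation right away.
    phrase["translation"] = session.translator.translate(text)
```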

hbloom1783 commented 3 years ago

So as far as states go, I see a need for these:

NoOCR

This is our initial state, and also the state immediately after ingestion. Phraserects, if we have them, are invisible. From here we can:

If we enter this state with AutoOCR on, and an image set, then we need to immediately send that image for OCR.
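A rough entry handler for this state could look like the following; the AutoOCR flag and the handler name are hypothetical:

```python
def enter_no_ocr(session):
    session.state = "NoOCR"
    for phrase in session.phrases:
        phrase["visible"] = False        # Phraserects exist but are hidden
    # With AutoOCR on and an image already set, fire off OCR immediately.
    if session.auto_ocr and session.image is not None:
        session.run_ocr()
```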

OCRed

We enter this state immediately after OCR is complete. We can now display/manipulate Phraserects. If we enter this state with Phraserects, we need to re-load their text and re-submit them for translation. From here we can:
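And the matching entry handler for this state, reusing the `on_ocr_complete` sketch from the phrasing notes above:

```python
def enter_ocred(session):
    session.state = "OCRed"
    # Phraserects may be shown and manipulated again; if any were carried over,
    # re-load their text and re-submit them for translation.
    if session.phrases:
        on_ocr_complete(session)
```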

hbloom1783 commented 3 years ago

Having written that out, it seems like most of the modalities of the GUI (disabling the OCR button while an OCR request is in flight, displaying/hiding the viewfinder, etc.) don't actually affect the core state of the app, and only two states are really needed to govern it: either we have an OCR result, or we don't.
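So the core state machine collapses to something this small (a sketch, with the GUI modality treated as derived behaviour rather than state):

```python
from enum import Enum, auto

class CoreState(Enum):
    NO_OCR = auto()   # initial state, and the state right after ingestion
    OCRED = auto()    # OCR data present; phrase rects are live

# Button enable/disable, viewfinder visibility, etc. would be derived from
# CoreState plus transient flags, rather than being states of their own.
```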