0xbad1d3a5 / Kaku

画 - Japanese OCR Dictionary
https://kaku.fuwafuwa.ca/
BSD 3-Clause "New" or "Revised" License

Unable to stop TessBaseAPI from completing recognition #2

Closed: 0xbad1d3a5 closed this issue 7 years ago

0xbad1d3a5 commented 7 years ago

I think I know the cause here.

/** Make a text string from the internal data structures. */
char* TessBaseAPI::GetUTF8Text() {
  if (tesseract_ == NULL ||
      (!recognition_done_ && Recognize(NULL) < 0))
    return NULL;
  STRING text("");
  ResultIterator *it = GetIterator();
  do {
    if (it->Empty(RIL_PARA)) continue;
    char *para_text = it->GetUTF8Text(RIL_PARA);
    text += para_text;
    delete []para_text;
  } while (it->Next(RIL_PARA));
  char* result = new char[text.length() + 1];
  strncpy(result, text.string(), text.length() + 1);
  delete it;
  return result;
}

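GetUTF8Text() in tesseract calls Recognize(NULL), i.e. it never passes a monitor. If recognition has not already been run, it starts inside this call with nothing to signal cancellation through, so the call always blocks until recognition completes.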
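
To illustrate the control flow, here is a minimal self-contained sketch. It is not tesseract code: FakeApi, Monitor, should_cancel, GetText and steps_run are hypothetical stand-ins I made up for illustration, and "recognition" is just a counter loop. It only mirrors the guard shown above, where the text getter falls back to Recognize with a null monitor.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for a progress/cancel monitor (not tesseract's type).
struct Monitor {
  // Called before each step with the number of steps already completed;
  // returning true asks recognition to stop.
  std::function<bool(int)> should_cancel;
};

// Hypothetical, heavily simplified stand-in for TessBaseAPI.
class FakeApi {
 public:
  explicit FakeApi(int total_steps) : total_steps_(total_steps) {
    assert(total_steps >= 0);
  }

  // Runs the remaining steps. Returns 0 on completion, -1 if cancelled.
  // Cancellation is only possible when a monitor is supplied.
  int Recognize(Monitor* monitor) {
    while (steps_run_ < total_steps_) {
      if (monitor != nullptr && monitor->should_cancel &&
          monitor->should_cancel(steps_run_)) {
        return -1;
      }
      ++steps_run_;  // one unit of simulated recognition work
    }
    recognition_done_ = true;
    return 0;
  }

  // Mirrors the guard in GetUTF8Text(): if recognition has not been run yet,
  // it is started here with no monitor, so it cannot be interrupted.
  std::string GetText() {
    if (!recognition_done_ && Recognize(nullptr) < 0) return "";
    return "text";
  }

  int steps_run() const { return steps_run_; }

 private:
  int total_steps_;
  int steps_run_ = 0;
  bool recognition_done_ = false;
};
```

In this sketch, calling GetText() on a fresh FakeApi always runs every step, whereas calling Recognize(&monitor) first gives the caller a chance to stop early, and a later GetText() skips recognition once it has completed.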

0xbad1d3a5 commented 7 years ago

Yup, that's definitely the problem. Calling TessBaseAPI.getHOCRText(0) instead lets you cancel the recognition while it is running.

0xbad1d3a5 commented 7 years ago

Fixed. I also fixed some code upstream: https://github.com/rmtheis/tess-two/pull/186