AmitGorvadiya / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Crashes in 3.0 when scanning text with long words (or long lines) #244

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run Tesseract 3.0 on an image that contains long words or long horizontal lines.

The problem goes away with this temporary modification to tfacepp.cpp:

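// Workaround: THISDOESNTWORK is never defined, so only the #else branch is
// compiled and no word is routed to split_and_recog_word, however long it is.
#if defined(THISDOESNTWORK)
  if (word->blob_list ()->length () > MAX_UNDIVIDED_LENGTH) {
    return split_and_recog_word (word, denorm, matcher, tester, trainer,
                                 testing, raw_choice, blob_choices,
                                 outword);
  } else {
#else
  {
#endif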
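
The condition in the snippet above sends any word whose blob list is longer than MAX_UNDIVIDED_LENGTH to split_and_recog_word. As a rough, self-contained illustration of that idea (not Tesseract's code), the sketch below recursively halves an over-long blob list until every piece fits under the limit, recognizes each piece, and concatenates the results. The types Blob and Word, the limit kMaxUndividedLength (24, chosen here for illustration), the helpers recog_whole and split_word, and the split-at-the-midpoint rule are all hypothetical; Tesseract's real data structures and choice of split point may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins: each "blob" is reduced to the character it would be
// recognized as, and a word is just its list of blobs. These are not
// Tesseract's WERD_RES / blob structures.
using Blob = char;
using Word = std::vector<Blob>;

// Hypothetical limit playing the role of MAX_UNDIVIDED_LENGTH.
constexpr std::size_t kMaxUndividedLength = 24;

// Hypothetical recognizer for a word short enough to handle in one pass.
std::string recog_whole(const Word& word) {
  return std::string(word.begin(), word.end());
}

// Recursively halve `word` until every piece has at most kMaxUndividedLength
// blobs, appending the pieces to `out` in left-to-right order.
void split_word(const Word& word, std::vector<Word>& out) {
  if (word.size() <= kMaxUndividedLength) {
    out.push_back(word);
    return;
  }
  const std::size_t mid = word.size() / 2;
  // An over-long word has at least two blobs, so both halves are non-empty
  // and strictly shorter than `word`; this is what guarantees termination.
  assert(mid > 0 && mid < word.size());
  const auto split = word.begin() + static_cast<std::ptrdiff_t>(mid);
  split_word(Word(word.begin(), split), out);
  split_word(Word(split, word.end()), out);
}

// Dispatch in the spirit of the snippet: short words are recognized whole,
// long ones piece by piece, and the partial results are concatenated.
std::string recog_word(const Word& word) {
  std::vector<Word> pieces;
  split_word(word, pieces);
  std::string result;
  for (const Word& piece : pieces) {
    result += recog_whole(piece);
  }
  return result;
}
```

For example, a 100-blob word is split into 8 pieces of 12 or 13 blobs each, while a 24-blob word stays in one piece. Whatever rule a real splitter uses, the property the assert checks (every recursive call gets a strictly smaller, non-empty piece) is the one that keeps the recursion finite.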

It needs a longer-term fix in split_and_recog_word itself. Note that this
splitting is never done in normal OCR; it is triggered only by malformed text
or by long horizontal colored lines, which break up into many small pieces
with gaps between them when thresholded.
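
The remark about colored lines can be illustrated with a toy model: a mid-tone horizontal line whose intensity hovers around the binarization threshold breaks into many separate runs of ink, and each run becomes its own blob. The sketch reduces this to a single pixel row with a fixed global threshold; the function count_blobs and the threshold value are hypothetical, and Tesseract's actual binarization and 2-D connected-component analysis are considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of thresholding one pixel row of a horizontal line. A pixel counts
// as ink when its gray value is below `threshold` (a hypothetical fixed global
// threshold), and each maximal run of consecutive ink pixels counts as one blob.
// This is a 1-D simplification of connected-component extraction.
std::size_t count_blobs(const std::vector<std::uint8_t>& row,
                        std::uint8_t threshold) {
  std::size_t blobs = 0;
  bool in_blob = false;
  for (const std::uint8_t value : row) {
    const bool ink = value < threshold;
    if (ink && !in_blob) {
      ++blobs;  // a new run of ink starts at this pixel
    }
    in_blob = ink;
  }
  // Separate runs need at least one background pixel between them,
  // so an n-pixel row holds at most ceil(n / 2) blobs.
  assert(blobs <= (row.size() + 1) / 2);
  return blobs;
}
```

With a threshold of 128, a uniformly dark 200-pixel row (value 40) yields one blob, while a 200-pixel row that alternates every 5 pixels between values 120 and 140 yields 20, and the count grows with the length of the line.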

Original issue reported on code.google.com by edhamr...@aol.com on 28 Aug 2009 at 8:45

GoogleCodeExporter commented 9 years ago
Can you provide an example image for this issue so the latest code can be tested?

Original comment by zde...@gmail.com on 2 Aug 2011 at 8:02

GoogleCodeExporter commented 9 years ago
This may have been fixed in r675. Please try your image again, and reopen the
issue with the image attached if the problem persists.

Original comment by david.e...@gmail.com on 19 Feb 2012 at 10:13