charlesw / tesseract

A .Net wrapper for tesseract-ocr
Apache License 2.0

Support monitoring OCR process #112

Open FabriceChaverot opened 10 years ago

FabriceChaverot commented 10 years ago

For a real-time application I need timeout functionality, because OCR sometimes takes longer than 1.5 seconds. I found this code in the Tesseract wrapper:

    private void Recognize()
    {
        if (!runRecognitionPhase) {
            if (Interop.TessApi.BaseApiRecognize(Engine.Handle, new HandleRef(this, IntPtr.Zero)) != 0) {
                throw new InvalidOperationException("Recognition of image failed.");
            }
            runRecognitionPhase = true;
        }
    }

The second parameter of TessApi.BaseApiRecognize is a monitor structure that has a timeout field. I looked at the Tesseract C++ code and found this structure:

    class ETEXT_DESC {             // output header
     public:
      inT16 count;                 // chars in this buffer(0)
      inT16 progress;              // percent complete increasing (0-100)
      inT8 more_to_come;           // true if not last
      volatile inT8 ocr_alive;     // ocr sets to 1, HP 0
      inT8 err_code;               // for errcode use
      CANCEL_FUNC cancel;          // returns true to cancel
      void* cancel_this;           // this or other data for cancel
      struct timeval end_time;     // time to stop. expected to be set only by call
                                   // to set_deadline_msecs()
      EANYCODE_CHAR text[1];       // character data

      ETEXT_DESC() : count(0), progress(0), more_to_come(0), ocr_alive(0),
                     err_code(0), cancel(NULL), cancel_this(NULL) {
        end_time.tv_sec = 0;
        end_time.tv_usec = 0;
      }
      // ...
    };

This is what they call the monitor.
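To make the idea concrete, here is a minimal, self-contained C++ sketch of how a monitor of this kind can bound a long-running job. It is not Tesseract code: `Monitor`, `deadline_exceeded` and `recognize` are hypothetical stand-ins written for illustration, modelled loosely on the `progress`, `cancel` and `end_time` fields quoted above, and the assumption is that the engine polls the monitor between units of work.

```cpp
#include <chrono>
#include <functional>

// Hypothetical stand-in for the monitor; NOT the real ETEXT_DESC.
struct Monitor {
    using Clock = std::chrono::steady_clock;

    int progress = 0;                  // percent complete (0-100)
    std::function<bool(int)> cancel;   // returns true to cancel; may be empty
    Clock::time_point end_time{};      // only meaningful when has_deadline is true
    bool has_deadline = false;

    // Set the deadline relative to now, in milliseconds.
    void set_deadline_msecs(int msecs) {
        end_time = Clock::now() + std::chrono::milliseconds(msecs);
        has_deadline = true;
    }

    // True once a deadline was set and has passed.
    bool deadline_exceeded() const {
        return has_deadline && Clock::now() > end_time;
    }
};

// Hypothetical long-running job that checks the monitor before each step.
// Returns 0 when all steps ran, 1 when it stopped early (timeout or cancel).
// A null monitor means "no monitoring", like passing IntPtr.Zero above.
int recognize(int total_steps, Monitor* monitor,
              const std::function<void()>& do_step) {
    for (int step = 0; step < total_steps; ++step) {
        if (monitor != nullptr) {
            monitor->progress = step * 100 / total_steps;
            if (monitor->deadline_exceeded()) return 1;
            if (monitor->cancel && monitor->cancel(monitor->progress)) return 1;
        }
        do_step();
    }
    if (monitor != nullptr) monitor->progress = 100;
    return 0;
}
```

In this sketch a caller would create a `Monitor`, call `set_deadline_msecs(1500)`, and pass its address to `recognize`. The work stops at the first check after the deadline, so the overrun is limited to the duration of one step rather than being zero.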

Is it possible to have this functionality in the .NET wrapper?

charlesw commented 10 years ago

Should be possible in theory; however, I've got my hands full with the upcoming 3.03 release and my other priorities. If you want to, feel free to give implementing this a go.

jay-hill commented 7 years ago

I've implemented the ability to specify a timeout for the OCR process (without support for monitoring). Would it be worthwhile to put this feature into a pull request? The changes to the API were very minor (they are checked into the add-timeout branch on my page).

charlesw commented 7 years ago

Yes, any pull requests are welcome ;)

Note that while changes to the API are fine, they should be backwards compatible (e.g. overload methods instead of just adding parameters).
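As a rough illustration of the overload approach, here is a small sketch. It is written in C++ to match the structure quoted earlier in the thread; C# method overloads follow the same shape. The `Page` class, `GetText` and `kNoTimeout` are hypothetical names chosen for the example and are not taken from the wrapper or from the add-timeout branch.

```cpp
#include <string>

// Hypothetical API used only to illustrate the pattern.
class Page {
public:
    // Existing signature, left untouched so current callers keep compiling.
    std::string GetText() { return GetText(kNoTimeout); }

    // New overload that accepts a timeout; kNoTimeout keeps the old behaviour.
    std::string GetText(int timeout_ms) {
        last_timeout_ms_ = timeout_ms;   // a real version would hand this to the engine
        return "recognized text";        // placeholder for the actual OCR result
    }

    // Exposed here only so the example can be checked.
    int last_timeout_ms() const { return last_timeout_ms_; }

    static constexpr int kNoTimeout = 0;

private:
    int last_timeout_ms_ = -1;           // -1 means GetText has not been called yet
};
```

Existing code that calls `GetText()` behaves as before, while new code can opt in with `GetText(1500)`. Changing the original signature to take a parameter instead would break every existing call site.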
