charlesw / tesseract

A .Net wrapper for tesseract-ocr
Apache License 2.0

Tesseract engine fails to initialize if run from webserver #573

Open HenrikHolmIT opened 2 years ago

HenrikHolmIT commented 2 years ago

I've created a class library that references Tesseract and contains a Parser class with a function to lift a document. This works fine from a command-line app. However, if I reference my class library from a web service project, I am told that the Tesseract engine failed to initialize.

I have the language resources available, and the code does work from the command line.

I instantiate the engine with this line, and it throws the exception:

TesseractEngine engine = new TesseractEngine(@"./tessdata", language.ToString(), EngineMode.Default);

language is an enum whose current value is 'dan', for which I have the Danish language pack available.

Tesseract.TesseractException
  HResult=0x80131500
  Message=Failed to initialise tesseract engine.. See https://github.com/charlesw/tesseract/wiki/Error-1 for details.
  Source=Tesseract
  StackTrace:
   at Tesseract.TesseractEngine.Initialise(String datapath, String language, EngineMode engineMode, IEnumerable`1 configFiles, IDictionary`2 initialValues, Boolean setOnlyNonDebugVariables)
   at Tesseract.TesseractEngine..ctor(String datapath, String language, EngineMode engineMode, IEnumerable`1 configFiles, IDictionary`2 initialOptions, Boolean setOnlyNonDebugVariables)
   at Tesseract.TesseractEngine..ctor(String datapath, String language, EngineMode engineMode)
   at Inventio.OCR.Parser.LiftDocument(Byte[] document, Language language) in C:\Development\Projects\Inventio OCR\Inventio.OCR\Parser.cs:line 23
   at OCR_WebService.Controllers.OcrController.Post() in C:\Development\Projects\Inventio OCR\OCR WebService\Controllers\OcrController.cs:line 29
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ActionExecutor.<>c__DisplayClass6_2.b__2(Object instance, Object[] methodParameters)
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ActionExecutor.Execute(Object instance, Object[] arguments)
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ExecuteAsync(HttpControllerContext controllerContext, IDictionary`2 arguments, CancellationToken cancellationToken)
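To see where a relative path like ./tessdata actually points in each host, one option is to log the absolute path .NET derives from it before constructing the engine. The sketch below is a hypothetical diagnostic helper (TessdataDiagnostics is not part of the Tesseract wrapper) and assumes the usual <language>.traineddata file naming. It relies on Path.GetFullPath resolving relative paths against the process's current working directory, which for an IIS worker process is typically not the application folder.

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper; not part of the Tesseract wrapper.
public static class TessdataDiagnostics
{
    // Absolute path that a (possibly relative) tessdata path resolves to.
    // Relative paths are resolved against the process's current working
    // directory, not against the folder containing the DLLs.
    public static string Resolve(string dataPath) => Path.GetFullPath(dataPath);

    // True if "<language>.traineddata" exists under the resolved folder.
    // Assumes standard Tesseract naming, e.g. "dan.traineddata".
    public static bool HasLanguage(string dataPath, string language) =>
        File.Exists(Path.Combine(Resolve(dataPath), language + ".traineddata"));

    // One-line summary to log just before constructing the engine.
    public static string Describe(string dataPath, string language) =>
        $"cwd={Environment.CurrentDirectory}; tessdata={Resolve(dataPath)}; " +
        $"{language}.traineddata found={HasLanguage(dataPath, language)}";
}
```

Logging TessdataDiagnostics.Describe("./tessdata", "dan") from both the command-line app and the web service should show two different cwd values if the working directory is the cause.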

HenrikHolmIT commented 2 years ago

The @"./tessdata" path did not work for web services, so I've added a binPath parameter to my function and calculate the path to the bin folder this way:

string binPath = HttpContext.Current.Server.MapPath("..") + @"\bin";

Works locally.

charlesw commented 2 years ago

I'm pretty sure it's because the working directory isn't what you think it is, and therefore it can't find the language data.

The solution is normally to resolve an absolute path and then use that. I'm pretty sure the ASP.NET MVC demo does that.
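A minimal sketch of resolving an absolute path, assuming the tessdata folder is copied next to the application's binaries. TessdataLocator and its probing order are illustrative assumptions, not code taken from the ASP.NET MVC demo. It anchors on AppDomain.CurrentDomain.BaseDirectory, which for a console app is the executable's folder; for a classic System.Web application it is typically the site root rather than bin, so the sketch probes both tessdata and bin\tessdata.

```csharp
using System;
using System.IO;

// Hypothetical helper: builds an absolute tessdata path from a known base
// directory so the engine no longer depends on the working directory.
public static class TessdataLocator
{
    // Sub-folders probed under the base directory, in order. "bin/tessdata"
    // is meant for classic ASP.NET, where the base directory is usually the
    // site root and build output (including copied content) lands in bin.
    private static readonly string[] Candidates =
    {
        "tessdata",
        Path.Combine("bin", "tessdata"),
    };

    // Returns the first absolute tessdata folder under baseDirectory that
    // contains "<language>.traineddata"; throws if none does.
    public static string Find(string baseDirectory, string language)
    {
        if (baseDirectory == null) throw new ArgumentNullException(nameof(baseDirectory));

        foreach (var candidate in Candidates)
        {
            var dir = Path.GetFullPath(Path.Combine(baseDirectory, candidate));
            if (File.Exists(Path.Combine(dir, language + ".traineddata")))
                return dir;
        }

        throw new DirectoryNotFoundException(
            $"No tessdata folder containing {language}.traineddata under '{baseDirectory}'.");
    }

    // Convenience overload that starts from the AppDomain base directory.
    public static string Find(string language) =>
        Find(AppDomain.CurrentDomain.BaseDirectory, language);
}
```

The result can be passed straight to the existing constructor, e.g. new TesseractEngine(TessdataLocator.Find("dan"), "dan", EngineMode.Default). Including the searched base directory in the exception message makes a missing language pack easier to diagnose than the generic initialisation error.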

(Email reply to HenrikHolmIT's post of Thu, 9 Sep 2021, 21:17.)

HenrikHolmIT commented 2 years ago

@charlesw That is true. I now set the bin path in the client code: the command-line app initializes Tesseract with 'Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location)', and the web service with 'HttpContext.Current.Server.MapPath("..")'.
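The approach of letting each host decide the path and having the library simply use it could be sketched as below. OcrSettings and HostPaths are hypothetical names; the only Tesseract call implied is the existing TesseractEngine(datapath, language, EngineMode) constructor, which the library would invoke with settings.TessdataPath. Rejecting relative paths up front keeps the working-directory problem from coming back.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical library-side settings: the caller supplies the tessdata
// folder, so the library never guesses from the working directory.
public sealed class OcrSettings
{
    public string TessdataPath { get; }

    public OcrSettings(string tessdataPath)
    {
        if (string.IsNullOrWhiteSpace(tessdataPath))
            throw new ArgumentException("A tessdata path is required.", nameof(tessdataPath));
        // Relative paths would again be resolved against the working directory.
        if (!Path.IsPathRooted(tessdataPath))
            throw new ArgumentException(
                "Pass a rooted path; relative paths depend on the working directory.",
                nameof(tessdataPath));
        TessdataPath = tessdataPath;
    }
}

// Hypothetical host-side helper for the command-line app.
public static class HostPaths
{
    // Folder of the executing assembly plus "tessdata", mirroring the
    // console approach above. Under classic ASP.NET shadow copying,
    // Assembly.Location can point at the shadow-copy cache instead of bin,
    // which is one reason the web host computes its own path.
    public static string FromExecutingAssembly()
    {
        string binDir = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);
        return Path.Combine(binDir, "tessdata");
    }
}
```

In the web service, an app-relative path such as HttpContext.Current.Server.MapPath("~/bin/tessdata") may be more robust than MapPath(".."), since as far as I know a path starting with .. is resolved relative to the current request's virtual path and can change with the route. This assumes tessdata is copied to bin.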