charlesw / tesseract

A .Net wrapper for tesseract-ocr
Apache License 2.0

TESSDATA_PREFIX Environment Variable set but it still cannot find it #663

Open Hakxsorus opened 8 months ago

Hakxsorus commented 8 months ago

Error

Error opening data file tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!

But I do have the environment variable set to the tessdata folder, which contains eng.traineddata.

It only works if I call it from my project root directory, where the tessdata folder is published.

Works

PS D:\Development\Blitz\Blitz\bin\Release\net7.0\win-x86\publish> blitz scan

Does Not Work

PS C:\Users\mdabr> blitz scan
Error opening data file tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Unhandled exception. Tesseract.TesseractException: Failed to initialise tesseract engine.. See https://github.com/charlesw/tesseract/wiki/Error-1 for details.
   at Tesseract.TesseractEngine.Initialise(String datapath, String language, EngineMode engineMode, IEnumerable`1 configFiles, IDictionary`2 initialValues, Boolean setOnlyNonDebugVariables)
   at Tesseract.TesseractEngine..ctor(String datapath, String language, EngineMode engineMode, IEnumerable`1 configFiles, IDictionary`2 initialOptions, Boolean setOnlyNonDebugVariables)
   at Tesseract.TesseractEngine..ctor(String datapath, String language, EngineMode engineMode)
   at Blitz.Program.RunScanMoronCommand(ScanMoronOptions opts) in D:\Development\Blitz\Blitz\Program.cs:line 156
   at Blitz.Program.<>c.<Main>b__1_3(ScanMoronOptions opts) in D:\Development\Blitz\Blitz\Program.cs:line 27
   at CommandLine.ParserResultExtensions.MapResult[T1,T2,T3,T4,T5,TResult](ParserResult`1 result, Func`2 parsedFunc1, Func`2 parsedFunc2, Func`2 parsedFunc3, Func`2 parsedFunc4, Func`2 parsedFunc5, Func`2 notParsedFunc)
   at Blitz.Program.Main(String[] args) in D:\Development\Blitz\Blitz\Program.cs:line 16
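
The error names the relative path tessdata/eng.traineddata, and the command succeeds only from the publish directory. That pattern suggests a relative datapath such as "tessdata" is being passed to TesseractEngine and resolved against the current working directory, in which case the environment variable may not be consulted at all. This is an assumption, since Program.cs is not shown. A minimal sketch of resolving the folder next to the executable instead (TessdataLocator and its method are hypothetical names):

```csharp
using System;
using System.IO;

public static class TessdataLocator
{
    // Hypothetical helper: builds the path to a "tessdata" folder that sits
    // next to the application's binaries, independent of the directory the
    // command was launched from.
    public static string GetTessdataPathNextToExecutable()
    {
        // AppContext.BaseDirectory is the application's base directory,
        // not the shell's current working directory.
        return Path.Combine(AppContext.BaseDirectory, "tessdata");
    }
}
```

The returned path could then be passed as the datapath argument, so that running blitz scan from C:\Users\mdabr and from the publish directory would look in the same place.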
Hakxsorus commented 7 months ago

Workaround

I will not close this issue since the error persists. However, for anyone experiencing similar problems, I have found a workaround.

The workaround is to create the tessdata directory programmatically and download eng.traineddata to a known location in the user's file system when the app initialises.

Note that this is intended for a production environment, and the download only needs to happen once. Consider disabling it for local debugging.

1. Get a known path (e.g. AppData)

Create the tessdata directory there.

private const string AppDataFolderName = "YourAppName";
private const string TessdataFolderName = "tessdata";

/// <summary>
/// Gets the path to the application's directory in the AppData folder.
/// </summary>
/// <returns>The application directory path.</returns>
public string GetAppDataFolderPath()
{
    var appDataPath = Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData);
    return Path.Combine(appDataPath, AppDataFolderName);
}

/// <summary>
/// Gets the path to the application's tessdata directory.
/// </summary>
/// <returns>The tessdata path.</returns>
public string GetTessdataFolderPath()
{
    return Path.Combine(GetAppDataFolderPath(), TessdataFolderName);
}
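
The helpers above only build the path; they do not create anything on disk. A minimal sketch of the directory-creation step, assuming the application folder path comes from something like GetAppDataFolderPath() (TessdataDirectory and EnsureCreated are hypothetical names):

```csharp
using System;
using System.IO;

public static class TessdataDirectory
{
    // Hypothetical helper: creates <appDataFolderPath>/tessdata if it is
    // missing and returns the full path to it.
    public static string EnsureCreated(string appDataFolderPath)
    {
        var tessdataPath = Path.Combine(appDataFolderPath, "tessdata");

        // Directory.CreateDirectory also creates missing parent directories
        // and does nothing if the directory already exists.
        Directory.CreateDirectory(tessdataPath);

        return tessdataPath;
    }
}
```

Because Directory.CreateDirectory is safe to call repeatedly, no separate Directory.Exists check is needed before it.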

2. Download tessdata/*.traineddata to that path

Make sure the directory exists before downloading.

// Requires: using System.IO; using System.Net.Http; using System.Threading.Tasks;

/// <summary>
/// Downloads the Tesseract English language model to the tessdata folder.
/// </summary>
/// <param name="tessdataFolderPath">The path to the tessdata folder. It must already exist.</param>
private static async Task DownloadTrainedData(string tessdataFolderPath)
{
    const string tessdataEngFileName = "eng.traineddata";
    const string tessdataEngUrl = "https://github.com/tesseract-ocr/tessdata_fast/raw/main/eng.traineddata";

    using var client = new HttpClient();

    await using var stream = await client.GetStreamAsync(tessdataEngUrl);

    // FileMode.Create truncates an existing file, so a shorter download
    // cannot leave stale bytes from a previous, longer file at the end.
    await using var fs = new FileStream(Path.Combine(tessdataFolderPath, tessdataEngFileName),
        FileMode.Create);

    await stream.CopyToAsync(fs);
}
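
Since the download only needs to happen once, it can be guarded by a check for the model file. A minimal sketch, assuming the download is passed in as a delegate so the guard stays independent of the HTTP code (TrainedDataBootstrap and EnsureTrainedDataAsync are hypothetical names):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class TrainedDataBootstrap
{
    // Hypothetical helper: invokes `download` only when eng.traineddata is
    // missing from the tessdata folder. `download` stands in for a method
    // such as DownloadTrainedData and receives the tessdata folder path.
    // Returns true if a download was triggered, false if the file was present.
    public static async Task<bool> EnsureTrainedDataAsync(
        string tessdataFolderPath, Func<string, Task> download)
    {
        var modelPath = Path.Combine(tessdataFolderPath, "eng.traineddata");
        if (File.Exists(modelPath))
        {
            return false; // already present, nothing to do
        }

        // Make sure the target folder exists before the download writes to it.
        Directory.CreateDirectory(tessdataFolderPath);
        await download(tessdataFolderPath);
        return true;
    }
}
```

From inside the class that declares the download method, a call might look like await TrainedDataBootstrap.EnsureTrainedDataAsync(tessdataFolderPath, DownloadTrainedData). A file left behind by an interrupted download would still pass the File.Exists check, so this sketch does not detect partial files.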

3. Initialize the engine using your defined path

using var engine = new TesseractEngine(tessdataFolderPath, "eng", EngineMode.Default);