Sicos1977 / TesseractOCR

A .net library to work with Google's Tesseract

Something wrong in this code #4

Closed lauglam closed 2 years ago

lauglam commented 2 years ago

Something wrong in this code.

I can't catch this exception, maybe because it is thrown inside a runtime DllImport call.

https://github.com/Sicos1977/TesseractOCR/blob/3abe128b3434f1d4675948dac6bdcc5d88d8a4ed/TesseractOCR/Layout/EnumeratorBase.cs#L220


Sicos1977 commented 2 years ago

You can't catch it with a normal try/catch?

lauglam commented 2 years ago

You can't catch it with a normal try/catch?

Yes, it can't be caught.

lauglam commented 2 years ago

I am unable to catch it; the program crashes right here.


lauglam commented 2 years ago

This is my test image: test_img.zip

Sicos1977 commented 2 years ago

At the moment I'm trying to make a new NuGet package with Tesseract 5.1 in it. I'll let you know when it is done so you can try that one.

Sicos1977 commented 2 years ago

I just updated the code on GitHub to Tesseract 5.1. Try cloning it to see whether this version solves your problem.

Sicos1977 commented 2 years ago

https://github.com/Sicos1977/TesseractOCR/commit/6419b56ceb4e7e9c1102d9fa7aace662582a4852 and 0d21da2606285c2a61aad5143a4274b1c1ee6a81

Sicos1977 commented 2 years ago

I just released a new NuGet package with Tesseract updated to version 5.1.

lauglam commented 2 years ago

Unfortunately, the error still occurs.

lauglam commented 2 years ago

This is the program I use for testing

ConsoleApp1.zip

Sicos1977 commented 2 years ago

Can you start Tesseract.exe without any problems?

Sicos1977 commented 2 years ago

Never mind, I think this is your problem: you are disposing the page object and thus destroying the reference to the Blocks object.


Sicos1977 commented 2 years ago

This works without any problems:

    static void Main(string[] args)
    {
        var result = new StringBuilder();
        using var engine = new TesseractOCR.Engine(@".\", Language.English, EngineMode.Default);
        using var pix = TesseractOCR.Pix.Image.LoadFromFile(@".\test_img.png");
        using var page = engine.Process(pix);
        foreach (var block in page.Layout)
        {
            result.AppendLine($"Block confidence: {block.Confidence}");
            if (block.BoundingBox != null)
            {
                var boundingBox = block.BoundingBox.Value;
                result.AppendLine($"Block bounding box X1 '{boundingBox.X1}', Y1 '{boundingBox.Y1}', X2 " +
                                  $"'{boundingBox.X2}', Y2 '{boundingBox.Y2}', width '{boundingBox.Width}', height '{boundingBox.Height}'");
            }
            result.AppendLine($"Block text: {block.Text}");
        }

        Console.WriteLine(result.ToString());
    }


Do not dispose the object before you are done using it, because disposing it destroys all the references to the Tesseract51.dll, and that is what gives you the error!
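To make the pitfall concrete, here is a minimal sketch that does not use the library at all. `FakePage` and `LazyDemo` are hypothetical names made up for this illustration; `FakePage` only mimics a disposable object whose layout is enumerated lazily. Note that this stand-in throws a managed `ObjectDisposedException`, which can be caught, whereas the crash reported in this issue could not be caught.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a disposable page object (not the library's class).
public sealed class FakePage : IDisposable
{
    private bool _disposed;

    // Lazily yields blocks: nothing in this getter runs until the caller enumerates.
    public IEnumerable<string> Layout
    {
        get
        {
            foreach (var text in new[] { "block 1", "block 2", "block 3" })
            {
                if (_disposed)
                    throw new ObjectDisposedException(nameof(FakePage));
                yield return text;
            }
        }
    }

    public void Dispose() => _disposed = true;
}

public static class LazyDemo
{
    // Broken: 'page' is disposed when the method returns, before enumeration starts.
    public static IEnumerable<string> ReadTooLate()
    {
        using var page = new FakePage();
        return page.Layout;
    }

    // Works: the blocks are copied into a list while 'page' is still alive.
    public static List<string> ReadSafely()
    {
        using var page = new FakePage();
        return page.Layout.ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(string.Join(", ", LazyDemo.ReadSafely()));

        try
        {
            foreach (var text in LazyDemo.ReadTooLate())
                Console.WriteLine(text);
        }
        catch (ObjectDisposedException)
        {
            Console.WriteLine("page was disposed before enumeration");
        }
    }
}
```

The same idea applies to the real page object: either enumerate inside the scope that owns it, or copy what you need (for example the block text) into plain values before it is disposed.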

Sicos1977 commented 2 years ago

I like the manga strip drawing style :-)

Sicos1977 commented 2 years ago

If your goal is just to get the text from the page, then you can also use page.Text


lauglam commented 2 years ago

It's my fault, thank you very much for correcting me.

Sicos1977 commented 2 years ago

No problem, the same happened to me when I started using Tesseract :-) ... it is good to make mistakes... you learn from them.

lauglam commented 2 years ago

Thank you for the reminder. Forgive my bad English, and thanks again.

Sicos1977 commented 2 years ago

Your English is fine. I'm not a native English speaker either (I'm from the Netherlands), so I guess native English speakers would have something to say about my English too :-)

lauglam commented 2 years ago

I wrote something like this and it works fine, thanks again.

public static IEnumerable<Block> GetBlocks(string path, Language language = Language.English)
{
    // ReSharper disable once StringLiteralTypo
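    // engine, pix and page are deliberately not disposed here: page.Layout is
    // enumerated later by the caller, and disposing page first breaks that
    // enumeration. The trade-off is that they are not released deterministically.
    var engine = new Engine(@".\trained_data", language, EngineMode.Default);
    var pix = TesseractOCR.Pix.Image.LoadFromFile(path);
    var page = engine.Process(pix);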

    return page.Layout;
}

public static IEnumerable<Paragraph> GetParagraphs(string path, Language language = Language.English)
{
    var blocks = GetBlocks(path, language);
    return from block in blocks from paragraph in block.Paragraphs select paragraph;
}

public static IEnumerable<TextLine> GetTextLines(string path, Language language = Language.English)
{
    var paragraphs = GetParagraphs(path, language);
    return from paragraph in paragraphs from textLine in paragraph.TextLines select textLine;
}

public static IEnumerable<Word> GetWords(string path, Language language = Language.English)
{
    var textLines = GetTextLines(path, language);
    return from textLine in textLines from word in textLine.Words select word;
}

public static IEnumerable<Symbol> GetSymbols(string path, Language language = Language.English)
{
    var words = GetWords(path, language);
    return from word in words from symbol in word.Symbols select symbol;
}