lauglam closed this issue 2 years ago.
You can't catch it with a normal try/catch?
Yes, it can't be caught.
I'm unable to catch it; the program crashes directly at this point.
This is my test image: test_img.zip
At the moment I'm building a new NuGet package with Tesseract 5.1 in it; I'll let you know when it's done so you can try that one.
I just updated the code on GitHub to Tesseract 5.1. Try cloning it and see if this version solves your problem.
https://github.com/Sicos1977/TesseractOCR/commit/6419b56ceb4e7e9c1102d9fa7aace662582a4852 and 0d21da2606285c2a61aad5143a4274b1c1ee6a81
I just released a new NuGet package with Tesseract updated to version 5.1.
Unfortunately, the error still exists.
This is the program I use for testing.
Can you start Tesseract.exe without any problems?
Never mind, I think this is your problem: you are disposing the page object and thereby destroying the reference to the Blocks object.
This works without any problems:
using System;
using System.Text;
using TesseractOCR.Enums; // Language, EngineMode

class Program
{
    static void Main(string[] args)
    {
        var result = new StringBuilder();

        // Engine, image and page stay alive (not disposed) until every block has been read
        using var engine = new TesseractOCR.Engine(@".\", Language.English, EngineMode.Default);
        using var pix = TesseractOCR.Pix.Image.LoadFromFile(@".\test_img.png");
        using var page = engine.Process(pix);

        foreach (var block in page.Layout)
        {
            result.AppendLine($"Block confidence: {block.Confidence}");
            if (block.BoundingBox != null)
            {
                var boundingBox = block.BoundingBox.Value;
                result.AppendLine($"Block bounding box X1 '{boundingBox.X1}', Y1 '{boundingBox.Y1}', X2 " +
                                  $"'{boundingBox.X2}', Y2 '{boundingBox.Y2}', width '{boundingBox.Width}', height '{boundingBox.Height}'");
            }
            result.AppendLine($"Block text: {block.Text}");
        }

        Console.WriteLine(result.ToString());
    }
}
Do not dispose the object before you are done using it: disposing destroys all the references into Tesseract51.dll, and that is what gives you the error!
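The lifetime rule above can be illustrated without Tesseract at all. In this minimal managed sketch, FakePage and LifetimeDemo are hypothetical types, not part of TesseractOCR: enumerating the lazy Layout after its owner has been disposed fails, while reading it inside the using scope works. The managed stand-in fails with a catchable ObjectDisposedException; when the freed object lives in native memory, the failure is typically an access violation, which modern .NET generally does not let an ordinary try/catch handle. That would fit the uncatchable crash reported earlier in the thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a page that owns native memory.
// This is NOT the TesseractOCR API; it only models the lifetime rule.
public sealed class FakePage : IDisposable
{
    private bool _disposed;

    // Blocks are produced lazily, and every step needs the page to still be alive,
    // much like an iterator walking over a native page.
    public IEnumerable<string> Layout
    {
        get
        {
            for (var i = 1; i <= 3; i++)
            {
                if (_disposed)
                    throw new ObjectDisposedException(nameof(FakePage));
                yield return $"block {i}";
            }
        }
    }

    public void Dispose() => _disposed = true;
}

public static class LifetimeDemo
{
    // Wrong: the page is disposed when this method returns,
    // but the caller only starts enumerating Layout afterwards.
    public static IEnumerable<string> BlocksAfterDispose()
    {
        using var page = new FakePage();
        return page.Layout;
    }

    // Right: every block is read while the page is still alive.
    public static List<string> BlocksInsideScope()
    {
        var result = new List<string>();
        using var page = new FakePage();
        foreach (var block in page.Layout)
            result.Add(block);
        return result;
    }
}
```

The only difference between the two methods is where the enumeration happens relative to the end of the using scope.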
I like the manga strip drawing style :-)
If your goal is just to get the text from the page, you can also use page.Text.
It's my fault, thank you very much for correcting me.
No problem, the same happened to me when I started using Tesseract :-) ... it's good to make mistakes; you learn from them.
Thank you for the reminder. Forgive my bad English, and thanks again.
Your English is fine. I'm not a native English speaker either (I'm from the Netherlands), so I guess native speakers could find something to comment on in mine too :-)
I wrote something like this and it works fine, thanks again.
public static IEnumerable<Block> GetBlocks(string path, Language language = Language.English)
{
    // ReSharper disable once StringLiteralTypo
    var engine = new Engine(@".\trained_data", language, EngineMode.Default);
    var pix = TesseractOCR.Pix.Image.LoadFromFile(path);
    var page = engine.Process(pix);
    return page.Layout;
}

public static IEnumerable<Paragraph> GetParagraphs(string path, Language language = Language.English)
{
    var blocks = GetBlocks(path, language);
    return from block in blocks from paragraph in block.Paragraphs select paragraph;
}

public static IEnumerable<TextLine> GetTextLines(string path, Language language = Language.English)
{
    var paragraphs = GetParagraphs(path, language);
    return from paragraph in paragraphs from textLine in paragraph.TextLines select textLine;
}

public static IEnumerable<Word> GetWords(string path, Language language = Language.English)
{
    var textLines = GetTextLines(path, language);
    return from textLine in textLines from word in textLine.Words select word;
}

public static IEnumerable<Symbol> GetSymbols(string path, Language language = Language.English)
{
    var words = GetWords(path, language);
    return from word in words from symbol in word.Symbols select symbol;
}
Something is wrong with this code.
I can't catch this exception, maybe because the call goes through RuntimeDllImport:
https://github.com/Sicos1977/TesseractOCR/blob/3abe128b3434f1d4675948dac6bdcc5d88d8a4ed/TesseractOCR/Layout/EnumeratorBase.cs#L220
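In the helper methods, engine, pix and page are never disposed, and once GetBlocks returns nothing in the caller refers to them anymore. One possible reading, not confirmed in this thread, is that they are garbage-collected and finalized while page.Layout is still being enumerated, which would fit a crash inside EnumeratorBase. A common C# way to tie a resource's lifetime to an enumeration is an iterator method with a using inside it. The sketch shows that pattern with hypothetical types (FakeOcrSession, OcrHelpers), not the TesseractOCR API:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical owner of native resources, standing in for engine + image + page.
// This is NOT the TesseractOCR API.
public sealed class FakeOcrSession : IDisposable
{
    public bool IsDisposed { get; private set; }

    public IEnumerable<string> Words => new[] { "alpha", "beta", "gamma" };

    public void Dispose() => IsDisposed = true;
}

public static class OcrHelpers
{
    // Iterator method: the session is opened on the first MoveNext() and disposed
    // when the caller finishes or abandons the enumeration (disposing the enumerator
    // runs the using), so it lives exactly as long as the results are being read.
    public static IEnumerable<string> GetWords(Func<FakeOcrSession> openSession)
    {
        using var session = openSession();
        foreach (var word in session.Words)
            yield return word;
    }
}
```

Applied to the helpers, GetBlocks would become such an iterator, creating the engine, image and page with using and yield-returning from page.Layout, while the other helpers stay LINQ projections over it. Note that each helper call still processes the image again, so walking all levels in a single pass is cheaper when several are needed.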