vsolominov closed this issue 1 year ago.
Is this option available through the API that I implemented? If so, I can expose it to C# for you; if it is not in there, I can't. I also don't know whether we can compile it back into the Tesseract DLL.
I added a property called FontPointSize to the Words class. It does what you need. The other font information is no longer supported, so I removed it.
Just get the latest NuGet package.
See this issue for more information: https://github.com/tesseract-ocr/tesseract/issues/1074
Thanks a lot!
But I would not be so categorical about `FontAttributes`, since this property is not always null. For example, if the solution uses `EngineMode` equal to `TesseractOnly` or `TesseractAndLstm` (that is, legacy mode), then the font parameters are initialized and `FontAttributes` will not be empty. Font options can be very useful for custom text rendering. It might be better to wrap the font information like this:
```csharp
using System;
using System.Text;

// Groups the point size (always calculated) with the font attributes
// (only filled in when the legacy engine is used).
public class FontProperties
{
    public int PointSize { get; }
    public FontAttributes? FontAttributes { get; }

    public FontProperties(int pointSize, FontAttributes? fontAttributes = null)
    {
        this.PointSize = pointSize;
        this.FontAttributes = fontAttributes;
    }
}

// Property on the Word class
public FontProperties FontProperties
{
    get
    {
        var nameHandle =
            TessApi.Native.ResultIteratorWordFontAttributes(
                IteratorHandleRef,
                out var isBold, out var isItalic, out var isUnderlined,
                out var isMonospace, out var isSerif, out var isSmallCaps,
                out var pointSize, out var fontId);

        FontAttributes fontAttributes = null;

        // The font name pointer is null in certain error conditions and when the
        // legacy engine is not used; the point size is returned in either case.
        if (nameHandle != IntPtr.Zero)
        {
            var fontName = MarshalHelper.PtrToString(nameHandle, Encoding.UTF8);
            var fontInfo = new FontInfo(fontName, fontId, isItalic, isBold, isMonospace, isSerif);
            fontAttributes = new FontAttributes(fontInfo, isUnderlined, isSmallCaps);
        }

        return new FontProperties(pointSize, fontAttributes);
    }
}
```
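A minimal, self-contained sketch of how a caller might consume this design. `FontAttributesSketch`, `FontPropertiesSketch` and `FontChoice.Describe` are hypothetical stand-ins, not the wrapper's types, and they model only a font name and bold/italic flags. The point size is always used. When no attributes were returned (for example with LSTM-only recognition), the font name and style fall back to an arbitrary default:

```csharp
using System;
#nullable enable

// Hypothetical, simplified stand-ins for the wrapper's FontAttributes/FontProperties.
public sealed record FontAttributesSketch(string FontName, bool IsBold, bool IsItalic);
public sealed record FontPropertiesSketch(int PointSize, FontAttributesSketch? FontAttributes = null);

public static class FontChoice
{
    // Describes the font to render a word with. The point size is always available;
    // the name and style fall back to defaults when FontAttributes is null.
    public static string Describe(FontPropertiesSketch props, string fallbackFont = "Helvetica")
    {
        var attrs = props.FontAttributes;
        if (attrs is null)
            return $"{fallbackFont} {props.PointSize}pt";

        var style = (attrs.IsBold, attrs.IsItalic) switch
        {
            (true, true) => " Bold Italic",
            (true, false) => " Bold",
            (false, true) => " Italic",
            _ => ""
        };
        return $"{attrs.FontName}{style} {props.PointSize}pt";
    }
}
```

For example, `FontChoice.Describe(new FontPropertiesSketch(12))` yields `"Helvetica 12pt"`. With attributes for a bold "Times" font at 10 pt, it yields `"Times Bold 10pt"`.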
You are right, I missed that one.
I liked your solution and implemented it. See the latest NuGet package.
Just curious: what are you using Tesseract OCR for?
Wow, cool, thanks!
I'm using Tesseract to recognize PDF files without a text layer in order to create searchable PDFs. For a variety of reasons (image preprocessing, preserving the quality of the original PDF, and others), I can't use the PDF rendering tool that Tesseract provides.
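When the text layer is drawn by custom code, one common approach is to place each word's invisible text over its bounding box and stretch the text horizontally to fit. Below is a minimal sketch of that arithmetic. It assumes the page image is drawn at its native resolution, so that one PDF unit equals 1/72 inch. It also assumes the text's natural width at the chosen font size comes from the renderer's own font metrics and is passed in rather than computed. `TextLayerSketch` and its methods are hypothetical names. The result is the percentage taken by the PDF `Tz` (horizontal scaling) operator:

```csharp
using System;

public static class TextLayerSketch
{
    // PDF user-space units per inch (1 unit = 1 point = 1/72 inch).
    private const double PointsPerInch = 72.0;

    // Converts a length in image pixels to PDF units, assuming the image
    // is placed on the page at its native resolution.
    public static double PixelsToPdfUnits(double pixels, double dpi) =>
        pixels * PointsPerInch / dpi;

    // Horizontal scaling in percent (as used by the Tz operator) that makes the
    // word's text span its bounding box. naturalWidthPt is the text width at the
    // chosen font size, taken from the renderer's font metrics.
    public static double HorizontalScalePercent(double bboxWidthPx, double dpi, double naturalWidthPt)
    {
        if (dpi <= 0)
            throw new ArgumentOutOfRangeException(nameof(dpi));
        if (naturalWidthPt <= 0)
            throw new ArgumentOutOfRangeException(nameof(naturalWidthPt));
        return PixelsToPdfUnits(bboxWidthPx, dpi) / naturalWidthPt * 100.0;
    }
}
```

For example, a word 600 px wide in a 300 dpi image spans 144 PDF units. If its text is 72 units wide at the chosen size, the horizontal scale is 200%.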
In my app I need to know the point size of each recognized word. In the current Tesseract version (5.2), `Word.FontAttributes` is always null. This is because the property is created from a pointer that is not assigned in `ltrresultiterator.cpp` when `DISABLED_LEGACY_ENGINE` is defined. The point size, however, is calculated nevertheless and returned as an out parameter. Is there any way to get the calculated point size?
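For context, the point size Tesseract returns appears to be derived from the row geometry and the image's vertical resolution, independently of the legacy font classifier. That would explain why it is still available when `DISABLED_LEGACY_ENGINE` is defined. Below is a minimal C# sketch of that calculation, based on a reading of `ltrresultiterator.cpp`. The names are hypothetical, and the exact code may differ between Tesseract versions:

```csharp
using System;

public static class TesseractPointSizeSketch
{
    // Printer's points per inch.
    private const double PointsPerInch = 72.0;

    // Row height in pixels: x-height plus the ascender rise above it, minus the
    // descender drop (which is negative, being below the baseline).
    public static double RowHeightPixels(double xHeight, double ascenderRise, double descenderDrop) =>
        xHeight + ascenderRise - descenderDrop;

    // Converts the row height to points at the given vertical resolution (dpi),
    // rounding to the nearest integer; returns 0 when the resolution is unknown.
    public static int PointSize(double rowHeightPx, int yResolution) =>
        yResolution > 0
            ? (int)(rowHeightPx * PointsPerInch / yResolution + 0.5)
            : 0;
}
```

For example, a row with an x-height of 20 px, an ascender rise of 15 px and a descender drop of -15 px is 50 px tall. At 300 dpi, that gives a point size of 12. Because the value depends on the resolution Tesseract assumes for the image, it is only meaningful if the image's dpi is set correctly.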