Closed zhangqi-ulua closed 2 years ago
All of my older projects use System.Drawing.Bitmap to load and edit image files, so it would be helpful to support it, as is done in https://github.com/charlesw/tesseract/blob/9df02d55744cebdd504dd27c65a02d20e6d07a5a/src/Tesseract/TesseractDrawingExtensions.cs
I'm not going to support the Bitmap object because it is a .NET 4.x, Windows-only thing. If you need it, you can always save your bitmap to a MemoryStream and then load it into a Pix object with this method:
#region LoadFromMemory
/// <summary>
///     Loads an image from a MemoryStream
/// </summary>
/// <param name="memoryStream">The memory stream</param>
/// <returns><see cref="Image"/></returns>
/// <exception cref="IOException"></exception>
public static Image LoadFromMemory(MemoryStream memoryStream)
{
    Logger.LogInformation("Loading image from memory stream");
    var buffer = memoryStream.GetBuffer();
    return LoadFromMemoryInternal(buffer, 0, buffer.Length);
}
#endregion
Just do this:
var ms = new MemoryStream();
bitmap.Save(ms, System.Drawing.Imaging.ImageFormat.Bmp);
// ms.Length is a long, so cast it; it is the number of bytes actually written to the stream
TesseractOCR.Pix.Image.LoadFromMemory(ms.GetBuffer(), 0, (int)ms.Length);
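A side note on why the snippet passes an offset and a length: `MemoryStream.GetBuffer()` returns the whole underlying array, which can be larger than the data that was written, while `ToArray()` returns an exact-size copy. The following is only a minimal sketch to illustrate that difference; `MemoryStreamDemo` and `Measure` are made-up names for this example and are not part of TesseractOCR.

```csharp
using System;
using System.IO;

public static class MemoryStreamDemo
{
    // Hypothetical helper: writes the payload to a MemoryStream and reports
    // (bytes written, size of the underlying buffer, size of the ToArray copy).
    public static (long Length, int BufferSize, int CopySize) Measure(byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(payload, 0, payload.Length);
            // GetBuffer() exposes the internal array without copying it;
            // ToArray() allocates a new array holding only the written bytes.
            return (ms.Length, ms.GetBuffer().Length, ms.ToArray().Length);
        }
    }

    public static void Main()
    {
        var (length, bufferSize, copySize) = Measure(new byte[] { 1, 2, 3 });
        Console.WriteLine($"Length={length}, GetBuffer().Length={bufferSize}, ToArray().Length={copySize}");
    }
}
```

So `GetBuffer()` together with `ms.Length` avoids an extra copy, whereas `ToArray()` is simpler if the copy does not matter.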
Thanks for your reply. But Bitmap also exists in .NET 6.0 and the newest 7.0. My projects use WinForms and WPF, so they only need to support Windows.
Yeah, but even Microsoft no longer recommends using it, which is why I dropped support for it.
OK, thanks.
But you should be fine just using this. Unless you OCR huge amounts of data, you won't notice any speed difference:
var ms = new MemoryStream();
bitmap.Save(ms, System.Drawing.Imaging.ImageFormat.Bmp);
// ms.Length is a long, so cast it; it is the number of bytes actually written to the stream
TesseractOCR.Pix.Image.LoadFromMemory(ms.GetBuffer(), 0, (int)ms.Length);
I know this approach, but it wastes memory and CPU time. It would be better if Engine.Process could accept a Bitmap directly.
Converting a Bitmap to Pix also takes time, so I don't think it makes much difference.
No, I mean supporting Engine.Process(Bitmap, string, Mode), not translating the Bitmap to Tesseract.Pix.
The Bitmap always needs to be translated to Pix because Tesseract does not understand Bitmap. What Charles's extension does is add an overload that accepts a Bitmap but translates it to Pix in the background before feeding it to Tesseract.
In the background, this method is called to process the page, and it only accepts a Pix object:
/// <summary>
///     The recognized text is returned as a char* which is coded as UTF-8 and must be freed with the delete [] operator.
/// </summary>
/// <param name="handle">The TesseractAPI instance</param>
/// <param name="pix"></param>
/// <param name="page_index"></param>
/// <param name="filename"></param>
/// <param name="retry_config"></param>
/// <param name="timeout_millisec"></param>
/// <param name="renderer"></param>
/// <returns></returns>
[RuntimeDllImport(Constants.TesseractDllName, CallingConvention = CallingConvention.Cdecl, EntryPoint = "TessBaseAPIProcessPage")]
int BaseApiProcessPage(HandleRef handle, Pix.Image pix, int page_index, string filename, string retry_config, int timeout_millisec, HandleRef renderer);
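To make the point about overloads concrete: an overload that "accepts a Bitmap" still has to encode or convert the Bitmap before the Pix-only API sees it, so the conversion cost does not go away. The sketch below shows that shape in project code without assuming anything about the library's signatures. `BitmapOcrExtensions`, `Recognize`, and the `recognize` delegate are hypothetical names invented for this example. The caller supplies the delegate, which is where the bytes would be loaded into a Pix object and passed to the engine. It assumes Windows and, on .NET 6 or later, the System.Drawing.Common package.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

public static class BitmapOcrExtensions
{
    // Hypothetical convenience wrapper, not part of TesseractOCR.
    // 'recognize' stands in for whatever loads the encoded bytes into a
    // Pix object and runs OCR on it; it is supplied by the caller.
    public static string Recognize(this Bitmap bitmap, Func<byte[], string> recognize)
    {
        if (bitmap == null) throw new ArgumentNullException(nameof(bitmap));
        if (recognize == null) throw new ArgumentNullException(nameof(recognize));

        using (var ms = new MemoryStream())
        {
            // The hidden conversion step: encode the Bitmap as BMP bytes.
            bitmap.Save(ms, ImageFormat.Bmp);

            // ToArray() copies exactly the bytes that were written.
            return recognize(ms.ToArray());
        }
    }
}
```

A call would then look like `bitmap.Recognize(bytes => ...)`, with the lambda doing the `LoadFromMemory` and engine calls shown earlier in this thread and returning the recognized text.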
Thank you very much
https://github.com/charlesw/tesseract/blob/9df02d55744cebdd504dd27c65a02d20e6d07a5a/src/Tesseract/PixConverter.cs