PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

Add better support for Brazilian Portuguese #13715

Open insinfo opened 3 months ago

insinfo commented 3 months ago

🐛 Bug (problem description)

I ran a test to OCR scanned documents in Brazilian Portuguese and found that PaddleOCR makes many mistakes on them. I used the C# implementation (PaddleOCRSharp) with the larger server models: ch_PP-OCRv4_det_server_infer, ch_ppocr_mobile_v2.0_cls_infer, and ch_PP-OCRv4_rec_server_infer.

https://github.com/raoyutian/PaddleOCRSharp

example 1

[image: 572_page-0001]

result:

ESTADO DO RIO DE JANEIRO Prefeitura Municipal de Rio das Ostras PROTOCOLO GERAL Prsees30: 25304 1 2003 Dete: 2811/2003 Hrs: 40:30:10 gterete YLTCt SANTOS DA SLVA unanisoOraseu rgt Dastre: P SGuRSs NSOPICAO COMO ABULAFE

The correct text would be:

ESTADO DO RIO DE JANEIRO
Prefeitura Municipal de Rio das Ostras
PROTOCOLO GERAL

Processo: 25304 / 2003
Data: 26/11/2003
Hora: 10:30:10
Requerente: AYLTON SANTOS DA SILVA
Sec. Destino: Sec. Mun. Urbanismo Obras e S. Pub.
Dep. Destino: 0
Assunto: INSCRIÇÃO COMO AMBULANTE

example 2

[image: 1-1]

result:

09x0 ESTADO DO RIO DE JANEIRO
Prefeitura Municipal de Rio das Ostras PROTOCOLO GERAL CECtKO Prbeb6: 18457 1 2003 Data: 03/09/2003 Hora: 10:53:56 Requerente: COSCARELLlE CiA LTDA ME on S Dept.Desthno: Dept de Tributos e Fiscalzagao Y 3:140 Assunto: ALVARA

The correct text would be:

ESTADO DO RIO DE JANEIRO
Prefeitura Municipal de Rio das Ostras
PROTOCOLO GERAL

Processo: 18457 / 2003
Data: 03/09/2003
Hora: 10:53:56
Requerente: COSCARELLI E CIA LTDA ME
Sec. Destino: Secretaria Municipal de Fazenda
Dept. Destino: Depto. de Tributos e Fiscalização
Assunto: ALVARÁ

example 3

[image: 110-1]

result:

ESTADO DO RIO DE JANEIRO Prefeitura Municipal de Rio das Ostras PROTOCOLO GERAL
-rocesg 15314 : 2003 Data 25/072003
1 Hora: 16:18:28 PeUSTente COLOHA DE PESCADORES ZO Se atie: ec itin uranisrne ctras Dtpl.Estre.: Assunto. AGRAdECiMENTO,FAL 598 3

The correct text would be:

ESTADO DO RIO DE JANEIRO
Prefeitura Municipal de Rio das Ostras
PROTOCOLO GERAL

Processo: 15314 / 2003
Data: 25/07/2003
Hora: 16:18:28

Requerente: COLÔNIA DE PESCADORES Z-22
Sec. Destino: Sec. Mun. Urbanismo Obras e S. Pub.
Dept. Destino: 0
Assunto: AGRADECIMENTO / FAZ
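
To put a number on the gap between each OCR result and the expected text, a character error rate (CER) can be computed from the Levenshtein edit distance. The following is a minimal sketch only: `OcrMetrics`, `EditDistance`, and `CharacterErrorRate` are hypothetical names introduced for illustration and are not part of PaddleOCR or PaddleOCRSharp.

```csharp
using System;

// Hypothetical helper, not part of PaddleOCR or PaddleOCRSharp.
public static class OcrMetrics
{
    // Levenshtein distance: minimum number of single-character insertions,
    // deletions, and substitutions needed to turn `a` into `b`.
    public static int EditDistance(string a, string b)
    {
        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // Character error rate: edit distance divided by the length of the reference text.
    public static double CharacterErrorRate(string recognized, string reference)
    {
        if (reference.Length == 0) return recognized.Length == 0 ? 0.0 : 1.0;
        return (double)EditDistance(recognized, reference) / reference.Length;
    }
}

public static class Program
{
    public static void Main()
    {
        // Last field of example 2: OCR output vs. expected text.
        double cer = OcrMetrics.CharacterErrorRate("Assunto: ALVARA", "Assunto: ALVARÁ");
        Console.WriteLine($"CER = {cer:P1}");
    }
}
```

Comparing field by field (rather than the whole page at once) keeps differences in line breaks and reading order from dominating the score.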

🏃‍♂️ Environment (runtime environment)

Windows 11 .NET C#

🌰 Minimal Reproducible Example (minimal demo that reproduces the problem)


using PaddleOCRSharp;
using System;
using System.Data;
using System.Diagnostics;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Linq;
using System.Security.Cryptography;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Xml.Linq;

namespace PaddleOCRSharpDemo
{
    /// <summary>
    /// PaddleOCRSharp usage example
    /// </summary>
    public partial class MainForm : Form
    {

        private string[] bmpFilters = new string[] { ".bmp", ".jpg", ".jpeg", ".tiff", ".tif", ".png" };
        private string fileFilter = "*.*|*.bmp;*.jpg;*.jpeg;*.tiff;*.tif;*.png";
        private string title = "PaddleOCR C# Demo Green Version by Rao Yutian QQ Group: 318860399, Custom Development QQ: 277784829";
        private PaddleOCREngine engine;
        private PaddleStructureEngine structengine;
        Bitmap bmp;
        OCRResult lastocrResult;
        string outpath = Path.Combine(Environment.CurrentDirectory, "out");
        DateTime dt1 = DateTime.Now;
        DateTime dt2 = DateTime.Now;
        public MainForm()
        {
            InitializeComponent();
            this.Text = title;
            imageView1.AllowDrop = true;

            //EngineBase.PaddleOCRdllName = Path.Combine(Environment.CurrentDirectory, "x64", "PaddleOCR.dll");

        }
        private void MainForm_Load(object sender, EventArgs e)
        {
            if (!Directory.Exists(outpath))
            { Directory.CreateDirectory(outpath); }

            //Comes with lightweight Chinese and English model V3 model
            //OCRModelConfig config = null;

            //Server Chinese and English Model
            OCRModelConfig config = new OCRModelConfig();
            string root = System.IO.Path.GetDirectoryName(typeof(OCRModelConfig).Assembly.Location);
            //string modelPathroot = root + @"\inferenceserver";
            var modelPathrootS = @"C:\MyCsharpProjects\PaddleOCRSharp\PaddleOCRDemo\PaddleOCRSharpDemo";
            config.det_infer = modelPathrootS + @"\ch_PP-OCRv4_det_server_infer";//@"\ch_ppocr_server_v2.0_det_infer";
            config.cls_infer = modelPathrootS + @"\ch_ppocr_mobile_v2.0_cls_infer";//@"\ch_ppocr_mobile_v2.0_cls_infer";
            config.rec_infer = modelPathrootS + @"\ch_PP-OCRv4_rec_server_infer";//@"\ch_ppocr_server_v2.0_rec_infer";
            // config.keys = modelPathroot + @"\ppocr_keys.txt";
            config.keys = modelPathrootS + @"\latin_dict.txt";
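            // Note (assumption, not verified against this setup): the dictionary in config.keys
            // normally has to be the one the recognition model was trained with. latin_dict.txt
            // belongs to PaddleOCR's Latin-script recognition model rather than to the Chinese
            // ch_PP-OCRv4_rec_server_infer model used above. A hedged alternative, assuming the
            // Latin model has been downloaded and extracted into a folder with the name below:
            //config.rec_infer = modelPathrootS + @"\latin_PP-OCRv3_rec_infer";
            //config.keys = modelPathrootS + @"\latin_dict.txt";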

            //English and digital models V3
            //OCRModelConfig config = new OCRModelConfig();
            //string root = System.IO.Path.GetDirectoryName(typeof(OCRModelConfig).Assembly.Location);
            //string modelPathroot = root + @"\en_v3";
            //config.det_infer = modelPathroot + @"\en_PP-OCRv3_det_infer";
            //config.cls_infer = modelPathroot + @"\ch_ppocr_mobile_v2.0_cls_infer";
            //config.rec_infer = modelPathroot + @"\en_PP-OCRv3_rec_infer";
            //config.keys = modelPathroot + @"\en_dict.txt";

            //OCR parameters
            OCRParameter oCRParameter = new OCRParameter();
            oCRParameter.cpu_math_library_num_threads = 10;//Number of CPU threads used for inference
            oCRParameter.enable_mkldnn = true;//Enable MKL-DNN acceleration. For web deployment, or if memory usage is high, setting this to false is recommended; otherwise an error may occur.
            oCRParameter.cls = false; //Whether to perform text direction classification; default false
            oCRParameter.det = true;//Whether to run text detection (locating text boxes)
            oCRParameter.use_angle_cls = false;//Whether to use the direction classifier to detect and correct text rotated by 180 degrees
            oCRParameter.det_db_score_mode = true;//Whether to use a polygon (true) rather than a rectangle (false) for the text region when scoring detected boxes

            //Initialize the OCR engine
            engine = new PaddleOCREngine(config, oCRParameter);

            //Model configuration, using default values
            StructureModelConfig structureModelConfig = null;
            //Table recognition parameter configuration, using default values
            StructureParameter structureParameter = new StructureParameter();
            structengine = new PaddleStructureEngine(structureModelConfig, structureParameter);

        }
        private Bitmap GetClipboardImage()
        {
            bmp = (Bitmap)Clipboard.GetImage();
            if (bmp == null)
            {
                var files = Clipboard.GetFileDropList();

                string[] Filtersarr = new string[files.Count];
                files.CopyTo(Filtersarr, 0);
                Filtersarr = Filtersarr.Where(x => bmpFilters.Contains(Path.GetExtension(x).ToLower())).ToArray();
                if (Filtersarr.Length > 0)
                {
                    var imagebyte = File.ReadAllBytes(Filtersarr[0]);
                    bmp = new Bitmap(new MemoryStream(imagebyte));
                }
            }
            return bmp;
        }
        private void imageView1_DragDrop(object sender, DragEventArgs e)
        {
            var data = e.Data;
            if (data == null) return;
            string[] files = data.GetData(DataFormats.FileDrop, autoConvert: true) as string[];
            if (files == null) return; // the drop did not contain files

            string[] Filtersarr = new string[files.Length];
            files.CopyTo(Filtersarr, 0);
            Filtersarr = Filtersarr.Where(x => bmpFilters.Contains(Path.GetExtension(x).ToLower())).ToArray();
            if (Filtersarr.Length > 0)
            {
                var imagebyte = File.ReadAllBytes(Filtersarr[0]);
                bmp = new Bitmap(new MemoryStream(imagebyte));
                imageView1.Image = bmp;

                richTextBox1.Clear();
                richTextBox1.Show();
                dataGridView1.Hide();
                if (bmp == null) return;
                dt1 = DateTime.Now;
                OCRResult ocrResult = engine.DetectText(imagebyte);
                dt2 = DateTime.Now;
                ShowOCRResult(ocrResult);

            }
        }

        private void imageView1_DragEnter(object sender, DragEventArgs e)
        {
            e.Effect = DragDropEffects.Move;
        }

        private void imageView1_DragOver(object sender, DragEventArgs e)
        {
            e.Effect = DragDropEffects.Move;
        }
        /// <summary>
        /// Open a local image
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void toolStripopenFile_Click(object sender, EventArgs e)
        {
            OpenFileDialog ofd = new OpenFileDialog();
            ofd.Filter = fileFilter;
            if (ofd.ShowDialog() != DialogResult.OK) return;
            var imagebyte = File.ReadAllBytes(ofd.FileName);
            bmp = new Bitmap(new MemoryStream(imagebyte));
            imageView1.Image = bmp;

            richTextBox1.Clear();
            richTextBox1.Show();
            dataGridView1.Hide();
            if (bmp == null) return;

            dt1 = DateTime.Now;
            OCRResult ocrResult = engine.DetectText(imagebyte);
            dt2 = DateTime.Now;
            ShowOCRResult(ocrResult);

        }

        /// <summary>
        /// Recognize text in a screenshot
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void toolStripLabel2_Click(object sender, EventArgs e)
        {
            this.Hide();
            System.Threading.Thread.Sleep(200);
            Application.DoEvents();
            richTextBox1.Clear();
            richTextBox1.Show();
            dataGridView1.Hide();
            ScreenCapturer.ScreenCapturerTool screenCapturer = new ScreenCapturer.ScreenCapturerTool();
            if (screenCapturer.ShowDialog() == System.Windows.Forms.DialogResult.OK)
            {
                bmp = (Bitmap)screenCapturer.Image;
                imageView1.Image = bmp;
                try
                {
                    dt1 = DateTime.Now;
                    OCRResult ocrResult = engine.DetectText(bmp);
                    dt2 = DateTime.Now;
                    ShowOCRResult(ocrResult);
                }
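                catch (Exception ex)
                {
                    // Report the failure instead of silently swallowing it.
                    richTextBox1.AppendText("OCR failed: " + ex.Message + "\n");
                }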

            }
            this.Show();
        }
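        /// <summary>
        /// Recognize text from the clipboard
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void toolStripsnapocr_Click(object sender, EventArgs e)
        {
            bmp = GetClipboardImage();

            imageView1.Image = bmp;
            if (bmp == null) return; // nothing usable on the clipboard

            try
            {
                dt1 = DateTime.Now;
                OCRResult ocrResult = engine.DetectText(bmp);
                dt2 = DateTime.Now;
                ShowOCRResult(ocrResult);
            }
            catch (Exception ex)
            {
                // Report the failure instead of silently swallowing it.
                richTextBox1.AppendText("OCR failed: " + ex.Message + "\n");
            }
        }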
        /// <summary>
        /// Recognize a table in a local file
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void toolStripLabel4_Click(object sender, EventArgs e)
        {
            OpenFileDialog ofd = new OpenFileDialog();
            ofd.Filter = fileFilter;
            if (ofd.ShowDialog() != DialogResult.OK) return;
            var imagebyte = File.ReadAllBytes(ofd.FileName);
            bmp = new Bitmap(new MemoryStream(imagebyte));
            imageView1.Image = bmp;
            if (bmp == null) return;
            string  result = structengine.StructureDetect(bmp);
            ShowOCRResult(result, Path.GetFileNameWithoutExtension(ofd.FileName));
        }
        /// <summary>
        /// Recognize a table in a screenshot
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void toolStripLabel3_Click(object sender, EventArgs e)
        {
            this.Hide();

            System.Threading.Thread.Sleep(200);
            Application.DoEvents();

            ScreenCapturer.ScreenCapturerTool screenCapturer = new ScreenCapturer.ScreenCapturerTool();
            if (screenCapturer.ShowDialog() == System.Windows.Forms.DialogResult.OK)
            {
                bmp = (Bitmap)screenCapturer.Image;
                imageView1.Image = bmp;
                string result = structengine.StructureDetect(bmp);
                ShowOCRResult(result, Path.GetRandomFileName());
            }
            // Show the form again even if the capture dialog was cancelled.
            this.Show();
        }
        /// <summary>
        /// Recognize a table from the clipboard
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void toolStripsnaptable_Click(object sender, EventArgs e)
        {
            bmp = GetClipboardImage();

            imageView1.Image = bmp;
            if (bmp == null) return;
            string result = structengine.StructureDetect(bmp);
            ShowOCRResult(result, Path.GetRandomFileName());
        }
        /// <summary>
        /// Show the OCR result
        /// </summary>
        private void ShowOCRResult(OCRResult ocrResult)
        {
            lastocrResult = ocrResult;
            richTextBox1.Clear();
            Bitmap bitmap = (Bitmap)this.imageView1.Image;

            if (toolStripComboBox1.Text == "简单显示") // "Simple display"; the literal must match the combo box item text
            {
                foreach (var item in ocrResult.TextBlocks)
                {
                    richTextBox1.AppendText(item.Text + "\n");

                }
            }
            else if (toolStripComboBox1.Text == "详细显示") // "Detailed display"; the literal must match the combo box item text
            {
                foreach (var item in ocrResult.TextBlocks)
                {
                    richTextBox1.AppendText(item.ToString() + "\n");
                }
            }
            Bitmap bmp = new Bitmap(bitmap.Width, bitmap.Height);
            using (Graphics g = Graphics.FromImage(bmp))
            {
                g.DrawImage(bitmap, 0, 0);
                foreach (var item in ocrResult.TextBlocks)
                {
                    g.DrawPolygon(new Pen(Brushes.Red, 2), item.BoxPoints.Select(x => new PointF() { X = x.X, Y = x.Y }).ToArray());
                }
            }
            richTextBox1.AppendText("-----------------------------------\n");
            richTextBox1.AppendText($"Elapsed: {(dt2 - dt1).TotalMilliseconds}ms\n");
            imageView1.Image = bmp;
        }
        /// <summary>
        /// Show the table recognition result
        /// </summary>
        private void ShowOCRResult(string result,string name)
        {
            string css = "<style>table{ border-spacing: 0pt;} td { border: 1px solid black;}</style>";
            result = result.Replace("<html>", "<html>" + css);
            string savefile = $"{Environment.CurrentDirectory}\\out\\{name}.html";
            File.WriteAllText(savefile, result);
            //Open the generated HTML page to view the result
            Process.Start("explorer.exe", savefile);
        }
        private void toolStripComboBox1_SelectedIndexChanged(object sender, EventArgs e)
        {
            if (lastocrResult != null) ShowOCRResult(lastocrResult);
        } 
    }
}

insinfo commented 3 months ago

Perhaps training on this dataset could help improve accuracy: https://zenodo.org/records/7872951

UserWangZz commented 3 months ago

I think PP-OCRv4 still performs well on clear text. The text in the lower part of the example images is a bit blurry, which may be the reason for the poor recognition.
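
One way to probe the blur hypothesis is to enlarge the scan before recognition and compare the results. The sketch below is an assumption-laden illustration, not a recommended fix: `ScanPreprocessing.Upscale` is a hypothetical helper (not part of PaddleOCR or PaddleOCRSharp) that resamples the image with bicubic interpolation using `System.Drawing`. Upscaling cannot restore detail that the scan has already lost; it only gives the detector and recognizer larger glyphs to work with.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;

// Hypothetical helper, not part of PaddleOCR or PaddleOCRSharp.
public static class ScanPreprocessing
{
    // Returns a copy of `source` enlarged by an integer `factor`,
    // resampled with bicubic interpolation. The caller owns the result.
    public static Bitmap Upscale(Bitmap source, int factor)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (factor < 1) throw new ArgumentOutOfRangeException(nameof(factor));

        var result = new Bitmap(source.Width * factor, source.Height * factor);
        using (Graphics g = Graphics.FromImage(result))
        {
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.PixelOffsetMode = PixelOffsetMode.HighQuality;
            g.DrawImage(source, 0, 0, result.Width, result.Height);
        }
        return result;
    }
}
```

In the demo above, the enlarged image could then be passed to the existing call, for example `engine.DetectText(ScanPreprocessing.Upscale(bmp, 2))`, and the output compared with the result for the original scan.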

github-actions[bot] commented 2 days ago

This issue is stale because it has been open for 90 days with no activity.