Recognize tables in images and restore them as Word documents, with CRNN single-character coordinate extraction (handles text recognized across table cell boundaries)
The final .docx file is saved in the word directory.
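As a rough illustration of what single-character coordinates can look like, here is a minimal sketch, not this project's actual code. It assumes the CRNN output is decoded as a greedy CTC best path (one class index per time step), that time steps are spread evenly across the text line's width, and that table cell boundaries are known as a sorted list of x positions. The names `ctc_char_spans` and `assign_to_cells` are hypothetical.

```python
import bisect


def ctc_char_spans(best_path, alphabet, line_x0, line_width, blank=0):
    """Collapse a per-time-step CTC best path into (char, x_start, x_end) spans.

    Assumption: each time step covers an equal slice of the line width.
    """
    step = line_width / len(best_path)
    runs = []  # each entry: [char, first_step, last_step]
    prev = blank
    for t, idx in enumerate(best_path):
        if idx != blank:
            if idx != prev:
                runs.append([alphabet[idx], t, t])  # a new character starts
            else:
                runs[-1][2] = t  # the same character continues
        prev = idx
    return [(ch, line_x0 + s * step, line_x0 + (e + 1) * step) for ch, s, e in runs]


def assign_to_cells(char_spans, cell_edges):
    """Group characters by the table cell that contains their horizontal center.

    cell_edges is a sorted list of x positions; cell i spans edges[i]..edges[i+1].
    """
    cells = {}
    for ch, x0, x1 in char_spans:
        center = (x0 + x1) / 2
        cell = bisect.bisect_right(cell_edges, center) - 1
        cells[cell] = cells.get(cell, "") + ch
    return cells


# One text line detected at x=100 with width 80, crossing a cell border at x=135.
spans = ctc_char_spans([0, 1, 1, 0, 2, 2, 2, 0], "-AB", line_x0=100, line_width=80)
print(spans)
print(assign_to_cells(spans, [100, 135, 180]))
```

With per-character positions, a text line that the detector returns as one box can still be split at the cell border, which is the cross-cell problem mentioned above.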
Possibly among the best open-source weights for printed-document text detection and recognition.
Google drive link
PSE weight
CRNN weight
Unet weight
pkl for edit distance
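The README does not say what the edit-distance pkl contains or how it is used. One common use is to snap recognized words to the closest entry in a known vocabulary. The sketch below assumes that use and replaces the pkl with an in-memory list; `edit_distance` and `closest` are hypothetical names, not functions from this repository.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def closest(word, vocab, max_dist=2):
    """Return the nearest vocabulary entry, or the word itself if none is close."""
    best = min(vocab, key=lambda v: edit_distance(word, v))
    return best if edit_distance(word, best) <= max_dist else word


vocab = ["table", "cell", "line"]  # stand-in for whatever the pkl holds
print(closest("tabel", vocab))
print(closest("xyzzy", vocab))
```

The `max_dist` cutoff keeps words that are far from every vocabulary entry unchanged, so correct but out-of-vocabulary text is not overwritten.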
1. python server.py
Load the unet model to extract table lines from the input image
2. python test.py
Feed the input image
(The open-source table line detection model does not generalize well, so it is on hold for now. I am keeping the earlier code and model for reference only and may update them later.)
~~Step 1 & 2 are not necessary if you have quite neat PDF images, meanwhile this project can't deal with some complex samples like tortuous and colorful receipts, I am still working on it.~~
I am currently working on complex table recognition and struggling with the dataset. ~~Optimistically, there could be a radical change in weeks. If you are researching page layout and table recognition, please contact me.~~ lizongxi1995@gmail.com