Open TaoLoading opened 3 months ago
Description
Vertically arranged Japanese documents are not recognized correctly. Text alignment: the Japanese text runs vertically, top to bottom, with columns ordered right to left. Issue observed: the recognition results contain missing characters, incorrect characters, and some characters that are not recognized at all.
Attachments
https://drive.google.com/file/d/1Z1LcEuuuqGOyTgyjckdvzN_DSd0JFsmo/view?usp=sharing
Description
There are also problems with the following academic papers. Authors section: author names are often mixed up or incorrect. Abstract section: the abstract text is not recognized accurately, with some parts missing or incorrect.
Attachments
https://drive.google.com/file/d/1UdWnnq7lWf1nfOzxnzNaYTOTBI5c94pH/view
Hi @TaoLoading. Thank you for your feedback. Please send me your email at yaroslav@mathpix.com. We want to create a dedicated Slack channel with you for more efficient communication.
The text-to-page ratio of the PDF with vertical Japanese text is roughly 20-30% text and 70-80% white space. For better OCR accuracy, it's important that text covers most of the page, ideally around 80%, similar to a standard PDF page. That said, our team will run additional tests on the recognition of vertical Japanese text.
I have requested access to the second PDF file.
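For anyone who wants to check the text-to-page ratio of their own pages before submitting them, here is a rough sketch. It is only an illustration, not how Mathpix measures coverage: it assumes the page has already been rasterized to a grayscale grid (0 = black, 255 = white), and the function name `text_coverage` and the `threshold` parameter are made up for this example. It approximates "text coverage" as the area of the bounding box around all dark pixels divided by the page area.

```python
def text_coverage(page, threshold=128):
    """Estimate how much of a page is occupied by text.

    `page` is a list of rows of grayscale values (0 = black, 255 = white).
    Returns the area of the bounding box around all pixels darker than
    `threshold`, as a fraction of the total page area (0.0 for a blank page).
    This is an assumed approximation, not Mathpix's own metric.
    """
    height = len(page)
    width = len(page[0]) if height else 0
    if height == 0 or width == 0:
        return 0.0

    # Rows and columns that contain at least one dark pixel.
    rows = [y for y, row in enumerate(page) if any(v < threshold for v in row)]
    cols = [x for x in range(width)
            if any(page[y][x] < threshold for y in range(height))]
    if not rows:
        return 0.0

    box_height = rows[-1] - rows[0] + 1
    box_width = cols[-1] - cols[0] + 1
    return (box_height * box_width) / (height * width)


if __name__ == "__main__":
    # A 10x10 white page with two dark pixels.
    page = [[255] * 10 for _ in range(10)]
    page[2][2] = 0
    page[6][4] = 0
    print(f"coverage: {text_coverage(page):.0%}")
```

If the value comes out well below the roughly 80% mentioned above, cropping the page margins before upload may be worth trying.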
Description
This is a scanned Urdu-language PDF file, and the text does not seem to have been recognized effectively.
Attachments
https://drive.google.com/file/d/1U4dt3zDexSdL0FQlZaiNLjegx6XjLj83/view?usp=sharing
Description
Content is missing from the table in this PDF file after recognition.
Attachments
https://drive.google.com/file/d/1SYbNIc4IeoYD-b7PyCJGHCmurDmj_b-W/view?usp=sharing
Description
This is a screenshot of a PDF. There is a recognition issue with the vertically arranged text.
Attachments
https://drive.google.com/file/d/1w8_-SZx6GI7nSoaDIqKcbv-R3pFwhofp/view?usp=sharing
Description
The content in the box in this PDF is recognized incorrectly.
Attachments
https://drive.google.com/file/d/1iS1J7J_k8fl8mVRgFIcbe_yZuqppil7F/view?usp=sharing
Description
There are some problems in recognizing this PDF:
Attachments
https://drive.google.com/file/d/1rudypXm1geAwRcW59X-3v1syrLOCMbL4/view?usp=sharing
Dear Mathpix Support Team,
I hope this message is helpful to you. I am a member of the Immersive Translate team, and we have been using Mathpix for our translation projects with great enthusiasm. We deeply appreciate the innovative solutions your product offers, which have significantly improved our workflow. We have also encountered some problems when using Mathpix. I will describe each of them separately in this issue, and I hope you can help. Thanks!