Mathpix / api-examples

Mathpix API examples - converts image to latex

Immersive Translate Feedback Zone #11

Open · TaoLoading opened this issue 3 months ago

TaoLoading commented 3 months ago

Dear Mathpix Support Team,

I hope this message is helpful to you. I am a member of the Immersive Translate team, and we have been using Mathpix enthusiastically in our translation projects. We appreciate the solutions your product offers, which have significantly improved our workflow. We have also run into some problems while using Mathpix. I will describe each one separately in this issue and hope you can help. Thanks!

TaoLoading commented 3 months ago

Description

There is a problem recognizing vertically written Japanese documents. Details: Text alignment: the Japanese text is written vertically, top to bottom, with columns ordered right to left. Issue observed: in the recognition results, some characters are missing, some are incorrect, and some are not recognized at all.

Attachments

https://drive.google.com/file/d/1Z1LcEuuuqGOyTgyjckdvzN_DSd0JFsmo/view?usp=sharing

TaoLoading commented 3 months ago

Description

There are also problems with the following academic paper. Authors section: author names are often mixed up or recognized incorrectly. Abstract section: the abstract text is not recognized accurately, with some parts missing or incorrect.

Attachments

https://drive.google.com/file/d/1UdWnnq7lWf1nfOzxnzNaYTOTBI5c94pH/view

ykolodnitskiy commented 3 months ago

Hi @TaoLoading. Thank you for your feedback. Please email me at yaroslav@mathpix.com. We would like to set up a dedicated Slack channel with you for more efficient communication.

In the PDF with vertical Japanese text, text covers roughly 20-30% of the page and white space the remaining 70-80%. For better OCR accuracy, text should cover most of the page, ideally around 80%, as on a standard PDF page. That said, our team will run additional tests on the recognition of vertical Japanese text.
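To check a page against the roughly 80% guideline before uploading, one rough estimate is the area of the bounding box around all dark pixels divided by the page area. The sketch below is a minimal illustration, not Mathpix's own measurement: it assumes the page has already been rasterized to a 2D list of grayscale values (0 = black, 255 = white), and the function names and the ink threshold of 128 are hypothetical choices.

```python
def ink_bounding_box(gray, threshold=128):
    """Return (top, left, bottom, right), inclusive, of all pixels darker
    than `threshold`, or None if the page has no such pixels.

    `gray` is a hypothetical rasterized page: a 2D list of grayscale values.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    # Rows and columns that contain at least one "ink" pixel.
    rows = [y for y in range(height) if any(p < threshold for p in gray[y])]
    cols = [x for x in range(width)
            if any(gray[y][x] < threshold for y in range(height))]
    if not rows:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


def text_coverage(gray, threshold=128):
    """Fraction of the page area covered by the ink bounding box (0.0 to 1.0)."""
    box = ink_bounding_box(gray, threshold)
    if box is None:
        return 0.0
    top, left, bottom, right = box
    box_area = (bottom - top + 1) * (right - left + 1)
    return box_area / (len(gray) * len(gray[0]))


# A 10x10 white page with a 4x5 block of "text" in the middle.
page = [[255] * 10 for _ in range(10)]
for y in range(2, 6):
    for x in range(3, 8):
        page[y][x] = 0

print(text_coverage(page))  # 0.2
```

A result of 0.2 corresponds to the low end of the 20-30% coverage described above; cropping the image to the returned bounding box (plus a small margin) before OCR would raise the ratio. The bounding-box measure overestimates coverage on pages with scattered marks near the edges, so it is only a first approximation.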

I have requested access to the second PDF file.

TaoLoading commented 2 months ago

Description

This is a scanned Urdu-language PDF file, and the text does not appear to have been recognized correctly.

Attachments

https://drive.google.com/file/d/1U4dt3zDexSdL0FQlZaiNLjegx6XjLj83/view?usp=sharing

TaoLoading commented 2 months ago

Description

After recognition, content is missing from the table in this PDF file.

Attachments

https://drive.google.com/file/d/1SYbNIc4IeoYD-b7PyCJGHCmurDmj_b-W/view?usp=sharing

TaoLoading commented 2 months ago

Description

This is a screenshot of a PDF; the vertically arranged text is not recognized correctly.

Attachments

https://drive.google.com/file/d/1w8_-SZx6GI7nSoaDIqKcbv-R3pFwhofp/view?usp=sharing

TaoLoading commented 1 month ago

Description

The content inside the box in this PDF is recognized incorrectly.

Attachments

https://drive.google.com/file/d/1iS1J7J_k8fl8mVRgFIcbe_yZuqppil7F/view?usp=sharing

TaoLoading commented 1 month ago

Description

There are some problems recognizing this PDF:

  1. The characters "/ ****" in the original text are recognized partly as images and partly as text. In addition, two sentences are recognized as images.
  2. Some content is returned as raw Markdown source.

Attachments

https://drive.google.com/file/d/1rudypXm1geAwRcW59X-3v1syrLOCMbL4/view?usp=sharing