SLAM-Handbook-contributors / slam-handbook-public-release

Release repo for our SLAM Handbook

Hidden letters make OCR difficult #10

Open changh95 opened 1 week ago

changh95 commented 1 week ago

Before I begin, thank you for a great book! :+1: Really looking forward to seeing the later chapters.


Describe the bug

There are many hidden letters in the PDF file.

To Reproduce

Just one example:

Steps to reproduce the behavior:

  1. Go to page 6
  2. Click and drag across the images
  3. Hidden letters appear

Additional context

Since the letters are not visible, this won't matter to normal readers. However, text search won't work well in PDF reader software, and indexing will suffer too.

Also, this bug makes the OCR process difficult.

I'm using OCR to extract the text, which I then feed to an LLM to produce a first-draft translation into other languages.
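As a rough illustration of why hidden letters hurt search and text extraction, here is a minimal Python sketch. The `Span` type, its `visible` flag, the `extract_text` helper, and the sample page content are all made up for this example; they do not come from the handbook PDF or from any real PDF library, and a real extractor would have to infer visibility itself.

```python
from dataclasses import dataclass


@dataclass
class Span:
    text: str
    visible: bool  # hypothetical flag, assumed for illustration only


def extract_text(spans, skip_hidden=False):
    """Join span text in stream order, optionally dropping hidden spans."""
    return "".join(s.text for s in spans if s.visible or not skip_hidden)


# Made-up page content: a visible caption with hidden letters in the middle.
page = [
    Span("Figure 1: pose ", True),
    Span("xq", False),
    Span("graph", True),
]

raw = extract_text(page)                      # hidden letters included
clean = extract_text(page, skip_hidden=True)  # hidden letters dropped

print("pose graph" in raw)    # False: the hidden letters break the match
print("pose graph" in clean)  # True
```

The same effect would apply to any downstream step that consumes the raw text stream, such as indexing or translation.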

By the way, I am happy to share the translation with the authors. I would really love to see this book published in many languages, and I'm happy to contribute!

ayoungk commented 6 days ago

We plan to go over all the figures later. Thanks for letting us know.

changh95 commented 5 days ago

I have stopped the unofficial translation effort, as discussed here.

Leaving this comment for anyone working on an unofficial translation at the moment!