MohrJonas / obsidian-ocr

Obsidian OCR allows you to search for text in your images and pdfs
GNU General Public License v3.0

Documentation for new users #21

Closed sidazhou closed 2 years ago

sidazhou commented 2 years ago

I'm a new user and spent some time figuring out the following. It would be nice to have these covered in the README:

1) Selecting language in tesseract

2) What happens when changing tesseract lang in settings?

3) Deleted images not removed from search

sidazhou commented 2 years ago
  1. notes on Chinese language
    • Using script/Hans, tesseract outputs extra spaces (see https://github.com/tesseract-ocr/tesseract/issues/991)
    • This made me think obsidian-ocr is broken, but the reality is that tesseract is misconfigured.
    • While we are here, can you add an "additional tesseract args" setting for fine-tuning tesseract?
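
A possible workaround for the extra spaces, sketched in Python: post-process the OCR text and drop whitespace that sits between two CJK characters. This is not what obsidian-ocr does; the function name `strip_cjk_spaces` and the character ranges are assumptions chosen for illustration, and the ranges only approximate "CJK".

```python
import re

# Assumption: "CJK" is approximated by CJK punctuation (excluding the
# ideographic space U+3000), the CJK Unified Ideographs block, and
# fullwidth forms. Other blocks (e.g. extensions) are not covered.
_CJK = r"\u3001-\u303f\u4e00-\u9fff\uff00-\uffef"

# Match runs of spaces or tabs that have a CJK character on both sides.
_SPACE_BETWEEN_CJK = re.compile(rf"(?<=[{_CJK}])[ \t]+(?=[{_CJK}])")


def strip_cjk_spaces(text: str) -> str:
    """Remove spaces/tabs between two CJK characters; keep all other spacing."""
    return _SPACE_BETWEEN_CJK.sub("", text)


if __name__ == "__main__":
    print(strip_cjk_spaces("中 文 识 别"))
    print(strip_cjk_spaces("中文 OCR 测试"))
```

Spaces next to Latin text are left alone, so mixed lines such as "中文 OCR 测试" keep their word boundaries.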

MohrJonas commented 2 years ago

Yes, yes and yes 😊 All these things are very useful and will be added 👍

MohrJonas commented 2 years ago

Done as of commit c02f4a3. Let me know if you think something else is missing from the README.

sidazhou commented 2 years ago

👍👍👍

Personally, I would add a "lang vs script" explanation, because it was really hard for me to find.

Difference between lang vs script: https://github.com/tesseract-ocr/tessdata_fast
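
To make the difference concrete, here is a rough sketch of how the two kinds of model are selected on the tesseract command line. As I understand it, a language model (e.g. `chi_sim`) is trained for one language, while a script model (e.g. `script/HanS`) covers text written in that script; both are passed through the same `-l` option. The helper `build_tesseract_command` is hypothetical and only builds the argument list; the model names assume the matching traineddata files from tessdata_fast are installed.

```python
from typing import List, Sequence


def build_tesseract_command(
    image_path: str,
    output_base: str,
    model: str = "eng",
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Build a tesseract CLI invocation (hypothetical helper, illustration only).

    model      -- value for -l: a language ("chi_sim") or a script ("script/HanS")
    extra_args -- additional arguments appended as-is, e.g. ["--psm", "6"]
    """
    return ["tesseract", image_path, output_base, "-l", model, *extra_args]


if __name__ == "__main__":
    # Language model: Simplified Chinese.
    print(build_tesseract_command("scan.png", "out", "chi_sim"))
    # Script model: Han (simplified) script, plus an extra page-segmentation arg.
    print(build_tesseract_command("scan.png", "out", "script/HanS", ["--psm", "6"]))
```

The resulting list can be handed to `subprocess.run` unchanged; the `extra_args` parameter is the kind of pass-through that an "additional tesseract args" setting would need.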