Documentation for new users

sidazhou commented 2 years ago

I'm a new user and spent some time figuring out the following, which would be nice to have in the README:

1) Selecting language in tesseract

osd (selected by default by the obsidian-ocr plugin): does it detect English by default, or do I need to select eng?
lang vs. script? For example, chi_sim vs. script/Hans (answer: https://github.com/tesseract-ocr/tessdata_fast)
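
For anyone else landing here, a minimal sketch of how the language choice ends up on the tesseract command line. The helper name `build_tesseract_command` and the file name `scan.png` are made up for illustration, and this is not necessarily how obsidian-ocr calls tesseract. As far as I can tell, `osd` is the orientation and script detection data rather than a recognition language, and the script file in tessdata_fast is spelled `script/HanS`.

```python
from typing import List


def build_tesseract_command(image_path: str, lang: str) -> List[str]:
    """Build an argv list for the tesseract CLI (hypothetical helper).

    "stdout" as the output base makes tesseract print the recognized
    text instead of writing a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


# Language model: trained for one language (Simplified Chinese here).
lang_cmd = build_tesseract_command("scan.png", "chi_sim")

# Script model: trained for a writing system; expects
# tessdata/script/HanS.traineddata to be installed.
script_cmd = build_tesseract_command("scan.png", "script/HanS")

print(" ".join(lang_cmd))
print(" ".join(script_cmd))
```

The list can be passed to `subprocess.run` as-is if tesseract and the matching traineddata files are installed.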

2) What happens when changing tesseract lang in settings?

If I change the language, is everything reindexed in the newly chosen language, or only new notes?
If I want to manually reindex everything, it seems I should delete all OCR-related JSON files and restart Obsidian?

3) Deleted images are not removed from search

sidazhou commented 2 years ago

Notes on the Chinese language
- Using script/Hans, tesseract outputs extra spaces (see https://github.com/tesseract-ocr/tesseract/issues/991)
- This made me think obsidian-ocr was broken, but in reality tesseract was misconfigured.
- While we are here, could you add an "additional tesseract args" setting for fine-tuning tesseract?
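
Until extra args are configurable, the extra spaces can also be cleaned up after the fact. Below is a rough sketch of that workaround, assuming the OCR text is available as a plain string. `strip_cjk_spaces` is a hypothetical helper, not part of obsidian-ocr or tesseract, and the character ranges cover only the basic CJK ideographs and CJK punctuation blocks.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block and the
# CJK Symbols and Punctuation block are treated as "CJK" here.
_CJK = r"[\u4e00-\u9fff\u3000-\u303f]"

# Spaces or tabs that sit directly between two CJK characters.
_SPACE_BETWEEN_CJK = re.compile(rf"(?<={_CJK})[ \t]+(?={_CJK})")


def strip_cjk_spaces(text: str) -> str:
    """Remove spaces between CJK characters, leaving other spacing alone."""
    return _SPACE_BETWEEN_CJK.sub("", text)


print(strip_cjk_spaces("你 好 世 界"))
print(strip_cjk_spaces("Hello world 你 好"))
```

Spaces between Latin words, and between Latin and CJK text, are kept on purpose so mixed-language notes stay readable.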

MohrJonas commented 2 years ago

Yes, yes and yes 😊 All these things are very useful and will be added 👍

MohrJonas commented 2 years ago

Done as of commit c02f4a3. Let me know if you think something else is missing from the README.

sidazhou commented 2 years ago

👍👍👍

Personally, I would add a "lang vs. script" explanation, because it was really hard for me to find.

MohrJonas / obsidian-ocr