MohrJonas / obsidian-ocr

Obsidian OCR allows you to search for text in your images and pdfs
GNU General Public License v3.0
281 stars 5 forks

File types changed and will not open and search not finding anything #36

Open pickleton89 opened 1 year ago

pickleton89 commented 1 year ago

A number of the PNG files in my attachment folders have .hocr added to their file names, and they will no longer open. Additionally, when I invoke the OCR search window, it doesn't find anything. When I open Obsidian, I can see that the indexing counter finishes.

MohrJonas commented 1 year ago

Thank you for your issue. Can you confirm that you're using the newest version of obsidian-ocr, which is currently 2.0.0?

pickleton89 commented 1 year ago

Yes. Everything is up to date. Also, my Obsidian is Version 1.1.12 (Installer 1.1.9).

MohrJonas commented 1 year ago

Alright, thanks for checking. When you say the files have hocr added to the name, do you mean you have a file x.png and also a file x.png.hocr?

pickleton89 commented 1 year ago

I sorted the files alphabetically and see that there are the base .png files and then, for each, another file with the same name ending in .png.hocr (x.png.hocr). I hadn't noticed that before.

Jeff

MohrJonas commented 1 year ago

Alright, it's good to hear that the original file is still there. I was a bit scared that I had screwed something up and the files had been deleted 😌 The .hocr files are remnants of an older version of obsidian-ocr that stored the OCR information for x.png in x.png.hocr. You can either leave them there and ignore them, or simply delete them. Since version 2.0.0, the information is stored in a SQLite database called .obsidian-ocr.sqlite in the root of your vault.
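
For anyone who wants to locate those leftovers before deleting them, here is a minimal TypeScript sketch. It is not part of obsidian-ocr: the function name findLeftoverHocrFiles and the example paths are made up for illustration, and it assumes you already have a flat list of vault-relative paths. It only reports a .hocr file when the matching base file (the same path without .hocr) is also present.

```typescript
// Hypothetical helper (not from obsidian-ocr): given a flat list of
// vault-relative paths, return the "*.hocr" files whose base file
// (the same path minus ".hocr") is also in the list.
function findLeftoverHocrFiles(paths: string[]): string[] {
    const suffix = ".hocr";
    const known = new Set(paths);
    return paths.filter(
        (p) => p.endsWith(suffix) && known.has(p.slice(0, -suffix.length))
    );
}

// Example paths are invented for illustration.
const example = [
    "attachments/x.png",
    "attachments/x.png.hocr",
    "attachments/y.pdf",
    "notes/readme.md",
];
console.log(findLeftoverHocrFiles(example)); // [ 'attachments/x.png.hocr' ]
```

Requiring the base file to exist is a deliberately cautious choice, so that an unrelated file that happens to end in .hocr is not flagged.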

Concerning the problem you described above: Could you please open the developer console and see if any errors are reported?

pickleton89 commented 1 year ago

I opened the OCR search command and typed in a query and got this error in console after pressing return on the search.

Screenshot of Obsidian (2-5-23, 1-50-49 PM)



MohrJonas commented 1 year ago

Unfortunately, I can't seem to see the attached image

pickleton89 commented 1 year ago

Screenshot of Obsidian (2-5-23, 1-50-49 PM)

pickleton89 commented 1 year ago

Sorry about that. I was replying by email and the image didn't come through. I posted it above.

MohrJonas commented 1 year ago

Alright, thanks for the image. Could you please enable Log to file in the settings of obsidian-ocr, restart Obsidian, and perform the same steps you did to produce the error above? After that, could you please attach the log file?

pickleton89 commented 1 year ago

Sorry, but I am not sure where the log file gets created or how to find it.

pickleton89 commented 1 year ago

I found the file. obsidian-ocr.log

MohrJonas commented 1 year ago

Thank you for the log. I think I have somewhat of an idea what's going on here. Could you please tell me which OS you're using?

pickleton89 commented 1 year ago

I am currently running macOS Ventura 13.2

MohrJonas commented 1 year ago

Okay, and could you tell me the output of tesseract -v in your terminal?

pickleton89 commented 1 year ago

tesseract 5.2.0
 leptonica-1.82.0
  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.3) : libpng 1.6.39 : libtiff 4.4.0 : zlib 1.2.11 : libwebp 1.2.4 : libopenjp2 2.5.0
 Found NEON
 Found libarchive 3.6.2 zlib/1.2.11 liblzma/5.2.9 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.2
 Found libcurl/7.86.0 SecureTransport (LibreSSL/3.3.6) zlib/1.2.11 nghttp2/1.47.0

MohrJonas commented 1 year ago

Alright, after looking at the logs I know what is going wrong, but I don't know why it is going wrong. This is the code responsible for running tesseract:

...
const execReturn = exec(`tesseract ${this.settings.additionalArguments} "${source}" stdout -l ${this.settings.lang} hocr`);
...

This would (as an example) translate into a command like this: tesseract "/some/file.png" stdout -l eng hocr
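
To make that expansion concrete, here is a small self-contained TypeScript sketch of the same template string. The OcrSettings interface and the buildTesseractCommand function are assumptions introduced for illustration; only the template itself comes from the line quoted above. The source parameter is typed as string | undefined on purpose, to show what a JavaScript template literal does with a missing value.

```typescript
// Hypothetical settings shape: only the two fields used by the quoted line.
interface OcrSettings {
    additionalArguments: string;
    lang: string;
}

// Mirrors the template string from the quoted snippet. The parameter type is
// widened to "string | undefined" purely to illustrate the second call below.
function buildTesseractCommand(settings: OcrSettings, source: string | undefined): string {
    return `tesseract ${settings.additionalArguments} "${source}" stdout -l ${settings.lang} hocr`;
}

const settings: OcrSettings = { additionalArguments: "", lang: "eng" };

// With no additional arguments the template leaves two spaces after
// "tesseract"; a shell treats a run of spaces as a single separator.
console.log(buildTesseractCommand(settings, "/some/file.png"));

// A template literal turns an undefined value into the text "undefined".
console.log(buildTesseractCommand(settings, undefined));
```

Whether a missing source value is what actually happened here is not established in this thread; the second call only shows how the literal text undefined could end up inside a command built this way.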

The problem (as can be seen in the log) is that tesseract for some reason misinterprets the command and gives the following error message:

read_params_file: Can't open stdout
read_params_file: Can't open -l
read_params_file: Can't open eng
Error, cannot read input file undefined: No such file or directory
Error during processing.

As the error message shows, tesseract tries to open stdout, -l, and eng as files, even though they are just command-line arguments, which is quite strange. I can only assume that there is some problem with the way the file path is handed to tesseract, because when I run the bogus command tesseract some/file path/image.png stdout -l eng (an unquoted path containing a space), I get a similar error:

read_params_file: Can't open stdout
read_params_file: Can't open -l
read_params_file: Can't open eng

On the other hand, the file path is wrapped in "", which causes tesseract to behave properly again. Therefore, some more investigation is necessary, so sit tight 😊
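
The effect of the quotes can be illustrated with a deliberately simplified word splitter in TypeScript. This is not how a real shell or obsidian-ocr parses commands: splitWords is a hypothetical helper that only understands whitespace and double quotes, and it ignores escapes, single quotes, and variable expansion. It is just enough to show that an unquoted path with a space turns into two separate arguments, which shifts every argument after it by one position.

```typescript
// Simplified illustration only: whitespace separates words, and double quotes
// group characters (including spaces) into a single word. Real shells do much
// more (escapes, single quotes, variables, globbing).
function splitWords(command: string): string[] {
    const words: string[] = [];
    let current = "";
    let inQuotes = false;
    let hasWord = false;
    for (let i = 0; i < command.length; i++) {
        const ch = command.charAt(i);
        if (ch === '"') {
            inQuotes = !inQuotes;
            hasWord = true; // an empty pair of quotes still forms a word
        } else if (!inQuotes && /\s/.test(ch)) {
            if (hasWord) {
                words.push(current);
                current = "";
                hasWord = false;
            }
        } else {
            current += ch;
            hasWord = true;
        }
    }
    if (hasWord) {
        words.push(current);
    }
    return words;
}

// Unquoted: the path is split at the space, giving six words.
const unquoted = splitWords("tesseract some/file path/image.png stdout -l eng");
// Quoted: the path stays together, giving five words.
const quoted = splitWords('tesseract "some/file path/image.png" stdout -l eng');
console.log(unquoted.length, unquoted);
console.log(quoted.length, quoted);
```

One possible way to sidestep quoting altogether, offered as a suggestion rather than as the fix adopted in this thread, is Node's child_process.execFile, which takes the arguments as an array instead of a single command string.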

pickleton89 commented 1 year ago

I ran the path checks as listed below:

(base) [~]$ brew list tesseract
/opt/homebrew/Cellar/tesseract/5.2.0/bin/tesseract
/opt/homebrew/Cellar/tesseract/5.2.0/include/tesseract/ (12 files)
/opt/homebrew/Cellar/tesseract/5.2.0/lib/libtesseract.5.dylib
/opt/homebrew/Cellar/tesseract/5.2.0/lib/pkgconfig/tesseract.pc
/opt/homebrew/Cellar/tesseract/5.2.0/lib/ (2 other files)
/opt/homebrew/Cellar/tesseract/5.2.0/share/tessdata/ (35 files)

(base) [~]$ brew list tesseract-lang
/opt/homebrew/Cellar/tesseract-lang/4.1.0/share/tessdata/ (162 files)

(base) [~]$ brew list imagemagick
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/Magick++-config
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/MagickCore-config
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/MagickWand-config
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/animate
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/compare
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/composite
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/conjure
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/convert
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/display
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/identify
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/import
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/magick
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/magick-script
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/mogrify
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/montage
/opt/homebrew/Cellar/imagemagick/7.1.0-54/bin/stream
/opt/homebrew/Cellar/imagemagick/7.1.0-54/etc/ImageMagick-7/ (13 files)
/opt/homebrew/Cellar/imagemagick/7.1.0-54/include/ImageMagick-7/ (137 files)
/opt/homebrew/Cellar/imagemagick/7.1.0-54/lib/libMagick++-7.Q16HDRI.5.dylib
/opt/homebrew/Cellar/imagemagick/7.1.0-54/lib/libMagickCore-7.Q16HDRI.10.dylib
/opt/homebrew/Cellar/imagemagick/7.1.0-54/lib/libMagickWand-7.Q16HDRI.10.dylib
/opt/homebrew/Cellar/imagemagick/7.1.0-54/lib/ImageMagick/ (261 files)
/opt/homebrew/Cellar/imagemagick/7.1.0-54/lib/pkgconfig/ (8 files)
/opt/homebrew/Cellar/imagemagick/7.1.0-54/lib/ (9 other files)
/opt/homebrew/Cellar/imagemagick/7.1.0-54/share/ImageMagick-7/ (3 files)
/opt/homebrew/Cellar/imagemagick/7.1.0-54/share/doc/ (332 files)
/opt/homebrew/Cellar/imagemagick/7.1.0-54/share/man/ (17 files)