TenderOwl / Frog

Extract text from any image, video, QR code, etc.
https://getfrog.app/
MIT License

Flatpak Frog downloads from tessdata instead of tessdata_best #82

Closed Albnu14 closed 1 year ago

Albnu14 commented 1 year ago

Hi, I'm trying to OCR an Arabic image, but Frog couldn't read anything from it. When I replaced the ara.traineddata file in the Frog Flatpak directory with the one I downloaded from tessdata_best, Frog OCR'd the image with great success.

Even tessdata_fast gave very good results.

Please make tessdata_best the default.

amka commented 1 year ago

As you can see in the code, we try to get the _best model by default, and if an error happens we fall back to the base model.

    def download_begin(self, code):
        tessfile = f'{code}.traineddata'
        tessfile_path = os.path.join(tessdata_dir, tessfile)
        print(f'Data will be extracted to: {tessfile_path}')
        try:
            request.urlretrieve(tessdata_best_url + tessfile, tessfile_path)
            return code
        except Exception as e:
            print(e)
            try:
                print(f"{code} not found in tessdata_best, checking tessdata")
                request.urlretrieve(tessdata_url + tessfile, tessfile_path)
                return code
            except Exception as e2:
                print(e2)
                print(f"{code} was not found at tessdata")
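
The same best-first fallback can be sketched as a standalone function that is easy to try outside the app. This is only an illustration, not Frog's actual code: the two base URLs are assumed values (the snippet above does not show how `tessdata_best_url` and `tessdata_url` are defined), and the `retrieve` parameter and the returned source name are hypothetical additions so the logic can be exercised without network access.

```python
import os
from urllib import request

# Assumed values for illustration; the original snippet does not show them.
TESSDATA_BEST_URL = "https://github.com/tesseract-ocr/tessdata_best/raw/main/"
TESSDATA_URL = "https://github.com/tesseract-ocr/tessdata/raw/main/"


def download_with_fallback(code, tessdata_dir, retrieve=request.urlretrieve):
    """Try tessdata_best first, then tessdata.

    Returns the name of the source that succeeded, or None if both failed.
    `retrieve` is a hypothetical hook with the same (url, path) call shape
    as urllib.request.urlretrieve.
    """
    tessfile = f"{code}.traineddata"
    tessfile_path = os.path.join(tessdata_dir, tessfile)
    sources = (
        ("tessdata_best", TESSDATA_BEST_URL),
        ("tessdata", TESSDATA_URL),
    )
    for name, base_url in sources:
        try:
            retrieve(base_url + tessfile, tessfile_path)
            return name
        except Exception as e:
            # Report the failure and move on to the next source, if any.
            print(f"{code} not available from {name}: {e}")
    return None
```

Returning the source name rather than the language code makes it possible to tell which model ended up on disk, which is the question this issue is about.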

Albnu14 commented 1 year ago

Great, I will close the issue then.