faustomorales / keras-ocr

A packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.
https://keras-ocr.readthedocs.io/
MIT License
1.38k stars 355 forks

Move file storage to more stable location (e.g., GitHub Releases) #117

Closed vivekslair closed 3 years ago

vivekslair commented 4 years ago

I am using Colab for my experiments and have executed the following:

(1) !pip install git+https://github.com/faustomorales/keras-ocr.git
(2) import matplotlib.pyplot as plt
    import keras_ocr

However, when I execute the next line to initialize the pipeline:

pipeline = keras_ocr.pipeline.Pipeline()

I get the following error:

Looking for /root/.keras-ocr/craft_mlt_25k.h5
Downloading /root/.keras-ocr/craft_mlt_25k.h5

AssertionError                            Traceback (most recent call last)
in ()
      1 # keras-ocr will automatically download pretrained
      2 # weights for the detector and recognizer.
----> 3 pipeline = keras_ocr.pipeline.Pipeline()

2 frames
/usr/local/lib/python3.6/dist-packages/keras_ocr/tools.py in download_and_verify(url, sha256, cache_dir, verbose, filename)
    452 print('Downloading ' + filepath)
    453 urllib.request.urlretrieve(url, filepath)
--> 454 assert sha256 is None or sha256 == sha256sum(filepath), 'Error occurred verifying sha256.'
    455 return filepath
    456

AssertionError: Error occurred verifying sha256.

faustomorales commented 4 years ago

Hi! You are likely seeing this error downloading weights today because of this outage which affected MediaFire (where the weights are hosted). Please try again soon and I believe it should work.

dishantsonawane23 commented 4 years ago

The IP outage has been resolved, but the error still exists.

cezar-lima commented 4 years ago

I think it is related to the host's bandwidth limit. See: https://mediafire.zendesk.com/hc/en-us/articles/207100597-Bandwidth-and-Direct-Downloading. When I reproduce the steps above, the downloaded file is an HTML file, and if I visit the URL referenced at https://github.com/faustomorales/keras-ocr/blob/6b1125194a6242608f8bcc5edfe4152036e96d7b/keras_ocr/detection.py#L575 I am directed to a download page. So I think the solution is to download the file yourself and put it in the required directory.
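A quick way to confirm that a downloaded "weights" file is really an HTML page is to look at its first bytes. The following is a minimal sketch, assuming the Keras .h5 file starts with the standard 8-byte HDF5 signature at offset 0; `looks_like_hdf5` and `describe_download` are hypothetical helper names, not part of keras-ocr.

```python
# The 8-byte signature that begins an HDF5 file (checked at offset 0 only;
# files with a user block would place it later, which this sketch ignores).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` begins with the HDF5 signature."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def describe_download(path):
    """Roughly classify a downloaded file as 'hdf5', 'html', or 'unknown'."""
    if looks_like_hdf5(path):
        return "hdf5"
    with open(path, "rb") as f:
        head = f.read(512).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

For example, `describe_download("/root/.keras-ocr/craft_mlt_25k.h5")` returning `"html"` would be consistent with the host serving a download page instead of the file.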

GhadaJouini commented 4 years ago

@cezar-lima Did you try this solution? Is it working fine?

cezar-lima commented 4 years ago

@GhadaJouini Yes, it worked. I downloaded both files and placed them in the ~/.keras-ocr directory. The two files are referenced at:
https://github.com/faustomorales/keras-ocr/blob/6b1125194a6242608f8bcc5edfe4152036e96d7b/keras_ocr/detection.py#L575
https://github.com/faustomorales/keras-ocr/blob/9da08304a1130c176bf1e41a462f24f42b82fe6e/keras_ocr/recognition.py#L37

null0nil commented 4 years ago
old_url="https://www.mediafire.com/file/mepzf3sq7u7nve9/craft_mlt_25k.h5/file"
new_url="https://download906.mediafire.com/koq5tsgbhing/mepzf3sq7u7nve9/craft_mlt_25k.h5"
source_file="/usr/local/lib/python3.6/dist-packages/keras_ocr/detection.py"
sed --in-place "s|${old_url}|${new_url}|g" $source_file

old_url="https://www.mediafire.com/file/pkj2p29b1f6fpil/crnn_kurapan.h5/file"
new_url="https://download2260.mediafire.com/726aptv1tqqg/pkj2p29b1f6fpil/crnn_kurapan.h5"
source_file="/usr/local/lib/python3.6/dist-packages/keras_ocr/recognition.py"
sed --in-place "s|${old_url}|${new_url}|g" $source_file
vivekslair commented 4 years ago

@cezar-lima Did you try this locally or in Colab?

vivekslair commented 4 years ago

@null0nil - I tried this option. However, when I grep for the URL in either of the Python scripts, I can still see both URL references, and the sha256 error message persists.

GhadaJouini commented 4 years ago

@cezar-lima I tried your solution in Google Colab and placed the downloaded files in /keras-ocr, but it is still not working. Did you change anything in the URLs in detection.py and recognition.py? Thanks.

isabella-karabasz commented 4 years ago

@GhadaJouini, did you leave them in the root folder? Try the following:

!mv 'path/to/crnn_kurapan_notop.h5' '/root/.keras-ocr/crnn_kurapan_notop.h5'

GhadaJouini commented 4 years ago

@isabella-karabasz since I'm working on google colab I placed them on this path: '/usr/local/lib/python3.6/dist-packages/keras_ocr/'

isabella-karabasz commented 4 years ago

I am working in Colab as well, and it worked for me for both the detector and the recognizer. This is the path where the logs say it is looking for the files when the error occurs:

Looking for /root/.keras-ocr/crnn_kurapan.h5
Downloading /root/.keras-ocr/crnn_kurapan.h5
AssertionError [...]

GhadaJouini commented 4 years ago

@isabella-karabasz Can you share the project with me? This is my email: jouini.ghada1@gmail.com

null0nil commented 4 years ago

@vivekslair It should work. If it helps, I am using GNU sed version 4.4 (Ubuntu 18.04).

vivekslair commented 4 years ago

@GhadaJouini The /root/.keras-ocr directory can be accessed from Colab with os.chdir():

import os
os.chdir('/root/.keras-ocr')

However, I have yet to try downloading manually and checking whether the code works. I will check and keep the thread posted.

cezar-lima commented 4 years ago

@vivekslair @GhadaJouini As @isabella-karabasz stated, on Colab you need to place your uploaded files in the /root/.keras-ocr/ directory. After uploading, you can move them with, for example, !mv /craft_mlt_25k.h5 /root/.keras-ocr/craft_mlt_25k.h5 and !mv /crnn_kurapan.h5 /root/.keras-ocr/crnn_kurapan.h5. This worked for me on Colab.
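The same move can be done from Python, which also creates the directory if it does not exist yet. This is a minimal sketch, assuming the cache directory is ~/.keras-ocr (which resolves to /root/.keras-ocr on Colab, per the log lines in this thread); `install_weights` is a hypothetical helper, not a keras-ocr function.

```python
import shutil
from pathlib import Path


def install_weights(downloaded_files, cache_dir=None):
    """Move manually downloaded weight files into the cache directory.

    `cache_dir` defaults to ~/.keras-ocr (an assumption based on the
    "Looking for ..." log lines quoted in this thread).
    """
    cache_dir = Path(cache_dir) if cache_dir else Path.home() / ".keras-ocr"
    cache_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for src in map(Path, downloaded_files):
        dst = cache_dir / src.name  # keep the original file name
        shutil.move(str(src), str(dst))
        installed.append(dst)
    return installed
```

For example, `install_weights(["/craft_mlt_25k.h5", "/crnn_kurapan.h5"])` would mirror the two `!mv` commands above.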

vivekslair commented 4 years ago

@cezar-lima - Yes Cezar, I was in fact able to download directly to the keras-ocr folder by doing os.chdir('/root/.keras-ocr'). However, when I initialize the pipeline, the code seems to check for the weight files again and tries to download them from the website.

isabella-karabasz commented 4 years ago

@GhadaJouini I cannot share the project because the content is confidential. @cezar-lima's latest reply should be sufficient to solve the problem. Good luck!

vivekslair commented 4 years ago

@cezar-lima and @isabella-karabasz - I found my mistake. Although I had placed the weight files in the /.keras-ocr folder, the files I had downloaded were the wrong size. I noticed that keras_ocr/tools.py checks both that the file exists and that it is intact (it verifies the sha256 hash), so an incomplete download fails. I downloaded them again, uploaded them to the folder, and voila :)
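To check a manually downloaded file before initializing the pipeline, its SHA-256 digest can be compared with the expected value. This is a minimal standalone sketch: the traceback above shows keras_ocr/tools.py calling a function named `sha256sum`, but the version here is an independent reimplementation, `is_intact` is a hypothetical helper, and the expected hash must be copied from the keras-ocr source since it is not given in this thread.

```python
import hashlib


def sha256sum(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_sha256):
    """Return True if the file's digest matches the expected hex string."""
    return sha256sum(path) == expected_sha256.lower()
```

A truncated download or a saved HTML page produces a different digest, which is what triggers the "Error occurred verifying sha256." assertion.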

NeighborhoodCoding commented 4 years ago

Has anyone else experienced the ZeroDivisionError: division by zero? For example:

104/104 [==============================] - 461s 4s/step - loss: 22.7555 - val_loss: 20.7057
Epoch 2/1000
104/104 [==============================] - 451s 4s/step - loss: 20.3573 - val_loss: 21.4699
Epoch 3/1000
104/104 [==============================] - 373s 4s/step - loss: 16.7728 - val_loss: 16.2160
Epoch 4/1000
104/104 [==============================] - 350s 3s/step - loss: 14.6223 - val_loss: 16.1058
Epoch 5/1000
104/104 [==============================] - 369s 4s/step - loss: 13.7717 - val_loss: 13.6269
Epoch 6/1000
104/104 [==============================] - 360s 3s/step - loss: 13.0744 - val_loss: 11.9901
Epoch 7/1000
104/104 [==============================] - 372s 4s/step - loss: 11.6895 - val_loss: 11.6947
Epoch 8/1000
 14/104 [===>..........................] - ETA: 4:01 - loss: 12.2752
---------------------------------------------------------------------------
UnknownError                              Traceback (most recent call last)
<ipython-input-43-1c9e2d89d135> in <module>()
     20     validation_data=recognition_val_generator,
     21     validation_steps=math.ceil(len(background_splits[1])*1 / recognition_batch_size),
---> 22     workers=0
     23 )

10 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

UnknownError: 2 root error(s) found.
  (0) Unknown:  ZeroDivisionError: division by zero
Traceback (most recent call last):

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/script_ops.py", line 244, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
    return func(*args, **kwargs)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 827, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/data_adapter.py", line 814, in wrapped_generator
    for data in generator_fn():

  File "/usr/local/lib/python3.6/dist-packages/keras_ocr/recognition.py", line 359, in get_batch_generator
    batch = [sample for sample, _ in zip(image_generator, range(batch_size))]

  File "/usr/local/lib/python3.6/dist-packages/keras_ocr/recognition.py", line 359, in <listcomp>
    batch = [sample for sample, _ in zip(image_generator, range(batch_size))]

  File "/usr/local/lib/python3.6/dist-packages/keras_ocr/data_generation.py", line 272, in convert_image_generator_to_recognizer_input
    skip_rotate=True)

  File "/usr/local/lib/python3.6/dist-packages/keras_ocr/tools.py", line 88, in warpBox
    scale = min(target_width / w, target_height / h)

ZeroDivisionError: division by zero

     [[{{node PyFunc}}]]
     [[IteratorGetNext]]
     [[Shape/_10]]
  (1) Unknown:  ZeroDivisionError: division by zero
Traceback (most recent call last):

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/script_ops.py", line 244, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
    return func(*args, **kwargs)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 827, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/data_adapter.py", line 814, in wrapped_generator
    for data in generator_fn():

  File "/usr/local/lib/python3.6/dist-packages/keras_ocr/recognition.py", line 359, in get_batch_generator
    batch = [sample for sample, _ in zip(image_generator, range(batch_size))]

  File "/usr/local/lib/python3.6/dist-packages/keras_ocr/recognition.py", line 359, in <listcomp>
    batch = [sample for sample, _ in zip(image_generator, range(batch_size))]

  File "/usr/local/lib/python3.6/dist-packages/keras_ocr/data_generation.py", line 272, in convert_image_generator_to_recognizer_input
    skip_rotate=True)

  File "/usr/local/lib/python3.6/dist-packages/keras_ocr/tools.py", line 88, in warpBox
    scale = min(target_width / w, target_height / h)

ZeroDivisionError: division by zero

     [[{{node PyFunc}}]]
     [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_17688]

Function call stack:
train_function -> train_function

Is it only my problem?
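The traceback ends at `scale = min(target_width / w, target_height / h)` in warpBox, which suggests that some box in the training data has zero width or height. The following is a minimal sketch of one way to detect such boxes before they reach the recognizer generator. It assumes each box is four (x, y) corners in clockwise order starting at the top left; `box_size`, `safe_scale`, and `min_size` are hypothetical names, not part of keras-ocr, and how warpBox actually derives w and h is not shown in this thread.

```python
import math


def box_size(box):
    """Return (width, height) of a 4-corner box: TL, TR, BR, BL (assumed)."""
    (x0, y0), (x1, y1), _, (x3, y3) = box
    w = math.hypot(x1 - x0, y1 - y0)  # top edge length
    h = math.hypot(x3 - x0, y3 - y0)  # left edge length
    return w, h


def safe_scale(target_width, target_height, box, min_size=1.0):
    """Compute the scale like the failing line, or None for a degenerate box."""
    w, h = box_size(box)
    if w < min_size or h < min_size:
        return None  # caller can skip this sample instead of dividing by zero
    return min(target_width / w, target_height / h)
```

Samples for which `safe_scale` returns None could be dropped from the image generator before training starts.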

null0nil commented 4 years ago

In reply to @chiragbpatil's request for a new URL for crnn_kurapan_notop.h5: the file can be found at the link below. (Hint: go to the old URL, then right-click the big green "Download" button to copy the new URL.)

https://download1923.mediafire.com/l8rxrh8klvxg/n9yfn5wueu82rgf/crnn_kurapan_notop.h5

chiragbpatil commented 4 years ago

Thank you :)

vivekslair commented 4 years ago

Closing the issue now, since there is a workaround and the issue is more infrastructure-based than code-based.

faustomorales commented 4 years ago

Ah, yes. It looks like I hit my bandwidth quota for the year. 🤦 Sorry folks. :| I originally hosted on GCP but that got expensive. I thought I found a way to keep my costs fixed by switching to MediaFire but didn't realize there was a bandwidth limit.

faustomorales commented 4 years ago

Recommendations for a free / cheap place to host where I can avoid paying for so much bandwidth are welcome!

MurthyAvanithsa commented 4 years ago

How about hosting it on GitHub itself? We could use GitHub's releases or raw file modes for the download. I have also seen other DL projects hosting their files on Google Drive.

NeighborhoodCoding commented 4 years ago

Maybe font.zip and backgroud.zip could also be uploaded to GitHub itself, because they need to be downloaded too.

faustomorales commented 4 years ago

Great suggestion. I've renamed and re-opened this issue. Not sure when I'll be able to get to it (maybe next weekend), but I'm opening it so that it's recorded somewhere.

faustomorales commented 3 years ago

This is fixed by https://github.com/faustomorales/keras-ocr/commit/c8a4137018f9fb32e7d542d7e9550b24c33405b3 and the fixed version is published to PyPI as v0.8.5. The exception is the MLT 2019 dataset, which was too large. I've added a warning to the code to tell users what to do to get that dataset. Thank you for your patience!

SplyzerRB commented 1 year ago

I'm getting this issue:

Looking for C:\Users\rx.keras-ocr\craft_mlt_25k.h5
Looking for C:\Users\rx.keras-ocr\crnn_kurapan.h5