HazyResearch / pdftotree

:evergreen_tree: A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved.
MIT License

Broken keras imports #124

Open b-hemanth opened 1 year ago

b-hemanth commented 1 year ago

Using model_type=vision fails because of outdated imports, with the error:

ImportError: cannot import name 'img_to_array' from 'keras.preprocessing.image'

This occurs at line 7 of pdftotree/pdftotree/visual/visual_utils.py, likely because these imports have moved to tf.keras.utils (a la here and here). As per the changelog, on 2020-10-13 pdftotree upgraded Keras to 2.4.0 or later (and TensorFlow to 2.2 or later) (#86, @HiromuHota).

However, when I made this edit locally and built the package myself from the 0.5.1+dev channel, the kernel crashed, so I am leaving this as an issue here.
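
For reference, a minimal sketch of the kind of import edit described above, assuming the goal is an import that resolves on both older and newer Keras releases. The names `import_first` and `IMG_TO_ARRAY_LOCATIONS` are hypothetical and not part of pdftotree, the list of candidate modules is an assumption, and this has not been tested against the package (it does not address the kernel crash):

```python
import importlib

# Candidate modules to search, newest location first (an assumption, adjust
# as needed). Older Keras releases expose img_to_array under
# keras.preprocessing.image, which is the import that currently fails.
IMG_TO_ARRAY_LOCATIONS = (
    "keras.utils",
    "tensorflow.keras.utils",
    "keras.preprocessing.image",
)


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that has it."""
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            # Module missing, or present but without the attribute: try the next one.
            errors.append(f"{path}: {exc}")
    raise ImportError(f"cannot import {name!r}; tried: " + "; ".join(errors))
```

In visual_utils.py the failing import could then hypothetically be replaced by `img_to_array = import_first("img_to_array", IMG_TO_ARRAY_LOCATIONS)`.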

Environment:

HiromuHota commented 1 year ago

According to https://stackoverflow.com/a/72613445, TF/Keras moved that function in 2.9.0. Please manually install keras<2.9 (and the TensorFlow version compatible with that version of Keras), and try again.
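
To check whether a given environment falls under this workaround, the installed Keras version can be compared against 2.9. This is a minimal sketch using only the standard library (Python 3.8+); the function names are hypothetical, and the 2.9 threshold is taken from the comment above rather than verified independently:

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_minor(version_string):
    """Extract (major, minor) as integers from a version string such as '2.9.0'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(version_string):
    """True if this Keras version is 2.9 or later, i.e. outside keras<2.9."""
    return major_minor(version_string) >= (2, 9)


def installed_keras_version():
    """Return the installed Keras version string, or None if Keras is not installed."""
    try:
        return version("keras")
    except PackageNotFoundError:
        return None
```

For example, `is_affected(installed_keras_version())` would indicate whether a downgrade to keras<2.9 is needed, provided Keras is installed.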