CenterForOpenScience / pydocx

An extendable docx file format parser and converter

AttributeError: module 'collections' has no attribute 'Hashable' #261

Open realKarthikNair opened 2 years ago

realKarthikNair commented 2 years ago
karthik@cosmic:/tmp$ pydocx --html hello1.docx hello1.html
Traceback (most recent call last):
  File "/usr/local/bin/pydocx", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/dist-packages/pydocx/__main__.py", line 49, in cli
    sys.exit(main(args=sys.argv[1:]) or 0)
  File "/usr/local/lib/python3.10/dist-packages/pydocx/__main__.py", line 44, in main
    return convert(output_type, docx_path, output_path)
  File "/usr/local/lib/python3.10/dist-packages/pydocx/__main__.py", line 15, in convert
    output = PyDocX.to_html(docx_path)
  File "/usr/local/lib/python3.10/dist-packages/pydocx/pydocx.py", line 13, in to_html
    return PyDocXHTMLExporter(path_or_stream).export()
  File "/usr/local/lib/python3.10/dist-packages/pydocx/export/html.py", line 208, in export
    return ''.join(
  File "/usr/local/lib/python3.10/dist-packages/pydocx/export/html.py", line 208, in <genexpr>
    return ''.join(
  File "/usr/local/lib/python3.10/dist-packages/pydocx/export/base.py", line 117, in export
    self._first_pass_export()
  File "/usr/local/lib/python3.10/dist-packages/pydocx/export/base.py", line 129, in _first_pass_export
    for result in self.export_node(document):
  File "/usr/local/lib/python3.10/dist-packages/pydocx/export/base.py", line 218, in export_node
    for result in results:
  File "/usr/local/lib/python3.10/dist-packages/pydocx/export/html.py", line 127, in apply
    for result in results:
  File "/usr/local/lib/python3.10/dist-packages/pydocx/export/base.py", line 218, in export_node
    for result in results:
  File "/usr/local/lib/python3.10/dist-packages/pydocx/export/html.py", line 127, in apply
    for result in results:
  File "/usr/local/lib/python3.10/dist-packages/pydocx/export/base.py", line 252, in yield_nested
    for result in func(item):
  File "/usr/local/lib/python3.10/dist-packages/pydocx/export/base.py", line 218, in export_node
    for result in results:
  File "/usr/local/lib/python3.10/dist-packages/pydocx/export/html.py", line 278, in export_paragraph
    results = is_not_empty_and_not_only_whitespace(results)
  File "/usr/local/lib/python3.10/dist-packages/pydocx/export/html.py", line 78, in is_not_empty_and_not_only_whitespace
    for item in gen:
  File "/usr/local/lib/python3.10/dist-packages/pydocx/export/base.py", line 252, in yield_nested
    for result in func(item):
  File "/usr/local/lib/python3.10/dist-packages/pydocx/export/base.py", line 216, in export_node
    results = caller(node)
  File "/usr/local/lib/python3.10/dist-packages/pydocx/export/base.py", line 344, in export_run
    if run.effective_properties:
  File "/usr/local/lib/python3.10/dist-packages/pydocx/util/memoize.py", line 24, in __call__
    if not isinstance(args, collections.Hashable):
AttributeError: module 'collections' has no attribute 'Hashable'

Python 3.10.4 (main, Apr 2 2022, 09:04:19) [GCC 11.2.0] on linux

probablyArth commented 1 year ago

Did you resolve it?

zzw1991 commented 1 year ago

image

realKarthikNair commented 1 year ago

@probablyArth Nope.

Boston-of-Gilead commented 1 year ago

Bumping, having identical issue. Is this module still being supported? OP opened the issue in June.

realKarthikNair commented 1 year ago

I don't think it is still being supported

Boston-of-Gilead commented 1 year ago

So my workaround is to go into memoize.py and comment out lines 24 and 27, since the check on line 24 fails (collections no longer has a Hashable attribute). I won't claim to know exactly what this does because I don't have the time to unravel the entire module, so make sure you test thoroughly, but I'm not seeing any issues so far.

image

I'm then able to run PyDocX successfully using the Exporter class:

from pydocx.export import PyDocXHTMLExporter

# Pass in a file object; the with block closes it once the export is done
with open('C:\\Users\\<user>\\Documents\\AOT.docx', 'rb') as docx_file:
    exporter = PyDocXHTMLExporter(docx_file)
    html = exporter.export()

print(html)

realKarthikNair commented 1 year ago

Thanks man.

gradyy commented 1 year ago

This looks like a Python version compatibility issue. collections.Hashable was a deprecated alias that Python 3.10 removed; the class now lives only at collections.abc.Hashable. Unfortunately, until a new version is packaged and published, the manual edit is probably necessary.

https://stackoverflow.com/questions/3460650/asking-is-hashable-about-a-python-value
https://python.readthedocs.io/en/v2.7.2/library/collections.html?highlight=collections#collections.Hashable
https://docs.python.org/3/library/collections.abc.html#collections.abc.Hashable
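
For anyone who would rather not edit the installed package, one stopgap is to restore the removed alias at runtime before calling into pydocx. This is a minimal sketch, not an official fix: it assumes the only missing name is collections.Hashable (the one in the traceback above), and the input path in the commented lines is hypothetical.

```python
import collections
import collections.abc

# Restore the alias that Python 3.10 removed so legacy code that still
# references collections.Hashable keeps working. On older interpreters
# the attribute already exists and this is a no-op.
if not hasattr(collections, "Hashable"):
    collections.Hashable = collections.abc.Hashable

# Call into pydocx only after the alias exists (uncomment once pydocx is installed).
# from pydocx import PyDocX
# html = PyDocX.to_html("hello1.docx")  # hypothetical input path
```

The patch only lives for the current process, so it has to run in every script that uses pydocx; it does not help the pydocx command-line entry point.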

2020Mike commented 10 months ago

I'm having the same problem as OP.

$ pydocx --html file-sample_1MB.docx output.html
Traceback (most recent call last):
  File "/home/phillip/.local/bin/pydocx", line 8, in <module>
    sys.exit(cli())
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/__main__.py", line 49, in cli
    sys.exit(main(args=sys.argv[1:]) or 0)
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/__main__.py", line 44, in main
    return convert(output_type, docx_path, output_path)
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/__main__.py", line 15, in convert
    output = PyDocX.to_html(docx_path)
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/pydocx.py", line 13, in to_html
    return PyDocXHTMLExporter(path_or_stream).export()
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/html.py", line 208, in export
    return ''.join(
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/html.py", line 208, in <genexpr>
    return ''.join(
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/base.py", line 117, in export
    self._first_pass_export()
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/base.py", line 129, in _first_pass_export
    for result in self.export_node(document):
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/base.py", line 218, in export_node
    for result in results:
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/html.py", line 127, in apply
    for result in results:
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/base.py", line 218, in export_node
    for result in results:
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/html.py", line 127, in apply
    for result in results:
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/base.py", line 252, in yield_nested
    for result in func(item):
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/base.py", line 218, in export_node
    for result in results:
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/html.py", line 278, in export_paragraph
    results = is_not_empty_and_not_only_whitespace(results)
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/html.py", line 78, in is_not_empty_and_not_only_whitespace
    for item in gen:
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/html.py", line 114, in apply
    results = is_not_empty_and_not_only_whitespace(results)
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/html.py", line 78, in is_not_empty_and_not_only_whitespace
    for item in gen:
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/base.py", line 252, in yield_nested
    for result in func(item):
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/base.py", line 216, in export_node
    results = caller(node)
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/export/base.py", line 344, in export_run
    if run.effective_properties:
  File "/home/phillip/.local/lib/python3.10/site-packages/pydocx/util/memoize.py", line 24, in __call__
    if not isinstance(args, collections.Hashable):
AttributeError: module 'collections' has no attribute 'Hashable'

SWHL commented 5 months ago

I ran into the same problem. My environment:

python 3.10.13
pydocx 0.9.10

I fixed it by changing this line: https://github.com/CenterForOpenScience/pydocx/blob/98c6aa626d875278240eabea8f86a914840499b3/pydocx/util/memoize.py#L24

to:

if not isinstance(args, collections.abc.Hashable):
    # uncacheable. a list, for instance.
    # better to not cache than blow up.
    return self.func(*args)
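
To show where that changed line sits, here is a rough sketch of a memoizing decorator built around the same check. The class name `memoized` and the helper `total` are hypothetical, and this is not pydocx's actual memoize.py, only an illustration of the pattern. One thing worth noting: because `args` is always a tuple, the isinstance check passes even when an element is unhashable, so the sketch also catches the TypeError raised by the cache lookup.

```python
import collections.abc
import functools


class memoized:
    """Cache results by positional args (hypothetical sketch, not pydocx's code)."""

    def __init__(self, func):
        self.func = func
        self.cache = {}
        functools.update_wrapper(self, func)

    def __call__(self, *args):
        # The ABC is importable from collections.abc on every Python 3 release,
        # unlike the collections.Hashable alias removed in 3.10.
        if not isinstance(args, collections.abc.Hashable):
            return self.func(*args)
        try:
            return self.cache[args]
        except KeyError:
            # First call with these arguments: compute and remember the result.
            value = self.cache[args] = self.func(*args)
            return value
        except TypeError:
            # The tuple contains an unhashable element (a list, for instance).
            # Better to not cache than blow up.
            return self.func(*args)


@memoized
def total(values):
    return sum(values)
```

Calling `total((1, 2, 3))` caches the result, while `total([1, 2, 3])` is simply recomputed each time.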