CenterForOpenScience / pydocx

An extendable docx file format parser and converter

Issue with simple field #238

Open botzill opened 7 years ago

botzill commented 7 years ago

We have a scenario which fails with the following error:

Traceback (most recent call last):
  File "/Users/geo/PycharmProjects/pydocx/tt/t.py", line 74, in <module>
    html = PyDocX.to_html(open(file_path, 'rb'))
  File "/Users/geo/PycharmProjects/pydocx/pydocx/pydocx.py", line 13, in to_html
    return PyDocXHTMLExporter(path_or_stream).export()
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/html.py", line 211, in export
    for result in super(PyDocXHTMLExporter, self).export()
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/html.py", line 209, in <genexpr>
    result.to_html() if isinstance(result, HtmlTag)
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/base.py", line 123, in export
    for result in self.export_node(document):
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/base.py", line 218, in export_node
    for result in results:
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/html.py", line 127, in apply
    for result in results:
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/base.py", line 218, in export_node
    for result in results:
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/html.py", line 127, in apply
    for result in results:
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/base.py", line 252, in yield_nested
    for result in func(item):
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/base.py", line 218, in export_node
    for result in results:
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/html.py", line 127, in apply
    for result in results:
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/base.py", line 252, in yield_nested
    for result in func(item):
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/base.py", line 218, in export_node
    for result in results:
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/html.py", line 127, in apply
    for result in results:
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/base.py", line 267, in yield_nested_with_line_breaks_between_paragraphs
    for result in func(item):
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/base.py", line 218, in export_node
    for result in results:
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/html.py", line 280, in export_paragraph
    results = is_not_empty_and_not_only_whitespace(results)
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/html.py", line 78, in is_not_empty_and_not_only_whitespace
    for item in gen:
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/base.py", line 252, in yield_nested
    for result in func(item):
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/base.py", line 218, in export_node
    for result in results:
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/base.py", line 252, in yield_nested
    for result in func(item):
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/base.py", line 216, in export_node
    results = caller(node)
  File "/Users/geo/PycharmProjects/pydocx/pydocx/export/base.py", line 524, in export_simple_field
    parsed_instr = simple_field.parse_instr()
  File "/Users/geo/PycharmProjects/pydocx/pydocx/openxml/wordprocessing/simple_field.py", line 43, in parse_instr
    m = self._parse_instr_into_field_type_and_arg_string()
  File "/Users/geo/PycharmProjects/pydocx/pydocx/openxml/wordprocessing/simple_field.py", line 35, in _parse_instr_into_field_type_and_arg_string
    return re.match('^\s*([^\s]+)\s*(.*)$', self.instr)
  File "/Users/geo/.virtualenvs/pydocx/lib/python2.7/re.py", line 141, in match
    return _compile(pattern, flags).match(string)
TypeError: expected string or buffer

After some investigation, I see that `instr` can be None here: https://github.com/CenterForOpenScience/pydocx/blob/master/pydocx/openxml/wordprocessing/simple_field.py#L23. That None is then passed to re.match, which raises the TypeError above.

So I guess a simple None check here: https://github.com/CenterForOpenScience/pydocx/blob/master/pydocx/openxml/wordprocessing/simple_field.py#L41 should solve the issue?
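To make the proposed check concrete, here is a minimal sketch of the guard. It is not the actual pydocx code: `SimpleFieldSketch` is a hypothetical stand-in class, and the tuple returned by `parse_instr` is an assumption for illustration. Only the `instr` attribute and the regular expression are taken from the traceback.

```python
import re


class SimpleFieldSketch(object):
    """Hypothetical stand-in for pydocx's simple field class."""

    def __init__(self, instr=None):
        # `instr` may be None when the attribute is missing
        self.instr = instr

    def _parse_instr_into_field_type_and_arg_string(self):
        # Guard: re.match raises TypeError when handed None
        if self.instr is None:
            return None
        # Same pattern as in the traceback, written as a raw string
        return re.match(r'^\s*([^\s]+)\s*(.*)$', self.instr)

    def parse_instr(self):
        m = self._parse_instr_into_field_type_and_arg_string()
        if m is None:
            # Nothing to parse (None or blank instr)
            return None
        # Assumed return shape: (field type, raw argument string)
        field_type, arg_string = m.groups()
        return field_type, arg_string


if __name__ == '__main__':
    print(SimpleFieldSketch(None).parse_instr())
    print(SimpleFieldSketch(' HYPERLINK "http://example.com"').parse_instr())
```

With a guard like this, the caller (`export_simple_field` in the traceback) would also need to tolerate a None result, which I have not verified.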

I did not get into much detail on the complex -> simple field conversion.