DCsunset / pandoc-include

A pandoc filter to allow file and header inclusion
MIT License

How to debug json error? #13

Closed PeterSommerlad closed 4 years ago

PeterSommerlad commented 4 years ago

Hi, I am using pandoc 2.11.1.1 and pandoc-include

pip show pandoc-include
Name: pandoc-include
Version: 0.8.4

I split a large Markdown file that was generated from a .docx, and unfortunately pandoc-include now fails with a JSON error. I checked the individual files, and they are all valid UTF-8. So what could be the problem here?

The error message is attached below. Could the pandoc and corresponding panflute updates have introduced incompatibilities with pandoc-include? I have had to debug other panflute-based filters for that reason, but I have no idea how to approach the underlying JSON error.

Thanks for your help.

Regards Peter.

  File "/usr/local/bin/pandoc-include", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/pandoc_include.py", line 166, in main
    return pf.run_filter(action, doc=doc)
  File "/usr/local/lib/python3.9/site-packages/panflute/io.py", line 224, in run_filter
    return run_filters([action], *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/panflute/io.py", line 205, in run_filters
    doc = doc.walk(action, doc)
  File "/usr/local/lib/python3.9/site-packages/panflute/base.py", line 264, in walk
    ans = list(chain.from_iterable(ans))
  File "/usr/local/lib/python3.9/site-packages/panflute/base.py", line 262, in <genexpr>
    ans = ((item,) if type(item) != list else item for item in ans)
  File "/usr/local/lib/python3.9/site-packages/panflute/base.py", line 259, in <genexpr>
    ans = (item.walk(action, doc) for item in obj)
  File "/usr/local/lib/python3.9/site-packages/panflute/base.py", line 275, in walk
    altered = action(self, doc)
  File "/usr/local/lib/python3.9/site-packages/pandoc_include.py", line 137, in action
    new_elems = pf.convert_text(
  File "/usr/local/lib/python3.9/site-packages/panflute/tools.py", line 393, in convert_text
    out = json.loads(out, object_hook=from_json)
  File "/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
  File "/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 20078 (char 20077)
Error running filter pandoc-include:
Filter returned error status 1

DCsunset commented 4 years ago

It may be caused by some incompatibilities between pandoc and panflute. I'm not sure whether it is introduced by the newest update. Does pandoc-include v0.8.3 work for this file?

If only v0.8.4 causes this problem, you could debug it with the development version: clone this repo and log the intermediate JSON value to a file.
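Once the intermediate JSON string has been captured, one way to see what the decoder is objecting to is to catch the `JSONDecodeError` and print the text around the reported position. The following is only a minimal sketch using the standard `json` module; the helper name `show_json_error_context` and the `width` parameter are made up for illustration and are not part of pandoc-include or panflute.

```python
import json


def show_json_error_context(raw: str, width: int = 40) -> str:
    """Try to parse `raw`; on failure, describe the text around the error.

    Hypothetical helper for illustration only. `width` is the number of
    characters shown on each side of the reported error position.
    """
    try:
        json.loads(raw)
    except json.JSONDecodeError as err:
        start = max(err.pos - width, 0)
        end = err.pos + width
        # repr() makes invisible characters (control characters,
        # unusual Unicode separators) visible as escape sequences.
        return f"{err.msg} at char {err.pos}: {raw[start:end]!r}"
    return "ok"


if __name__ == "__main__":
    # A raw newline inside a JSON string is rejected by the decoder.
    print(show_json_error_context('{"a": "x\ny"}'))
```

Applied to the logged JSON, the `repr()` output should make any invisible character near the reported offset (char 20077 in the traceback above) show up as an escape sequence.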

Feel free to update this thread if you still find it difficult to debug it.

PeterSommerlad commented 4 years ago

By bisecting the offending documents, I found spurious Unicode characters <U+2028> (line separator) and <U+2029> (paragraph separator) in the Markdown generated from Word. After deleting those, I no longer get these errors. They might already have been present in the original Word file, which I cannot check right now (lacking MS Word), but it might be worth investigating why the JSON reader is unhappy with them.
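Rather than bisecting by hand, the two separator characters can also be located and replaced with a few lines of Python. This is a rough sketch, not something pandoc-include provides: the function names `find_separators` and `strip_separators` are hypothetical, and replacing each separator with a single space is an assumption about what is appropriate for the document.

```python
# Unicode separators that Word-derived Markdown may contain.
SEPARATORS = {"\u2028": "LINE SEPARATOR", "\u2029": "PARAGRAPH SEPARATOR"}


def find_separators(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, name) for every U+2028/U+2029 in `text`.

    Lines are split on "\n" only, because str.splitlines() itself
    treats U+2028 and U+2029 as line boundaries.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SEPARATORS:
                hits.append((line_no, col, SEPARATORS[ch]))
    return hits


def strip_separators(text: str, replacement: str = " ") -> str:
    """Replace every U+2028/U+2029 with `replacement` (a space by default)."""
    return text.translate({0x2028: replacement, 0x2029: replacement})


if __name__ == "__main__":
    sample = "first\u2028line\nsecond\u2029"
    print(find_separators(sample))
    print(repr(strip_separators(sample)))
```

To use it on a real file, read the file with `encoding="utf-8"`, call `find_separators` to report positions, and write the result of `strip_separators` back out once the hits look as expected.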

I conclude it is a bug in the JSON reader of Python 3.9. However, I am not familiar enough with Python development to feel comfortable filing a bug report there.