0xabu / pdfannots

Extracts and formats text annotations from a PDF file
MIT License

Breaks with Highlighted text #12

Closed. Nickroll closed this issue 5 years ago.

Nickroll commented 5 years ago

This might be due to a change in pdfminer.six's PDFObjRef.

Traceback (most recent call last):
  File "pdfannots.py", line 351, in <module>
    main()
  File "pdfannots.py", line 348, in main
    printannots(fh)
  File "pdfannots.py", line 316, in printannots
    pageannots = getannots(pdfannots, pageno)
  File "pdfannots.py", line 149, in getannots
    a = Annotation(pageno, subtype.name.lower(), pa.get('QuadPoints'), pa.get('Rect'), contents)
  File "pdfannots.py", line 102, in __init__
    assert len(coords) % 8 == 0
TypeError: object of type 'PDFObjRef' has no len()

Environment:

ca-certificates 2018.03.07 0
certifi 2018.11.29 py36_0
chardet 3.0.4
libcxx 4.0.1 hcfea43d_1
libcxxabi 4.0.1 hcfea43d_1
libedit 3.1.20170329 hb402a30_2
libffi 3.2.1 h475c297_4
ncurses 6.1 h0a44026_1
openssl 1.1.1a h1de35cc_0
pdfminer.six 20181108
pip 18.1 py36_0
pycryptodome 3.7.2
python 3.6.8 haf84260_0
readline 7.0 h1de35cc_5
setuptools 40.6.3 py36_0
six 1.12.0
sortedcontainers 2.1.0
sqlite 3.26.0 ha441bb4_0
tk 8.6.8 ha441bb4_0
wheel 0.32.3 py36_0
xz 5.2.4 h1de35cc_4
zlib 1.2.11 h1de35cc_3

If the PDF contains no highlighted text, the script does not crash and instead prints: Document doesn't include outlines ("bookmarks"). pdf2txt.py works fine on the same file regardless of whether it contains highlights.
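
The TypeError in the traceback above is consistent with QuadPoints arriving as an indirect object reference rather than a plain list, so len() fails on it. The following is a minimal sketch of one way to guard against that, not necessarily the fix that was applied to pdfannots. StubObjRef, resolve_indirect, and quad_count are hypothetical names used only for illustration; StubObjRef stands in for pdfminer.six's PDFObjRef so the example runs without a PDF. With pdfminer.six installed, pdfminer.pdftypes.resolve1 performs the same resolution on real references.

```python
# Sketch: resolve indirect PDF objects before measuring QuadPoints.
# StubObjRef is a hypothetical stand-in for pdfminer.six's PDFObjRef.


class StubObjRef:
    """Hypothetical stand-in for an indirect object reference."""

    def __init__(self, target):
        self._target = target

    def resolve(self):
        # Return the object this reference points at.
        return self._target


def resolve_indirect(obj):
    """Follow references until a concrete (non-reference) value is reached."""
    while hasattr(obj, "resolve"):
        obj = obj.resolve()
    return obj


def quad_count(coords):
    """Return how many quadrilaterals a QuadPoints value describes."""
    coords = resolve_indirect(coords)
    if coords is None:
        # Annotation has no QuadPoints entry at all.
        return 0
    if len(coords) % 8 != 0:
        raise ValueError("QuadPoints length must be a multiple of 8")
    return len(coords) // 8


direct = [0.0] * 8
indirect = StubObjRef(StubObjRef([0.0] * 16))
print(quad_count(direct), quad_count(indirect))
```

Resolving before the length check means the same code path handles both direct arrays and references, including references that point at other references.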

Nickroll commented 5 years ago

Fixed in issue #9

0xabu commented 5 years ago

Sorry you hit this. I'll try to get around to making the fix in the next week.