I've seen a number of PDF files where the title attribute/property is reported as None, but accessing /Title directly returns content. I have no idea whether this is a problem with the PDF(s) or with PyPDF. There is a workaround (which may hint at a potential change to PyPDF, but I'm unclear on what the correct fix would be).
The attached PDF, title_bug.pdf, is about 5 MB and is a sample of a document that exhibits this behavior. I did not create it (nor do I know how it was created), so the only information we have is the metadata inside.
Test case, along with the workaround, below:
#!/usr/bin/env python
# -*- coding: windows-1252 -*-
# vim:ts=4:sw=4:softtabstop=4:smarttab:expandtab
#
import os
import sys
ver_to_test = 2
ver_to_test = 3
ver_to_test = 4
if ver_to_test == 4:
    from pypdf import PdfFileReader # https://github.com/claird/PyPDF4
elif ver_to_test == 3:
    from PyPDF3 import PdfFileReader # https://github.com/mstamy2/PyPDF3
else:
    from PyPDF2 import PdfFileReader # https://github.com/mstamy2/PyPDF2 / https://pythonhosted.org/PyPDF2/
print('Python %s on %s' % (sys.version, sys.platform))
filename = 'title_bug.pdf'
f = open(filename, 'rb')
pdf = PdfFileReader(f)
info = pdf.documentInfo
#print(info)
print('title attribute %r' % info.title) # reports None
print('title getText() %r' % info.getText("/Title")) # this is what .title property calls
print('title get() %r' % info.get("/Title")) # this is part of what dict[] does
print('title get().getObject() %r' % info.get("/Title").getObject()) # this is what dict[] does
print('/Title dict entry %r' % info['/Title']) # with test pdf works
print('title attribute %r' % info.title) # Sanity check it is still None
print('title Workaround %r' % (info.title or info['/Title'])) # Workaround
f.close()
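For what it's worth, the workaround could be wrapped in a small helper. This is only a sketch under some assumptions: `resolve_title` and `FakeInfo` are names made up for illustration and are not part of any PyPDF release, and `FakeInfo` merely imitates an info object whose title property returns None while the /Title entry has content. It assumes the info object exposes a `title` attribute and dict-style lookup, as in the test case above.

```python
def resolve_title(info):
    """Return the document title, falling back to the raw /Title entry.

    `info` is assumed to behave like the documentInfo object in the test
    case: a `title` attribute plus dict-style access. Hypothetical helper.
    """
    if info is None:
        return None
    # First try the normal property, which is None for the affected files.
    title = getattr(info, "title", None)
    if title:
        return title
    # Fall back to direct dictionary access, which worked with the test PDF.
    try:
        return info["/Title"]
    except KeyError:
        return None


class FakeInfo(dict):
    """Stand-in for the info object (not a PyPDF class), for demonstration."""
    title = None  # imitates the behavior reported in this issue


print(resolve_title(FakeInfo({"/Title": "Example Title"})))
```

With a real reader this would be called as `resolve_title(pdf.documentInfo)`. Whether the fallback belongs inside the title property itself is the open question here.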
Just found this fork/project after logging https://github.com/mstamy2/PyPDF3/issues/13. The test case here is set up for PyPDF4.