claird / PyPDF4

A utility to read and write PDFs with Python
obsolete-https://pythonhosted.org/PyPDF2/
Other
328 stars 61 forks

Cannot fetch PDF metadata when it does not exist #57

Open kirk86 opened 5 years ago

kirk86 commented 5 years ago

It would be better to fetch document info such as title and author by reading the first page, because for PDFs that lack those fields in their metadata PyPDF4 returns empty values. Maybe I'm misunderstanding the purpose of PyPDF, but my impression was that this kind of info came from reading the actual PDF and extracting it; otherwise one could just use pdfinfo from pdftools to accomplish the same task. As I mentioned, the problem is that when information like title and author is not in the metadata, one needs to read and extract it from the PDF itself.
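To make the request concrete, here is a rough sketch of the fallback I have in mind. It assumes PyPDF4 still exposes the PyPDF2-style `PdfFileReader`, `getDocumentInfo()`, `getPage()` and `extractText()` calls. The helper names `guess_title` and `title_with_fallback` are made up for illustration, and "first non-empty line of page one" is only a naive heuristic, not something the library does today.

```python
def guess_title(first_page_text):
    """Hypothetical heuristic: treat the first non-empty line as the title."""
    for line in first_page_text.splitlines():
        line = line.strip()
        if line:
            return line
    return None


def title_with_fallback(path):
    """Return the metadata title, or a guess from the first page if it is missing."""
    # Imported here so guess_title() can be used without PyPDF4 installed.
    from PyPDF4 import PdfFileReader

    with open(path, "rb") as fh:
        reader = PdfFileReader(fh)
        info = reader.getDocumentInfo()  # can be None when the PDF has no info dictionary
        title = info.title if info is not None else None
        if title:
            return title
        # Metadata is missing or empty: fall back to the text of the first page.
        return guess_title(reader.getPage(0).extractText())
```

Something like `title_with_fallback("paper.pdf")` (the file name is a placeholder) would then return the metadata title when present and a best guess otherwise. Text extraction quality varies a lot between PDFs, so the guess can easily be wrong; the same idea would apply to the author field with a different heuristic.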