claird / PyPDF4

A utility to read and write PDFs with Python

Merge pdf/a to combined pdf/a #32

Open dschulten opened 5 years ago

dschulten commented 5 years ago

When PdfFileMerger merges pdf/a files, it loses pdf/a information and resets the PDF Version to 1.3.
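One way to observe the version reset described here is to read the `%PDF-x.y` header of the input and output files. The following is a minimal standard-library sketch, not part of PyPDF4; the helper name `pdf_header_version` is hypothetical. It only inspects the file header, so it ignores any `/Version` entry in the document catalog, which can override the header in PDF 1.4 and later.

```python
import re

# Matches the PDF header marker, e.g. b"%PDF-1.3"
_HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes):
    """Return (major, minor) parsed from the %PDF-x.y header, or None if absent.

    Hypothetical helper for illustration; it looks only at the first
    1024 bytes, where the header is expected to appear.
    """
    match = _HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Calling it on the first bytes of each input and of the merged output (for example `pdf_header_version(open(path, "rb").read(1024))`, with `path` being whatever file is under test) would show whether the merged file reports a lower version than its inputs.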

Example pdf/a information:

<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="">
  <rdf:Description rdf:about=""
  <rdf:Description rdf:about=""
   <pdf:Producer>LibreOffice 6.1</pdf:Producer>
  <rdf:Description rdf:about=""

pdf/a is a standard for long-term preservation in digital archives. The more general case would be support for xmpmeta information, so that people could at least add pdf/a metadata if they are convinced that their output conforms to pdf/a.
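To make the request concrete, here is a rough sketch of what a minimal XMP packet carrying the pdf/a identification properties (`pdfaid:part` and `pdfaid:conformance`) could look like when built with the Python standard library. The function name `build_pdfa_xmp` and its parameters are hypothetical and not part of PyPDF4. The sketch only produces the packet bytes; attaching them to a PDF as the catalog's metadata stream is not shown, and adding this metadata does not by itself make a file conformant.

```python
from xml.sax.saxutils import escape

# Minimal XMP packet with the PDF/A identification schema and the
# Adobe PDF schema (for the Producer property).
XMP_TEMPLATE = """<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>{part}</pdfaid:part>
   <pdfaid:conformance>{conformance}</pdfaid:conformance>
  </rdf:Description>
  <rdf:Description rdf:about="" xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
   <pdf:Producer>{producer}</pdf:Producer>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""


def build_pdfa_xmp(part: int, conformance: str, producer: str) -> bytes:
    """Return a UTF-8 encoded XMP packet claiming the given pdf/a profile.

    Hypothetical helper: the caller is responsible for the document
    actually conforming to the profile it claims.
    """
    return XMP_TEMPLATE.format(
        part=int(part),                    # e.g. 1 for PDF/A-1
        conformance=escape(conformance),   # e.g. "B" for PDF/A-1b
        producer=escape(producer),         # escaped to keep the XML well-formed
    ).encode("utf-8")
```

For example, `build_pdfa_xmp(1, "B", "LibreOffice 6.1")` would yield a packet declaring PDF/A-1b. A merger with xmpmeta support could accept such bytes from the caller instead of trying to infer conformance itself.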

In pdf/a, a number of things are disallowed or mandatory (depending on the pdf/a profile), and there are validation tools like veraPDF to check whether a pdf conforms to pdf/a. A long-term goal might be to support creating proper pdf/a by manipulating a pdf to make it conformant.