claird / PyPDF4

A utility to read and write PDFs with Python
obsolete-https://pythonhosted.org/PyPDF2/

Merge PDF/A files into a combined PDF/A #32

Open dschulten opened 5 years ago

dschulten commented 5 years ago

When PdfFileMerger merges PDF/A files, the output loses its PDF/A identification metadata and the PDF version is reset to 1.3.
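
One way to confirm the version reset is to read the header of the merged file. This is a minimal sketch, assuming the version is taken only from the `%PDF-x.y` header comment (a `/Version` entry in the document catalog can override the header, which this sketch ignores). `pdf_header_version` is a hypothetical helper name, not part of PyPDF4:

```python
import re
from typing import Optional

# A PDF file starts with a header comment of the form "%PDF-x.y".
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version from the %PDF-x.y header, or None if no header is found.

    Hypothetical helper for illustration; only the first 1024 bytes are searched.
    """
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    major = match.group(1).decode("ascii")
    minor = match.group(2).decode("ascii")
    return f"{major}.{minor}"
```

Reading the first kilobyte of each input file and of the merged output, and passing the bytes to this function, should show whether the merge changed the declared version. PDF/A-1 is based on PDF 1.4, so a 1.3 header on the output is a sign that something was lost.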

Example pdf/a information:

<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>1</pdfaid:part>
   <pdfaid:conformance>A</pdfaid:conformance>
  </rdf:Description>
  <rdf:Description rdf:about=""
     xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
   <pdf:Producer>LibreOffice 6.1</pdf:Producer>
  </rdf:Description>
  <rdf:Description rdf:about=""
    xmlns:xmp="http://ns.adobe.com/xap/1.0/">
   <xmp:CreatorTool>Draw</xmp:CreatorTool>
   <xmp:CreateDate>2019-04-03T06:18:04Z</xmp:CreateDate>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
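
To check whether a packet like the one above still carries the PDF/A identification after a merge, the `pdfaid` elements can be read with the standard library. This is a sketch under assumptions: the XMP packet has already been extracted from the PDF as a string, and the properties are written as child elements as in the example (XMP also allows them as attributes on `rdf:Description`, which this sketch does not handle). `pdfa_claim` is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Namespace of the PDF/A identification schema, as used in the example packet.
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def pdfa_claim(xmp: str) -> Optional[Tuple[str, str]]:
    """Return (part, conformance) claimed by an XMP packet, or None if absent.

    Hypothetical helper: only the element form of pdfaid:part and
    pdfaid:conformance is recognised.
    """
    root = ET.fromstring(xmp)
    part = root.find(f".//{{{PDFAID_NS}}}part")
    conformance = root.find(f".//{{{PDFAID_NS}}}conformance")
    if part is None or conformance is None:
        return None
    return (part.text or "").strip(), (conformance.text or "").strip()
```

Note that this reports only what the metadata claims. It says nothing about whether the file actually conforms.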

PDF/A is an ISO standard for long-term preservation of documents in digital archives. The more general case would be support for XMP metadata (`xmpmeta`):

https://github.com/mstamy2/PyPDF2/issues/492

With that support, people could at least add PDF/A metadata themselves if they are convinced that their output conforms to PDF/A.
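
As a rough illustration of what such user-supplied metadata might look like, the identification part of the packet can be generated from a template. This is only a sketch: `pdfa_id_xmp` is a hypothetical helper, it produces just the `pdfaid` description from the example above, and it does not attach anything to a PDF. Writing the packet into the document's metadata stream is exactly the support requested here.

```python
from xml.sax.saxutils import escape

# Minimal packet containing only the PDF/A identification properties,
# modelled on the example packet in this issue.
_TEMPLATE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>{part}</pdfaid:part>
   <pdfaid:conformance>{conformance}</pdfaid:conformance>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def pdfa_id_xmp(part: int, conformance: str) -> str:
    """Build an XMP string claiming the given PDF/A part and conformance level.

    Hypothetical helper: it only states a claim and performs no validation.
    """
    return _TEMPLATE.format(part=int(part), conformance=escape(conformance))
```

For the files in this report the call would be `pdfa_id_xmp(1, "A")`.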

PDF/A disallows some features and mandates others, depending on the PDF/A profile, and validation tools such as veraPDF check whether a PDF conforms. A long-term goal might be to support creating proper PDF/A by manipulating a PDF to make it conformant.