empira / PDFsharp-1.5

A .NET library for processing PDF
MIT License

What is needed for creating PDF/A-3b #125

Open flensrocker opened 4 years ago

flensrocker commented 4 years ago

In reference to https://forum.pdfsharp.net/viewtopic.php?f=4&t=3031

I am evaluating the changes PdfSharp needs in order to create PDF/A-3b compliant documents. For now I am stumbling through various specifications, mostly by trial and error.

My use case:

What I don't want:

Here is what I have discovered so far. The code changes are not in good shape yet, because first I want to produce a valid document somehow, not necessarily in a robust way...

Think of these as random notes, written down so they won't get lost.

PDF Version

Must be 1.7 (as far as I know)

XMP Metadata changes

Replace old metadata in PdfMetadata.cs with this:

            "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">\n" +
            "    <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n" +
            "        <rdf:Description rdf:about=\"\"\n" +
            "            xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\"\n" +
            "            xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\"\n" +
            "            xmlns:dc=\"http://purl.org/dc/elements/1.1/\"\n" +
            "            xmlns:xmp=\"http://ns.adobe.com/xap/1.0/\"\n" +
            "            xmlns:xmpMM=\"http://ns.adobe.com/xap/1.0/mm/\">\n" +
            "            <pdfaid:part>3</pdfaid:part>\n" +
            "            <pdfaid:conformance>B</pdfaid:conformance>\n" +
            "            <xmpMM:InstanceID>uuid:" + instanceId + "</xmpMM:InstanceID>\n" +
            "            <xmpMM:DocumentID>uuid:" + documentId + "</xmpMM:DocumentID>\n" +
            "            <xmp:CreateDate>" + creationDate + "</xmp:CreateDate>\n" +
            "            <xmp:ModifyDate>" + modificationDate + "</xmp:ModifyDate>\n" +
            "            <xmp:MetadataDate>" + modificationDate + "</xmp:MetadataDate>\n" +
            "            <xmp:CreatorTool>" + creator + "</xmp:CreatorTool>\n" +
            "            <pdf:Producer>" + producer + "</pdf:Producer>\n" +
            "            <dc:creator>\n" +
            "                <rdf:Seq>\n" +
            "                    <rdf:li></rdf:li>\n" +
            "                </rdf:Seq>\n" +
            "            </dc:creator>\n" +
            "            <dc:title>\n" +
            "                <rdf:Alt>\n" +
            "                    <rdf:li xml:lang=\"x-default\">" + title + "</rdf:li>\n" +
            "                </rdf:Alt>\n" +
            "            </dc:title>\n" +
            "            <dc:description>\n" +
            "                <rdf:Alt>\n" +
            "                    <rdf:li xml:lang=\"x-default\"></rdf:li>\n" +
            "                </rdf:Alt>\n" +
            "            </dc:description>\n" +
            "        </rdf:Description>\n" +
            "    </rdf:RDF>\n" +
            "</x:xmpmeta>\n" +

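One thing to watch in the template above: title, creator and producer are concatenated into the XML as-is, so a value containing & or < would make the XMP packet malformed. A minimal sketch of escaping the values and formatting dates as ISO 8601 follows. It assumes the variables are plain strings; the XmpText class and its methods are hypothetical helpers for illustration and not part of PdfSharp.

```csharp
using System;
using System.Globalization;
using System.Security;

// Hypothetical helpers for illustration; not part of PdfSharp.
public static class XmpText
{
    // Escapes &, <, >, " and ' so the value can be placed inside an XML element.
    public static string Escape(string value)
    {
        return SecurityElement.Escape(value ?? string.Empty);
    }

    // Formats a timestamp as an ISO 8601 date-time with a UTC offset.
    public static string Date(DateTimeOffset value)
    {
        return value.ToString("yyyy-MM-dd'T'HH:mm:sszzz", CultureInfo.InvariantCulture);
    }
}
```

In the template this would be used as, for example, "<xmp:CreatorTool>" + XmpText.Escape(creator) + "</xmp:CreatorTool>\n".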
No interactive elements allowed

So I think everything around AcroForms must not be used. Since I only create "printable" documents without any of this functionality, I just ignore it. For a "real" solution, PdfSharp should throw an error if you try to use something that is not valid in PDF/A.

Catalog improvements

The catalog must include an /OutputIntents array with at least one dictionary of the form /Type /OutputIntent /S /GTS_PDFA1 /OutputConditionIdentifier (sRGB2014) /DestOutputProfile ..., where the DestOutputProfile is an embedded ICC color profile. I use the "sRGB2014.icc" from http://www.color.org/srgbprofiles.xalter

The ICC profile must be a stream object with /N 3 (three color components) /Alternate /DeviceRGB /Filter /FlateDecode /Length ....

Link Annotation

Web links must have the key /F 4; I don't really know why. It seems to mark the link annotation as "printable".
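For context on the value 4: in ISO 32000-1 (Table 165), /F is the annotation flags bit field, and 4 is bit 3, the Print flag. As far as I understand the PDF/A rules, an annotation must have Print set and Invisible, Hidden, NoView and ToggleNoView cleared. A minimal sketch of that check follows; the enum and the IsPdfACompatible helper are hypothetical names for illustration and not PdfSharp API.

```csharp
using System;

// Annotation flag bits from ISO 32000-1, Table 165 (bit 1 is the lowest bit).
// Hypothetical enum for illustration; not part of PdfSharp.
[Flags]
public enum AnnotFlags
{
    None = 0,
    Invisible = 1 << 0,     // bit 1
    Hidden = 1 << 1,        // bit 2
    Print = 1 << 2,         // bit 3, value 4
    NoZoom = 1 << 3,        // bit 4
    NoRotate = 1 << 4,      // bit 5
    NoView = 1 << 5,        // bit 6
    ReadOnly = 1 << 6,      // bit 7
    Locked = 1 << 7,        // bit 8
    ToggleNoView = 1 << 8   // bit 9
}

public static class AnnotFlagCheck
{
    // Flags that, to my understanding, must be cleared for PDF/A.
    const AnnotFlags Forbidden =
        AnnotFlags.Invisible | AnnotFlags.Hidden | AnnotFlags.NoView | AnnotFlags.ToggleNoView;

    // True if Print is set and none of the forbidden flags are set.
    public static bool IsPdfACompatible(int f)
    {
        var flags = (AnnotFlags)f;
        return (flags & AnnotFlags.Print) != 0 && (flags & Forbidden) == 0;
    }
}
```

With this, /F 4 passes the check, while a missing /F (treated as 0) or /F 6 (Print plus Hidden) does not.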

With these changes the various validators, e.g. veraPDF, declare my generated document as "compliant".

For ZUGFeRD, some further changes are needed for attachments, such as an /AFRelationship entry and an /AF array (associated files), among other things. That's next on my list...

Here are some code dumps, in case anyone is interested. It isn't clean, it's not beautiful, but it works...

Additions to PdfCatalog.cs

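        private static readonly SemaphoreSlim _iccLock = new SemaphoreSlim(1, 1);
        private static byte[] _compressedIccBytes;

        // Lazily loads the embedded sRGB2014.icc resource and caches it Flate-compressed.
        private static byte[] CompressedIccBytes
        {
            get
            {
                if (_compressedIccBytes == null)
                {
                    _iccLock.Wait();
                    try
                    {
                        // Re-check inside the lock: another thread may have loaded the profile meanwhile.
                        if (_compressedIccBytes == null)
                        {
                            using (var iccStream = System.Reflection.Assembly.GetExecutingAssembly().GetManifestResourceStream("PdfSharp.sRGB2014.icc"))
                            using (var buffer = new System.IO.MemoryStream())
                            {
                                // CopyTo reads until the end of the stream; a single Read call may return fewer bytes.
                                iccStream.CopyTo(buffer);
                                _compressedIccBytes = new Filters.FlateDecode().Encode(buffer.ToArray());
                            }
                        }
                    }
                    finally
                    {
                        _iccLock.Release();
                    }
                }
                return _compressedIccBytes;
            }
        }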

        private PdfDictionary IccProfile
        {
            get
            {
                if (_iccProfile == null)
                {
                    _iccProfile = new PdfDictionary(Owner);
                    _iccProfile.Elements.SetInteger("/N", 3);
                    _iccProfile.Elements.SetName("/Alternate", "/DeviceRGB");
                    _iccProfile.Elements.SetName("/Filter", "/FlateDecode");
                    var stream = _iccProfile.CreateStream(CompressedIccBytes);
                    _iccProfile.Elements.SetInteger(PdfStream.Keys.Length, stream.Length);
                    Owner.Internals.AddObject(_iccProfile);
                }
                return _iccProfile;
            }
        }
        PdfDictionary _iccProfile;

        private PdfDictionary OutputIntent
        {
            get
            {
                if (_outputIntent == null)
                {
                    _outputIntent = new PdfDictionary(Owner);
                    _outputIntent.Elements.SetName(Keys.Type, "/OutputIntent");
                    _outputIntent.Elements.SetName("/S", "/GTS_PDFA1");
                    _outputIntent.Elements.SetString("/OutputConditionIdentifier", "sRGB2014");
                    _outputIntent.Elements.SetReference("/DestOutputProfile", IccProfile.Reference);
                    Owner.Internals.AddObject(_outputIntent);
                }
                return _outputIntent;
            }
        }
        PdfDictionary _outputIntent;

        public PdfArray OutputIntents
        {
            get
            {
                if (_outputIntents == null)
                {
                    _outputIntents = new PdfArray(Owner);
                    _outputIntents.Elements.Add(OutputIntent);
                    Owner.Internals.AddObject(_outputIntents);
                    Elements.SetReference(Keys.OutputIntents, _outputIntents.Reference);
                }
                return _outputIntents;
            }
        }
        PdfArray _outputIntents;
...
        internal override void PrepareForSave()
        {
...
            IccProfile.PrepareForSave();
            OutputIntent.PrepareForSave();
            OutputIntents.PrepareForSave();

PdfLinkAnnotation.cs

case LinkType.Web:
    //pdf.AppendFormat("/A<</S/URI/URI{0}>>\n", PdfEncoders.EncodeAsLiteral(url));
    Elements.SetInteger(PdfAnnotation.Keys.F, 4);

If I get to a point where I can submit a pull request, I will do so...

abelykh0 commented 1 year ago

In addition, I am getting "Spec. ISO_19005_3 clause 6.2.11.3 test 2": ISO 32000-1:2008, 9.7.4, Table 117 requires that all embedded Type 2 CIDFonts in the CIDFont dictionary shall contain a CIDToGIDMap entry that shall be a stream mapping from CIDs to glyph indices or the name Identity, as described in ISO 32000-1:2008, 9.7.4, Table 117. This would be a very easy fix: add descendantFontDictionary.Elements.SetName("/CIDToGIDMap", "/Identity"), which is the default anyway.