digitorus / pdfsign

Add/verify Advanced Electronic Signature (AES) and Qualified Electronic Signature (QES) in PDF (using pure Go)
BSD 2-Clause "Simplified" License

Error encoding UTF-8 characters #2

Closed: mpldr closed this issue 1 year ago

mpldr commented 1 year ago

[Screenshot: 2023-03-02-09-19-14]

When a string containing non-ASCII UTF-8 characters is added (for example in Location), it is displayed garbled, as in the screenshot above.

vanbroup commented 1 year ago

I was able to confirm the issue with this test case:

func TestSignPDFFileUTF8(t *testing.T) {
    certificate_data_block, _ := pem.Decode([]byte(signCertPem))
    if certificate_data_block == nil {
        t.Errorf("failed to parse PEM block containing the certificate")
        return
    }

    cert, err := x509.ParseCertificate(certificate_data_block.Bytes)
    if err != nil {
        t.Errorf("%s", err.Error())
        return
    }

    key_data_block, _ := pem.Decode([]byte(signKeyPem))
    if key_data_block == nil {
        t.Errorf("failed to parse PEM block containing the private key")
        return
    }

    pkey, err := x509.ParsePKCS1PrivateKey(key_data_block.Bytes)
    if err != nil {
        t.Errorf("%s", err.Error())
        return
    }

    tmpfile, err := os.CreateTemp("", "pdfsign_test")
    if err != nil {
        t.Errorf("%s", err.Error())
        return
    }

    signerName := "姓名"
    signerLocation := "位置"
    err = SignFile("../testfiles/testfile20.pdf", tmpfile.Name(), SignData{
        Signature: SignDataSignature{
            Info: SignDataSignatureInfo{
                Name:        signerName,
                Location:    signerLocation,
                Reason:      "Test with UTF-8",
                ContactInfo: "None",
                Date:        time.Now().Local(),
            },
            CertType:   CertificationSignature,
            DocMDPPerm: AllowFillingExistingFormFieldsAndSignaturesPerms,
        },
        DigestAlgorithm: crypto.SHA512,
        Signer:          pkey,
        Certificate:     cert,
    })

    if err != nil {
        os.Remove(tmpfile.Name())
        t.Errorf("%s: %s", "testfile20.pdf", err.Error())
        return
    }

    info, err := verify.File(tmpfile)
    if err != nil {
        t.Errorf("%s: %s", tmpfile.Name(), err.Error())

        err2 := os.Rename(tmpfile.Name(), "../testfiles/failed/testfile20.pdf")
        if err2 != nil {
            t.Error(err2)
        }
    } else {
        if info.Signers[0].Name != signerName {
            t.Errorf("expected %q, got %q", signerName, info.Signers[0].Name)
        }
        if info.Signers[0].Location != signerLocation {
            t.Errorf("expected %q, got %q", signerLocation, info.Signers[0].Location)
        }
        os.Remove(tmpfile.Name())
    }
}

Result:

?       github.com/digitorus/pdfsign    [no test files]
?       github.com/digitorus/pdfsign/revocation [no test files]
?       github.com/digitorus/pdfsign/verify     [no test files]
--- FAIL: TestSignPDFFileUTF8 (0.00s)
    sign_test.go:354: expected "姓名", got "å§fiå’“"
    sign_test.go:357: expected "位置", got "体置"
FAIL
FAIL    github.com/digitorus/pdfsign/sign       0.698s
FAIL

I still need to dig deeper to find out what is causing this.

vanbroup commented 1 year ago

I found an approach that should resolve this, but it did not work for me. Then again, the test file from the article below doesn't display correctly in my versions of Acrobat either. https://www.pdfa.org/understanding-utf-8-in-pdf-2-0/

Can you try to download this test file and check if the document properties are shown correctly for you? https://github.com/pdf-association/pdf20examples/raw/master/pdf20-utf8-test.pdf

vanbroup commented 1 year ago

It might actually be better to use UTF-16 encoding; I will do some testing with that.

vanbroup commented 1 year ago

Text is now encoded as UTF-16BE when required. This resolved the issue on my side; please check whether it is resolved for you as well.