MatthiasValvekens / pyHanko

pyHanko: sign and stamp PDF files
MIT License

Support of non-English alphabet (e.g. UTF-8) in stamp-text #374

Closed: KingCZE closed this 5 months ago

KingCZE commented 5 months ago

Hi, using pyHanko version 0.21.0.

The stamp text works just fine with stamp-text: "Signed by %(signer)s\nTimestamp: %(ts)s\n%(url)s" in pyhanko.yml, and it even copies the Czech characters from the (signer) correctly (e.g. Král).

But when I want a customised text, e.g. stamp-text: "Signed by Král\nTimestamp: %(ts)s\n%(url)s" (or even "Podepsáno X.Králem"), it creates "Signed by Král".

When I try to change it to some HTML-style form (not sure what encoding it uses) with stamp-text: "Signed by Kr%C3%A1l\nTimestamp: %(ts)s\n%(url)s", I get an error message

ValueError: unsupported format character 'K' (0x4b) at index 19
Error: Generic processing error.

or

  File "C:\ProgramData\miniconda3\envs\myenv\Lib\site-packages\yaml\parser.py", line 438, in parse_block_mapping_key
    raise ParserError("while parsing a block mapping", self.marks[-1],
yaml.parser.ParserError: while parsing a block mapping
  in "<unicode string>", line 54, column 9:
            type: qr
            ^
expected <block end>, but found '<scalar>'
  in "<unicode string>", line 63, column 33:
            stamp-text: "Signed by Kr%C3%A1l\n ...
                                   ^

How can I insert something like e.g. stamp-text: "Podepsáno Matějem Vomáčkou\nEmail: mail@domain.com / Tel.: +420 xxx xxx xxx\nTimestamp: %(ts)s"
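
A note on the ValueError above: it is the kind of error Python's %-style string formatting raises when a template contains a stray `%`. The sketch below uses only the standard library and assumes, based on the `%(ts)s` placeholders and the error text, that the stamp text is expanded with that mechanism. The `values` dictionary and the `render` helper are made up for illustration and are not part of pyHanko.

```python
# Hypothetical values standing in for what the signing tool would fill in.
values = {
    "signer": "Král",
    "ts": "2024-01-01 12:00:00 UTC",
    "url": "https://example.com",
}


def render(template: str) -> str:
    """Expand %(name)s placeholders using printf-style formatting."""
    return template % values


# Non-ASCII characters in the template itself are fine for the formatter.
plain = render("Signed by Král\nTimestamp: %(ts)s")

# A percent-encoded sequence such as %C3 is read as a format specifier.
try:
    render("Signed by Kr%C3%A1l\nTimestamp: %(ts)s")
    error = None
except ValueError as exc:
    error = str(exc)

# A literal percent sign has to be written as %%.
escaped = render("100%% signed\nTimestamp: %(ts)s")
```

So percent-encoding the text does not help: the formatter already accepts "á" as-is, and the missing glyph in the rendered stamp is a font matter rather than a string-encoding one.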

MatthiasValvekens commented 5 months ago

Hmm, for some reason the "Convert to discussion" button doesn't seem to work today...

Anyway, this behaviour is explained in the FAQ: https://pyhanko.readthedocs.io/en/latest/faq.html#i-want-to-put-unicode-text-in-my-signatures-but-i-m-only-seeing-blanks-what-gives.

The reason is: the set of fonts that can (in practice) be reasonably assumed to be present in every PDF reader out there is very small. The exact glyph set of these fonts varies a bit from implementation to implementation, but again one typically can't just assume they have whatever one needs. For these "standard" fonts, there's also the issue that legacy encodings are often needed, since the fonts are frequently implemented as PS1 fonts without support for the things that enable using modern character collection standards.

Long story short: if you want to render (essentially) anything beyond basic ASCII, you have to choose & embed your own font. This is good practice anyway, since it ensures rendering consistency. There are links in the FAQ (but I'll extend the explanation a bit when I have time because it's not very explicit about the underlying reasons).

This is not the default behaviour because it requires the user to choose a font and supply the accompanying font file, so it can't work without additional configuration.
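
For readers using the Python API rather than pyhanko.yml, embedding a font could look roughly like the sketch below. It assumes pyHanko is installed with its optional OpenType support, and that the class names (`TextStampStyle`, `TextBoxStyle`, `GlyphAccumulatorFactory`) match the installed version; check the FAQ linked above before relying on it. The helper name `build_unicode_stamp_style` and the font file in the usage comment are hypothetical.

```python
def build_unicode_stamp_style(font_path: str, stamp_text: str):
    """Return a text stamp style that embeds the font found at font_path."""
    # Imported lazily because OpenType support is an optional pyHanko extra.
    from pyhanko import stamp
    from pyhanko.pdf_utils import text
    from pyhanko.pdf_utils.font import opentype

    return stamp.TextStampStyle(
        stamp_text=stamp_text,
        text_box_style=text.TextBoxStyle(
            # The factory subsets and embeds the glyphs that are actually used.
            font=opentype.GlyphAccumulatorFactory(font_path),
        ),
    )


# Usage (the font file name is an assumption; any font covering the
# required characters should do):
# style = build_unicode_stamp_style(
#     "NotoSans-Regular.ttf",
#     "Podepsáno Matějem Vomáčkou\nTimestamp: %(ts)s",
# )
```

The chosen font has to contain glyphs for every character in the stamp text, so a font with broad Latin coverage is a reasonable choice for Czech.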