Closed hvbtup closed 1 month ago
tab_character_pdfa.zip This rptdesign file can be used for demonstrating the issue: When a PDF report is created with BIRT, it references the Times-Roman font.
I'll create a PR which fixes this specific issue with the TAB character by replacing it with a space for PDF output...
Thanks, Henning!
Fixed with #1949.
When a data item contains text like (in Java/C/JS syntax)
"abc\tdef"
containing a horizontal tab character (U+0009), this breaks PDF/A conformance. I examined this in more detail.
In the resulting PDF, the tab character is contained as text in the content stream, using the "Times-Roman" font. This font is one of the "builtin" PDF fonts, not a TrueType font. Thus it is not embedded (or subsetted) into the PDF, which means that the glyph metrics and shapes are not contained in the PDF.
But PDF/A requires that all fonts must be embedded/subsetted.
What happens internally is:
There is a class FontSplitter in BIRT which is called from ChunkGenerator.getNext(), which in turn is called from TextCompositor. Its task seems to be to ensure that every character of the text can be displayed, so it uses rules to select a font for each character (most font files only support a small subset of the Unicode characters).
In the case of the TAB character, the method FontHandler.selectFont(c) returns "Times-Roman". This happens when the preferred font (in my case, "Arial") does not support the character (method charExists(c) returns null). By the way, charExists for TrueType fonts looks for the glyph metrics, whereas charExists for Type1 fonts works very differently. The BIRT logic thinks that "Times-Roman" supports the TAB character, and thus this font is selected.
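The fallback behaviour described above can be illustrated with a small, self-contained sketch. This is not BIRT's code: the class FontFallbackSketch, the SketchFont type and both character predicates are hypothetical simplifications. The preferred font is modelled as covering only printable ASCII, and the built-in font is modelled as accepting any 8-bit code, including TAB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.IntPredicate;

// Hypothetical model of per-character font selection; not the BIRT implementation.
final class FontFallbackSketch {

    // A font reduced to its name and a "does this character exist?" check.
    static final class SketchFont {
        final String name;
        final IntPredicate charExists;

        SketchFont(String name, IntPredicate charExists) {
            this.name = name;
            this.charExists = charExists;
        }
    }

    // Assumption: the preferred font only reports characters that have glyph metrics.
    // Printable ASCII is used here purely to keep the example small.
    static final SketchFont PREFERRED =
            new SketchFont("Arial", c -> c >= 0x20 && c <= 0x7E);

    // Assumption: the built-in fallback's check accepts any 8-bit code, so TAB passes.
    static final SketchFont FALLBACK =
            new SketchFont("Times-Roman", c -> c >= 0 && c <= 0xFF);

    // Returns the name of the first font that claims the character, or null if none does.
    static String selectFont(int c) {
        if (PREFERRED.charExists.test(c)) {
            return PREFERRED.name;
        }
        if (FALLBACK.charExists.test(c)) {
            return FALLBACK.name;
        }
        return null;
    }

    // Splits text into runs of consecutive characters that share the same selected font.
    // Each run is rendered as "fontName:text".
    static List<String> splitByFont(String text) {
        List<String> runs = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            String font = selectFont(text.charAt(start));
            int end = start + 1;
            while (end < text.length()
                    && Objects.equals(font, selectFont(text.charAt(end)))) {
                end++;
            }
            runs.add(font + ":" + text.substring(start, end));
            start = end;
        }
        return runs;
    }
}
```

Under these assumptions, splitByFont("abc\tdef") produces three runs, and the middle run, containing only the TAB, is assigned to "Times-Roman". That single run is enough to pull the non-embedded font into the output.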
Other control characters in the Unicode code point range 0-31 will probably cause issues, too.
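As a rough illustration of the approach mentioned in the thread (replacing TAB with a space for PDF output), and of how the remaining control characters in the range 0-31 could be detected, here is a minimal sketch. The class and method names are hypothetical, and this is not the actual change made in the fix.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class ControlCharSketch {

    // True for control characters in the range U+0000..U+001F.
    static boolean isC0Control(char c) {
        return c <= 0x1F;
    }

    // Replaces every TAB (U+0009) with a single space; all other characters are kept.
    static String replaceTabs(String text) {
        return text.replace('\t', ' ');
    }

    // Counts the control characters in the range U+0000..U+001F found in the text.
    static int countC0Controls(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (isC0Control(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }
}
```

For example, replaceTabs("abc\tdef") yields "abc def", which contains no control characters. A string that also contains a line feed would still be reported by countC0Controls after the replacement.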
In an ideal world, text in data items would not contain any TAB characters.
I am not sure how the different control characters should be handled by the different emitters.
Furthermore, I am not aware of an obvious workaround/solution without changing BIRT Source code.
I looked at some TrueType fonts (including some fonts for Code 39 and Code 128 barcodes, consola.ttf, arial.ttf, arialuni.ttf). None of them supports the TAB character.
The reason is obvious: A TAB character cannot have a glyph width, because its width is dynamic by definition.