foliojs / pdfkit

A JavaScript PDF generation library for Node and the browser
http://pdfkit.org/
MIT License

non-Latin characters, non-BMP characters, symbols missing #1251

Open mintty opened 3 years ago

mintty commented 3 years ago

Bug Report

With the default font, the code below generates garbage for all non-Latin characters. With NotoSans (after uncommenting that line), at least Greek and Cyrillic work, but characters beyond the BMP (from U+10000) and emojis (even those in the BMP) are not displayed, and their replacement characters are not selectable.

Description of the problem

PDF appearance with default font: Hellö €œ ;³:”- &Ô&ñ&ò’ Øß8ØÜØ4ÝØ5Ý8 Ø<ßEØ=Þ

PDF appearance with Noto font: Hellö €œ αγΩЭ ⌷⌷⌷⌷ ⌷⌷⌷⌷ ⌷⌷

Code sample

const PDFDocument = require('pdfkit')
const fs = require('fs')

let foo = new PDFDocument
foo.pipe(fs.createWriteStream('pdfkit.pdf'))
foo.registerFont('NotoSans', './NotoSans-Regular.ttf')
let f = foo
//f = f.font('NotoSans')

f.text('Hellö')
f.text('€œ')
f.text('αγΩЭ')
f.text('⛔⛱⛲✅')
f.text('𐌸𐐀𝄞𝔸')
f.text('🍅😀')
foo.end()

Your environment

liborm85 commented 3 years ago

The default fonts are not Unicode fonts, so this does not work. Supporting these characters requires a font file that contains them.
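To illustrate why the standard fonts can produce more glyphs than characters, the following sketch models one plausible mechanism: each UTF-16 code unit is written as unpadded hex, and the viewer reads the concatenated digits back two at a time as single bytes in a Windows-1252-like encoding. This is an assumption inferred from the output reported above, not a description of pdfkit's actual code. `singleByteView` and `CP1252_HIGH` are hypothetical names, and the model ignores characters such as € and œ that the single-byte encoding covers directly.

```javascript
// Windows-1252 code points for bytes 0x80..0x9F (0 marks an unassigned byte).
const CP1252_HIGH = [
  0x20ac, 0, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0, 0x017d, 0,
  0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0, 0x017e, 0x0178
];

// Assumed model: write every UTF-16 code unit as unpadded hex, concatenate,
// then read the digits back two at a time as single-byte characters.
function singleByteView(text) {
  let hex = '';
  for (let i = 0; i < text.length; i++) {
    hex += text.charCodeAt(i).toString(16);
  }
  if (hex.length % 2 === 1) hex += '0'; // PDF hex strings pad an odd final digit with 0

  let out = '';
  for (let i = 0; i < hex.length; i += 2) {
    const byte = parseInt(hex.slice(i, i + 2), 16);
    if (byte < 0x20 || byte === 0x7f) continue; // control bytes show nothing
    if (byte >= 0x80 && byte <= 0x9f) {
      const cp = CP1252_HIGH[byte - 0x80];
      if (cp) out += String.fromCharCode(cp);
      continue;
    }
    out += String.fromCharCode(byte); // 0x20..0x7E and 0xA0..0xFF match Latin-1
  }
  return out;
}

const greekCyrillic = singleByteView('\u03b1\u03b3\u03a9\u042d'); // αγΩЭ
const emojis = singleByteView('\u{1F345}\u{1F600}');              // 🍅😀
```

Under this model, `'αγΩЭ'` yields `;³:”-` and `'🍅😀'` yields `Ø<ßEØ=Þ`, which agrees with the default-font appearance quoted in the report.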

mintty commented 3 years ago

Please uncomment the following line to use a font that contains the characters: f = f.font('NotoSans'). In any case, the number of glyphs rendered, which is larger than the number of characters, clearly indicates that the character encoding is not properly recognized in the failing cases.

mintty commented 3 years ago

Sorry, my test case was flawed. With Noto Sans, the glyphs rendered match the characters; only the actual glyphs are missing (replacement boxes are shown). It is still a problem that characters not found in the font are rendered as multiple glyphs.

liborm85 commented 3 years ago

All I can see is that the NotoSans font doesn't support these characters. If there is any other problem, it is related to the fontkit library: https://github.com/foliojs/fontkit.

mintty commented 3 years ago

See my fontkit report referring here above. Maybe at least the font selection problem is actually a pdfkit issue?

mintty commented 3 years ago

The fontkit issue was closed, so we're back here... Problem: Glyphs not available in the font are neither displayed (▯) nor can they be copied and pasted back transparently (which is however an important feature in certain applications). The generated PDF contains the following in the affected cases:

[<000d00160017001100050000> 0] TJ

The last character output is 0000.

1 beginbfrange
<0000> <0006> [<0000> <26d4> <26f2> <2705> <d83c df1b> <d83c df45> <d83d de00>]
endbfrange

The 0000 is mapped to <0000> for copy/paste.
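The excerpt above can be decoded by hand. The sketch below assumes the `TJ` operand holds 2-byte glyph IDs and reads the array form of `bfrange` as mapping source codes 0000 through 0006 to the listed UTF-16BE destinations, in order. `hexToUnits` and `decodeDestination` are hypothetical helper names, not part of pdfkit.

```javascript
// Split a PDF hex string into 16-bit big-endian values
// (glyph IDs in a TJ operand, or UTF-16 code units in a ToUnicode destination).
function hexToUnits(hex) {
  const clean = hex.replace(/[\s<>]/g, '');
  const units = [];
  for (let i = 0; i < clean.length; i += 4) {
    units.push(parseInt(clean.slice(i, i + 4), 16));
  }
  return units;
}

// Decode one ToUnicode destination such as "<d83c df45>" into a JS string.
function decodeDestination(hex) {
  return String.fromCharCode(...hexToUnits(hex));
}

// Glyph IDs written by the TJ operator in the excerpt.
const glyphIds = hexToUnits('<000d00160017001100050000>');

// Destinations of the bfrange array; index i is the mapping for source code i.
const destinations = [
  '<0000>', '<26d4>', '<26f2>', '<2705>',
  '<d83c df1b>', '<d83c df45>', '<d83d de00>'
];
const toUnicode = new Map(destinations.map((d, i) => [i, decodeDestination(d)]));

for (const [code, str] of toUnicode) {
  const cp = str.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  console.log(code, 'U+' + cp);
}
```

The last glyph ID in the `TJ` operand is 0, and source code 0 maps to U+0000, so copying that glyph from the PDF yields a NUL rather than the original character.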

blikblum commented 3 years ago

Is it possible in PDF?

Do you have a pdf created in other library or program that exhibits this behavior?

mintty commented 3 years ago

Yes: fpdf2.pdf

devongovett commented 3 years ago

If the font doesn't contain the glyphs there's nothing we can do. You need to provide a font that has the glyphs for the characters you want to use.
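One way to find out in advance which characters a font cannot render is to ask the font object itself. The sketch below assumes a font exposing `hasGlyphForCodePoint(codePoint)`, as fontkit font objects do. `missingCodePoints` is a hypothetical helper and `stubFont` is a stand-in so that the example runs without a font file.

```javascript
// Return the distinct code points of `text` for which `font` has no glyph,
// formatted as U+XXXX strings in order of first appearance.
function missingCodePoints(font, text) {
  const missing = new Set();
  for (const ch of text) { // for...of iterates by code point, keeping surrogate pairs intact
    const cp = ch.codePointAt(0);
    if (!font.hasGlyphForCodePoint(cp)) missing.add(cp);
  }
  return [...missing].map(
    cp => 'U+' + cp.toString(16).toUpperCase().padStart(4, '0')
  );
}

// Stand-in font for illustration only: pretends to cover U+0000..U+04FF.
const stubFont = { hasGlyphForCodePoint: cp => cp <= 0x04ff };

const report = missingCodePoints(stubFont, 'Hell\u00f6 \u03b1\u042d \u26d4\u{1F345}');
console.log(report);
```

With a real font, the stub could be replaced by an object loaded through fontkit (for example from `./NotoSans-Regular.ttf`), and the resulting list used to warn before the text is written to the PDF.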

blikblum commented 3 years ago

> Yes: fpdf2.pdf

Thanks. Can you provide the source code that created this file? Did you use the PHP fpdf library?

mintty commented 3 years ago

The PDF was generated with the fpdf2 python library, latest repository version after fixes about the same issue.

blikblum commented 3 years ago

> The PDF was generated with the fpdf2 python library, latest repository version after fixes about the same issue.

Please provide the source of the example so we can replicate the issue with js/pdfkit and compare the differences in the generated PDFs.

mintty commented 3 years ago

import fpdf

doc = fpdf.FPDF()
doc.set_compression(False)
# text fonts
doc.add_font('normal', '', 'NotoSans-Regular.ttf', uni = True)
doc.add_font('dejavu', '', 'DejaVuSans.ttf', uni = True)
# ttc does not work:
#doc.add_font('cjk', '', 'NotoSansCJK-Regular.ttc', uni = True)
#doc.add_font('cjk', '', 'NotoSansCJKjp-Regular.otf', uni = True)

# emoji fonts
doc.add_font('emojis', '', 'NotoEmoji-Regular.ttf', uni = True)
# these do not work:
#doc.add_font('emojis', 'NotoColorEmoji.ttf', uni = True)
#doc.add_font('emojis', 'NotoColorEmoji_WindowsCompatible.ttf', uni = True)

doc.add_font('unifont', '', 'unifont.ttf', uni = True)

normal = 'normal'
normal = 'dejavu'
emojis = 'emojis'
uni = 'unifont'
cjk = uni

doc.add_page()
doc.set_font(normal)
doc.text(50, 20, 'Latin: Hællœ 1€ ŵ')
doc.text(90, 20, 'ℕº¼ᵤ⁰ᵆ ⓐ 🄴 ㉐㋏㌳ ︕﹠Wキ ゟヿ(よりコト)')
doc.text(50, 30, 'Greek, Cyrillic: αγΩ Это тест')
doc.set_font(cjk)
doc.text(50, 40, 'CJK: 啕咱')
doc.text(90, 40, 'Plane 2 CJK: 𠄢偺')
doc.set_font(normal)
doc.text(50, 50, 'Arabic ligature: U+FDFA <ﷺ>')
doc.text(50, 60, 'Arabic ligature: U+FDFB <ﷻ>')

doc.text(50, 70, 'Plane 0 emojis:')
doc.set_font(emojis)
doc.text(100, 70, '⛔⛱⛲✅')
doc.set_font(uni)
doc.text(130, 70, '⛔⛱⛲✅')

doc.set_font(normal)
doc.text(50, 80, 'Plane 1 characters:')
doc.text(100, 80, '𐌸𐐀𑁍𝄞𝔸')
doc.set_font(uni)
doc.text(130, 80, '𐌸𐐀𑁍𝄞𝔸')

doc.set_font(normal)
doc.text(50, 90, 'Plane 1 emojis:')
doc.set_font(emojis)
doc.text(100, 90, '🌛🍅😀')
doc.set_font(uni)
doc.text(130, 90, '🌛🍅😀')

doc.output('fpdf2.pdf')
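
The script above switches fonts by hand for each piece of text. A rough JavaScript counterpart is to split a string into runs, picking for each character the first font in a priority list that has a glyph for it. This is only a sketch under stated assumptions: `splitIntoFontRuns` is a hypothetical helper, the font objects are assumed to expose `hasGlyphForCodePoint(codePoint)` as fontkit fonts do, and the coverage stubs below are invented for illustration.

```javascript
// `fonts` is an ordered list of { name, font }, highest priority first.
// Characters covered by no font fall back to the first entry.
function splitIntoFontRuns(fonts, text) {
  const runs = [];
  for (const ch of text) { // iterate by code point, not by UTF-16 code unit
    const cp = ch.codePointAt(0);
    const match = fonts.find(f => f.font.hasGlyphForCodePoint(cp)) || fonts[0];
    const last = runs[runs.length - 1];
    if (last && last.name === match.name) {
      last.text += ch; // extend the current run
    } else {
      runs.push({ name: match.name, text: ch }); // start a new run
    }
  }
  return runs;
}

// Stub coverage tables, hypothetical and for illustration only.
const fonts = [
  { name: 'normal', font: { hasGlyphForCodePoint: cp => cp < 0x2000 } },
  { name: 'emojis', font: { hasGlyphForCodePoint: cp => cp >= 0x2600 } }
];

const runs = splitIntoFontRuns(fonts, 'ok \u2705\u{1F600} done');
console.log(runs);
```

Each run could then be written with its own registered font, for example by selecting the font by name before emitting the run's text.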