foliojs / fontkit

An advanced font engine for Node and the browser

full unicode display and copy/paste support #257

Closed mintty closed 3 years ago

mintty commented 3 years ago

As an attempt to fix foliojs/pdfkit#1251, I came up with the test program below. It produces PDF output that looks like the second section below. Selecting all text in the PDF and copying and pasting it into a text file yields the result in the third section below. Problems are:


const PDFDocument = require('pdfkit')
const fs = require('fs')

let doc = new PDFDocument
doc.pipe(fs.createWriteStream('pdfkit.pdf'))
doc.registerFont('normal', './NotoSans-Regular.ttf')
doc.registerFont('emojis', './NotoEmoji-Regular.ttf')
// this one does not work:
// doc.registerFont('NotoColorEmoji', './NotoColorEmoji_WindowsCompatible.ttf')

doc.font('normal')
doc.text('Hællœ 1€')
doc.text('Greek, Cyrillic: αγΩЭ')
doc.text('CJK: 啕')
doc.text('4 BMP emojis:')

doc.font('emojis')
doc.text('⛔⛱⛲✅')

doc.font('normal')
doc.text('5 non-BMP characters:')
doc.text('𐌸𐐀𑁍𝄞𝔸')
doc.text('3 non-BMP emojis:')

doc.font('emojis')
doc.text('🌛🍅😀')

doc.end()


Hællœ 1€
Greek, Cyrillic: αγΩЭ
CJK: ▯
4 BMP emojis:
⛔▯⛲✅
5 non-BMP characters:
▯▯▯▯▯
3 non-BMP emojis:
🌛🍅😀


Hælloe 1€
Greek, Cyrillic: αγΩЭ
CJK: 􀀀
4 BMP emojis:
⛔􀀀⛲✅
5 non-BMP characters:
􀀀􀀀􀀀􀀀􀀀
3 non-BMP emojis:
🌛🍅😀

blikblum commented 3 years ago
* The program needs to care about switching font according to different glyph coverage. I'd hope for some automatic font choice/fallback mechanism to cover all characters as needed.

There's a feature request: https://github.com/foliojs/pdfkit/issues/201
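
Until such a fallback exists, a program can do the switching itself. The following is a minimal sketch, not pdfkit's or fontkit's own mechanism: it splits a string into runs and picks, for each code point, the first font that covers it. The function name `splitByFontCoverage`, the `{ name, hasGlyph }` objects, and the coverage ranges are all hypothetical stand-ins; with real fonts, `hasGlyph` could presumably be backed by fontkit's `hasGlyphForCodePoint`.

```javascript
// Sketch: split text into runs, choosing for each code point the first font
// that covers it. `fonts` is a hypothetical list of { name, hasGlyph } objects.
function splitByFontCoverage(text, fonts) {
  const runs = [];
  // for...of iterates by code point, so surrogate pairs (non-BMP) stay whole.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const font = fonts.find(f => f.hasGlyph(cp));
    // null marks characters that no listed font covers.
    const name = font ? font.name : null;
    const last = runs[runs.length - 1];
    if (last && last.font === name) {
      last.text += ch; // extend the current run
    } else {
      runs.push({ font: name, text: ch }); // start a new run
    }
  }
  return runs;
}

// Stand-in coverage tables for illustration only (not real font data).
const fonts = [
  { name: 'normal', hasGlyph: cp => cp < 0x2600 },
  {
    name: 'emojis',
    hasGlyph: cp =>
      (cp >= 0x2600 && cp <= 0x27bf) || (cp >= 0x1f300 && cp <= 0x1f64f),
  },
];

const runs = splitByFontCoverage('4 ⛔✅ and 😀', fonts);
```

Each run could then be written with something like `doc.font(run.font).text(run.text, { continued: true })`, while runs whose font is `null` are exactly the characters that would otherwise show up as ▯.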

mined commented 3 years ago

The "Hællœ" pasted as "Hælloe" issue seems to depend on the PDF viewer, so forget about this one here. However, transparent pasting of all characters, whether displayable or not, is essential for certain applications.

devongovett commented 3 years ago

This sounds like a pdfkit problem, not a fontkit one.

mintty commented 3 years ago

The fontkit issue was closed, so we're back here... Problem: Glyphs not available in the font are neither displayed (▯) nor can they be copied and pasted back transparently (which is however an important feature in certain applications). The generated PDF contains the following in the affected cases:

[<000d00160017001100050000> 0] TJ

The last character output is 0000.

1 beginbfrange
<0000> <0006> [<0000> <26d4> <26f2> <2705> <d83c df1b> <d83c df45> <d83d de00>]
endbfrange

The 0000 is mapped to <0000> for copy/paste.
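
To make the two fragments above easier to read, here is a small sketch that decodes them. It assumes the TJ string holds 16-bit glyph IDs and that each bfrange target is a run of UTF-16BE code units, as in the excerpts quoted above. The helper names `glyphIds` and `decodeToUnicodeHex` are made up for this illustration and are not part of pdfkit or fontkit.

```javascript
// Split a hex string from a TJ operator into 16-bit glyph IDs.
function glyphIds(hex) {
  const digits = hex.replace(/[<>\s]/g, '');
  const ids = [];
  for (let i = 0; i < digits.length; i += 4) {
    ids.push(parseInt(digits.slice(i, i + 4), 16));
  }
  return ids;
}

// Decode one ToUnicode target such as '<d83c df1b>' (UTF-16BE code units)
// into a JavaScript string. JS strings are UTF-16, so surrogate halves
// simply recombine into the non-BMP character.
function decodeToUnicodeHex(hex) {
  return String.fromCharCode(...glyphIds(hex));
}

// The bfrange from the comment above: glyph IDs 0x0000..0x0006 in order.
const targets = [
  '<0000>', '<26d4>', '<26f2>', '<2705>',
  '<d83c df1b>', '<d83c df45>', '<d83d de00>',
];
const mapping = targets.map((t, gid) => ({ gid, text: decodeToUnicodeHex(t) }));

// Glyph IDs of the text run quoted above; the last one is 0.
const shown = glyphIds('000d00160017001100050000');
```

In `mapping`, glyph 0 decodes to U+0000 rather than to the character that was originally requested, which is why the missing characters cannot be pasted back.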

Pomax commented 3 years ago

Okay, but that's still a pdfkit issue, no? Fontkit has nothing to do with whether or not you can copy text, or how it's presented. It just shapes Unicode sequences. If the error is "fontkit isn't rendering .notdef for unknown glyphs" then that's a good issue for here, but otherwise this has nothing to do with fontkit itself?

mintty commented 3 years ago

Actually my comment should have gone to the pdfkit issue, sorry. Fixed that.