accordproject / markdown-transform

Parse and transform markdown text, including TemplateMark markdown templates
Apache License 2.0

Unicode fonts for pdf transform #455

Open jeromesimeon opened 3 years ago

jeromesimeon commented 3 years ago

Discussion 🗣

The transformation stack seems to handle Unicode quite well, but PDF generation does not render all character sets.

Detailed Description

For the following markdown file:

Lorem Ipsum е елементарен примерен текст, използван в печатарската и типографската индустрия. Lorem Ipsum е индустриален стандарт от около 1500 година, когато неизвестен печатар взема няколко печатарски букви и ги разбърква, за да напечата с тях книга с примерни шрифтове. Този начин не само е оцелял повече от 5 века, но е навлязъл и в публикуването на електронни издания като е запазен почти без промяна. Популяризиран е през 60те години на 20ти век със издаването на Letraset листи, съдържащи Lorem Ipsum пасажи, популярен е и в наши дни във софтуер за печатни издания като Aldus PageMaker, който включва различни версии на Lorem Ipsum.

Lorem Ipsum छपाई और अक्षर योजन उद्योग का एक साधारण डमी पाठ है. Lorem Ipsum सन १५०० के बाद से अभी तक इस उद्योग का मानक डमी पाठ मन गया, जब एक अज्ञात मुद्रक ने नमूना लेकर एक नमूना किताब बनाई. यह न केवल पाँच सदियों से जीवित रहा बल्कि इसने इलेक्ट्रॉनिक मीडिया में छलांग लगाने के बाद भी मूलतः अपरिवर्तित रहा. यह 1960 के दशक में Letraset Lorem Ipsum अंश युक्त पत्र के रिलीज के साथ लोकप्रिय हुआ, और हाल ही में Aldus PageMaker Lorem Ipsum के संस्करणों सहित तरह डेस्कटॉप प्रकाशन सॉफ्टवेयर के साथ अधिक प्रचलित हुआ.

Lorem Ipsum,也称乱数假文或者哑元文本, 是印刷及排版领域所常用的虚拟文字。由于曾经一台匿名的打印机刻意打乱了一盒印刷字体从而造出一本字体样品书,Lorem Ipsum从西元15世纪起就被作为此领域的标准文本使用。它不仅延续了五个世纪,还通过了电子排版的挑战,其雏形却依然保存至今。在1960年代,”Leatraset”公司发布了印刷着Lorem Ipsum段落的纸张,从而广泛普及了它的使用。最近,计算机桌面出版软件”Aldus PageMaker”也通过同样的方式使Lorem Ipsum落入大众的视野。

Le Lorem Ipsum est simplement du faux texte employé dans la composition et la mise en page avant impression. Le Lorem Ipsum est le faux texte standard de l'imprimerie depuis les années 1500, quand un imprimeur anonyme assembla ensemble des morceaux de texte pour réaliser un livre spécimen de polices de texte. Il n'a pas fait que survivre cinq siècles, mais s'est aussi adapté à la bureautique informatique, sans que son contenu n'en soit modifié. Il a été popularisé dans les années 1960 grâce à la vente de feuilles Letraset contenant des passages du Lorem Ipsum, et, plus récemment, par son inclusion dans des applications de mise en page de texte, comme Aldus PageMaker.

The transforms seem to mostly work. For instance, here is the HTML version:

document.txt

which looks like this in Chrome:

Screen Shot 2021-08-18 at 9 15 00 PM

But the corresponding PDF is missing some of the character sets:

document.pdf

mttrbrts commented 3 years ago

Adding lots of character sets can really bloat the module. We should solve this in a way that allows users to choose their own character sets if the defaults are not sufficient.
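As a rough illustration of letting users supply only the fonts they need, the sketch below checks which scripts a document uses and reports those with no user-provided font. Everything here is hypothetical and not part of markdown-transform: the `SCRIPT_RANGES` table, the `detectScripts` and `missingFonts` helpers, and the font file names are assumptions, and the ranges cover only the basic Cyrillic, Devanagari, and CJK Unified Ideographs blocks.

```typescript
// Hypothetical script table: each entry is a script name plus an inclusive
// range of UTF-16 code units. All three ranges sit in the Basic Multilingual
// Plane, so charCodeAt is enough for this sketch.
type ScriptRange = { script: string; from: number; to: number };

const SCRIPT_RANGES: ScriptRange[] = [
  { script: "Cyrillic", from: 0x0400, to: 0x04ff },
  { script: "Devanagari", from: 0x0900, to: 0x097f },
  { script: "CJK", from: 0x4e00, to: 0x9fff },
];

// Returns the scripts found in the text, in order of first appearance.
function detectScripts(text: string): string[] {
  const found: string[] = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    for (const range of SCRIPT_RANGES) {
      const inRange = code >= range.from && code <= range.to;
      if (inRange && found.indexOf(range.script) === -1) {
        found.push(range.script);
      }
    }
  }
  return found;
}

// Returns the detected scripts for which the user supplied no font file.
function missingFonts(
  text: string,
  fontsByScript: { [script: string]: string }
): string[] {
  return detectScripts(text).filter((script) => !(script in fontsByScript));
}

// Usage: the user registers a Cyrillic font only (file name is an assumption).
const userFonts = { Cyrillic: "fonts/NotoSans-Regular.ttf" };
console.log(missingFonts("Lorem Ipsum \u0442\u0435\u043a\u0441\u0442 \u6587\u672c", userFonts));
```

A check like this could warn before PDF generation instead of silently dropping glyphs, while keeping large font files out of the default package.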

jeromesimeon commented 3 years ago

> Adding lots of character sets can really bloat the module. We should solve this in a way that allows users to choose their own character sets if the defaults are not sufficient.

Agreed. I am still looking at font support in pdfmake and pdfkit and how it has been set up in the markdown transform service and API. Maybe we need to do nothing and it's already in place, but I'll leave this open a little longer until I can figure out how to set up fonts.
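For reference, here is a minimal sketch of the font descriptor shape that pdfmake's server-side printer takes, built from a user-supplied map of family names to font files. The helper names (`singleFileFamily`, `buildFontDescriptors`), the family names, and the file paths are all assumptions for illustration; whether and how markdown-transform exposes this configuration is exactly the open question, so nothing below describes its actual API.

```typescript
// The four style slots used in a pdfmake font descriptor.
interface FontFamily {
  normal: string;
  bold: string;
  italics: string;
  bolditalics: string;
}

// Hypothetical helper: reuse one font file for all four slots, so that bold
// or italic text does not refer to an undefined style for this family.
function singleFileFamily(path: string): FontFamily {
  return { normal: path, bold: path, italics: path, bolditalics: path };
}

// Hypothetical helper: turn a user map of family name -> file path into the
// descriptor object passed to the printer.
function buildFontDescriptors(
  userFonts: { [family: string]: string }
): { [family: string]: FontFamily } {
  const descriptors: { [family: string]: FontFamily } = {};
  for (const family of Object.keys(userFonts)) {
    descriptors[family] = singleFileFamily(userFonts[family]);
  }
  return descriptors;
}

// Assumed font files; any TrueType font covering the needed script would do.
const exampleFonts = buildFontDescriptors({
  NotoSans: "fonts/NotoSans-Regular.ttf",
  NotoSansDevanagari: "fonts/NotoSansDevanagari-Regular.ttf",
});

// A document definition that picks a family per text node.
const docDefinition = {
  content: [
    { text: "Lorem Ipsum \u0442\u0435\u043a\u0441\u0442" },
    { text: "Lorem Ipsum \u092a\u093e\u0920", font: "NotoSansDevanagari" },
  ],
  defaultStyle: { font: "NotoSans" },
};

// With pdfmake installed, these objects would be handed over roughly as:
//   const PdfPrinter = require("pdfmake");
//   const printer = new PdfPrinter(exampleFonts);
//   const pdfDoc = printer.createPdfKitDocument(docDefinition);
console.log(Object.keys(exampleFonts), docDefinition.defaultStyle.font);
```

Keeping the font map as plain data means the font files can live outside the module, which addresses the bloat concern raised above.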

subhajit20 commented 8 months ago

Hey @mttrbrts @jeromesimeon, can you briefly explain what the problem is here?

barkhaaroraa commented 7 months ago

Could you assign this issue to me, @mttrbrts @jeromesimeon?