LibrePDF / OpenPDF

OpenPDF is a free Java library for creating and editing PDF files, released under the LGPL and MPL open source licenses. OpenPDF is based on a fork of iText. We welcome contributions from other developers. Please feel free to submit pull requests and bug reports to this GitHub repository.

Japanese symbols are not rendered into PDF file #1196

Closed a5a351e7 closed 1 month ago

a5a351e7 commented 1 month ago

Describe the bug

If Japanese symbols are used, these are not displayed in a generated PDF file.

To Reproduce

Code to reproduce the issue

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfName;
import com.lowagie.text.pdf.PdfString;
import com.lowagie.text.pdf.PdfWriter;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class App {
    public static void main( String[] args ) {
        System.out.println( "Hello World!" );

        // step 1: creation of a document-object
        Document document = new Document();
        try {
            // step 2:
            // we create a writer that listens to the document
            // and directs a PDF-stream to a file
            final PdfWriter instance = PdfWriter.getInstance(document, Files.newOutputStream(Paths.get("HelloWorld.pdf")));

            // step 3: we open the document
            document.open();
            instance.getInfo().put(PdfName.CREATOR, new PdfString(Document.getVersion()));

            // step 4: we add a paragraph to the document
            document.add(new Paragraph("Hello World"));
            document.add(new Paragraph("START こんにちは、これはテストです END"));
            document.add(new Paragraph("Hello World"));
        } catch (DocumentException | IOException de) {
            System.err.println(de.getMessage());
        }

        // step 5: we close the document
        document.close();
    }
}

Expected behavior

I expected the rendering of the Japanese symbols written in the PDF file.

Screenshots

Screenshot at 2024-07-26 14-43-22

System

(please complete the following information)

Your real name

Florian aka GitHub user a5a351e7

Additional context

I am not quite sure whether this problem is related to #946

Lonzak commented 1 month ago

Did you include the itext-asian.jar in your classpath?

<dependency>
    <groupId>com.lowagie</groupId>
    <artifactId>itextasian</artifactId>
    <version>1.5.2</version>
</dependency>

a5a351e7 commented 1 month ago

Hi @Lonzak ,

unfortunately, this dependency is not available via Maven Central (https://central.sonatype.com/search?q=itextasian) and OpenMind (via https://mvnrepository.com/artifact/com.lowagie/itextasian/1.5.2 ) apparently no longer offers the file for download.

Which repository are you using?

Lonzak commented 1 month ago

Yeah, you are right. Even though they are still listed, it seems they have been removed: https://mvnrepository.com/artifact/com.lowagie/itextasian

A quick Google search turned up this repo. So I would download it from there and first check whether it fixes your issue...

Update: In the end you can also use a newer one because inside the file it says:

These specific metrics files were created by Paulo Soares and may be used, copied, and distributed for any purpose and without charge, with or without modification.

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext-asian</artifactId>
    <version>5.2.0</version>
</dependency>

StevenStreasick commented 1 month ago

You can get these symbols to render. However, I was only able to get it to render under two conditions in my test.

To fix your demo, try something like this

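// Additional imports for the top of the demo class:
import com.lowagie.text.Chunk;
import com.lowagie.text.Font;
import com.lowagie.text.FontFactory;
import com.lowagie.text.pdf.BaseFont;

// Register the TrueType font file under the alias "meiryo" (PATHTOMEIRYO is a placeholder).
FontFactory.register("PATHTOMEIRYO\\Meiryo.ttf", "meiryo");
// IDENTITY_H selects a Unicode encoding; EMBEDDED stores the font inside the PDF.
Font japaneseFont = FontFactory.getFont("meiryo", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, 12);

// Only the Japanese chunk gets the Unicode font; the Latin chunks keep the default font.
Paragraph bilingualParagraph = new Paragraph();
Chunk start = new Chunk("START ");
Chunk japaneseText = new Chunk("こんにちは、これはテストです", japaneseFont);
Chunk end = new Chunk(" END");
bilingualParagraph.add(start);
bilingualParagraph.add(japaneseText);
bilingualParagraph.add(end);

document.add(bilingualParagraph);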

This example code is based off of the sample code provided here

a5a351e7 commented 1 month ago

@Lonzak , Thank you! The repository with the IP address seems somewhat ominous and no longer appropriate, especially in view of supply chain attacks. Using the new itext-asian library did not work for me either; the characters are still not rendered.

@StevenStreasick , Thank you! Your suggestion and the code example work.

In general it seems to be necessary to use a Unicode font instead of the standard Helvetica font, see #363. IMHO: This feature should be included more prominently in the documentation.
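
Why the default font drops these characters can be sketched with the JDK alone. The assumption here is that the standard Helvetica font is used with the single-byte WinAnsi encoding (Cp1252, called windows-1252 in Java), which has no codes for Japanese characters. `WinAnsiCheck` and `fitsWinAnsi` are names made up for this illustration and are not part of OpenPDF.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class WinAnsiCheck {

    // Returns true if every character of the text has a code in windows-1252.
    // Assumption: this mirrors what a WinAnsi-encoded standard font can represent.
    static boolean fitsWinAnsi(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // Plain Latin text fits into the single-byte encoding.
        System.out.println(fitsWinAnsi("Hello World"));
        // The Japanese characters have no code in windows-1252.
        System.out.println(fitsWinAnsi("START こんにちは、これはテストです END"));
    }
}
```

If such a check fails for a given text, that text probably needs a font created with a Unicode encoding such as BaseFont.IDENTITY_H, as in the chunk example above.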

I am now looking for a beautiful and usable Unicode font. Perhaps Helvetica can also be replaced as the standard font with a Unicode font.

vk-github18 commented 1 month ago

See the free OpenType fonts at https://fonts.google.com/noto To support different languages you need more than one font.

a5a351e7 commented 1 month ago

@vk-github18 Thank you! Noto looks great to me.

In @StevenStreasick's example, I have to manually specify the font for each language in the chunks or paragraphs. Is it possible to make a number of fonts (e.g. all Noto fonts) available to OpenPDF and let OpenPDF itself decide which font to use for which text?

My problem is that I don't know at this point in the code which characters will be used. What is the best practice here?

EDIT: actually a bit of this topic is documented: https://github.com/LibrePDF/OpenPDF/wiki/Multi-byte-character-language-support-with-TTF-fonts

Lonzak commented 1 month ago

Using the new itext-asian library did not work for me either.

OK, my bad. I thought you knew what the itext-asian.jar is for. Those cmaps are used internally by OpenPDF when rendering Asian characters, ensuring that the characters are correctly interpreted and displayed.

The .cmap and .properties files in this jar are necessary to produce PDF files with iText that use CJK fonts.

However you still need the actual font. But for this there are several possibilities.

Is it possible to make a number of fonts (e.g. all Noto fonts) available as usable in OpenPDF and OpenPDF itself decides when which font must be used?

  1. Load the font from the system (as you did), e.g. Google Noto Sans CJK, Arial Unicode MS, Adobe's Heisei Mincho and Heisei Kaku Gothic font families, or Microsoft's SimHei or SimSun. This, however, is platform dependent.
  2. Include such a font in your project: add the font file to your project resources and ensure it is accessible within your project structure. You basically bring the font along with your project.
  3. Use font embedding: use OpenPDF to embed the font into your PDF. This ensures the PDF will display correctly on any platform.

a5a351e7 commented 1 month ago

@Lonzak Thank you for clarification!

Fonts such as Noto have different variants, e.g. Noto Sans Korean, Noto Serif Japanese, Noto Naskh Arabic (see https://fonts.google.com/noto/fonts). Each variant is offered in its own font file.

If I have understood correctly, I can only set a font directly on text objects (paragraph, chunk) and not for the entire document. If this assumption is correct, how can I dynamically set the correct font for a particular text object?

Sorry for my lack of detailed knowledge on the subject of fonts and their processing.

Lonzak commented 1 month ago

If I have understood correctly, I can only set a font directly on text objects (paragraph, chunk) and not for the entire document.

Yes that is correct. (I think iText starting from version 7 supports this)

If this assumption is correct, then I ask then how I can dynamically set the correct font for the particular text object.

What exactly do you mean by "dynamically set the correct font"? Do you mean the type and the style of the font itself? Or loading e.g. a Japanese or Korean font? Since you are the creator of the document, you know which font you need, and that is the one you load...? Maybe you can describe this in more detail.

a5a351e7 commented 1 month ago

My big problem is that I have a template that combines a Western language with data in any language. Take an invoice, for example: the template is in English and the data can be in any language, from English to French to Japanese and Arabic.

At the moment I can't exactly say which language/symbols will be used. I am looking for a generally valid way to write Unicode characters (i.e. all fonts) into a PDF.
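
One possible building block, sketched with the JDK only: scan the incoming data for the Unicode scripts it uses and choose the fonts from that. `ScriptScan` and `scriptsIn` are hypothetical names; mapping a script to a concrete font file (for example a Noto variant) is left to the application and is not shown.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumSet;
import java.util.Set;

final class ScriptScan {

    // Collects the Unicode scripts used in the text. Script-neutral characters
    // such as spaces, digits and punctuation (COMMON, INHERITED) are skipped,
    // as are unassigned code points (UNKNOWN).
    static Set<UnicodeScript> scriptsIn(String text) {
        Set<UnicodeScript> scripts = EnumSet.noneOf(UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            if (script != UnicodeScript.COMMON
                    && script != UnicodeScript.INHERITED
                    && script != UnicodeScript.UNKNOWN) {
                scripts.add(script);
            }
        });
        return scripts;
    }

    public static void main(String[] args) {
        // English template text mixed with Japanese data.
        System.out.println(scriptsIn("Invoice 42: こんにちは"));
        // Arabic data only.
        System.out.println(scriptsIn("مرحبا"));
    }
}
```

The resulting set could then decide which font files to load for a given field, instead of loading every available font up front.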

I appreciate every comment and every person who participates in this discussion. So thank you in advance!

StevenStreasick commented 1 month ago

I was able to do this using the FontSelector class. It looks for the 'best' fit font in its stored list of fonts and returns a Phrase that uses that font, or a combination of the best-fit fonts. Behind the scenes, the class breaks a string up into a character array, finds the best-fit font for each character, and creates a chunk for that character with that font. It then adds each chunk to a new Phrase, which is returned at the end.

Here is a quick demo that I wrote up to demonstrate this.

FontSelector selector = new FontSelector();
selector.addFont(japaneseFont);
selector.addFont(robotoFont);

PdfPCell cell = new PdfPCell(selector.process("Hello world! こんにちは"));
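
The splitting described above can be sketched without OpenPDF. This is not the actual FontSelector source, only an illustration of the idea under simplifying assumptions: `FakeFont` is a hypothetical stand-in whose glyph coverage is a hand-written predicate, the first font in the list that covers a character wins, and consecutive characters with the same font are grouped into one run.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

final class FontRunSketch {

    // Hypothetical stand-in for a real font: a name plus a glyph coverage test.
    static final class FakeFont {
        final String name;
        final IntPredicate covers;

        FakeFont(String name, IntPredicate covers) {
            this.name = name;
            this.covers = covers;
        }
    }

    // Rough coverage test for the Japanese stand-in font.
    static boolean isJapanese(int cp) {
        UnicodeScript script = UnicodeScript.of(cp);
        return script == UnicodeScript.HIRAGANA
                || script == UnicodeScript.KATAKANA
                || script == UnicodeScript.HAN;
    }

    // Two stand-in fonts; order matters because the first match wins.
    static List<FakeFont> demoFonts() {
        return Arrays.asList(
                new FakeFont("latin", cp -> cp < 0x250),
                new FakeFont("japanese", FontRunSketch::isJapanese));
    }

    // Splits text into "fontName:text" runs, one run per stretch of characters
    // that resolve to the same font.
    static List<String> split(String text, List<FakeFont> fonts) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        FakeFont currentFont = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            FakeFont match = fonts.get(0); // fallback if no font covers cp
            for (FakeFont font : fonts) {
                if (font.covers.test(cp)) {
                    match = font;
                    break;
                }
            }
            // Close the running stretch when the font changes.
            if (match != currentFont && current.length() > 0) {
                runs.add(currentFont.name + ":" + current);
                current.setLength(0);
            }
            currentFont = match;
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            runs.add(currentFont.name + ":" + current);
        }
        return runs;
    }

    public static void main(String[] args) {
        split("Hello world! こんにちは", demoFonts()).forEach(System.out::println);
    }
}
```

Because the first covering font wins in this sketch, the order in which fonts are added decides which one is used for characters that several fonts contain.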

Lonzak commented 1 month ago

Then use a Unicode font which contains "all" languages, like Arial Unicode MS. Or, as Steven suggested, use the FontSelector, which looks promising...

a5a351e7 commented 1 month ago

@StevenStreasick The FontSelector is great! Thank you very much for mentioning it and for the little code example!

@Lonzak That's exactly what I researched in parallel and found the following font (https://github.com/satbyy/go-noto-universal), which is 15 MB in size and would have worked without the FontSelector.

These two solutions resolved the problem for me perfectly.

I would like to thank everyone involved for their constructive and fast responses to my questions and for clarifying the way it works! 🙏