allcolor / YaHP-Converter

YaHP is a Java library that allows you to convert an HTML document into a PDF document.
GNU Lesser General Public License v2.1

Missing Chinese or Japanese in the pdf #36

Open milkdeliver opened 9 years ago

milkdeliver commented 9 years ago

There were missing characters when I converted an HTML file containing Chinese and Japanese characters into a PDF: the Chinese and Japanese characters do not show up in the PDF. Could you give me a hint?

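Expected page: [image] https://cloud.githubusercontent.com/assets/3108407/8699589/de87d0e2-2b06-11e5-95df-6b870e80c350.png

PDF with missing characters: [image] https://cloud.githubusercontent.com/assets/3108407/8699572/afee1eda-2b06-11e5-880e-e37515a33971.png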

Here is my calling code:

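    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.allcolor.yahp.converter.CYaHPConverter;
    import org.allcolor.yahp.converter.IHtmlToPdfTransformer;

    public class HtmlToPdf {

        public static void main(String[] args) throws Exception {
            String root = "C:/Project/Project-Part-1-Solution";
            String input = "index.html";
            htmlToPdfFile(new File(root, input), new File(root, "index.pdf"));
            System.out.println("Done");
        }

        protected static void htmlToPdfFile(File htmlIn, File pdfOut) throws Exception {
            // Read the whole file with an explicit charset (assumes index.html is saved as UTF-8)
            // instead of relying on the platform default encoding.
            String htmlContents = new String(Files.readAllBytes(htmlIn.toPath()), StandardCharsets.UTF_8);

            // Base URL for resolving relative links in the page: the directory of the input file.
            String baseUrl = htmlIn.getAbsoluteFile().getParentFile().toURI().toString();

            CYaHPConverter converter = new CYaHPConverter();
            Map<String, String> properties = new HashMap<>();
            List<IHtmlToPdfTransformer.CHeaderFooter> headerFooterList = new ArrayList<>();

            // Use the Flying Saucer renderer.
            properties.put(IHtmlToPdfTransformer.PDF_RENDERER_CLASS,
                    IHtmlToPdfTransformer.FLYINGSAUCER_PDF_RENDERER);

            // try-with-resources closes (and thereby flushes) the stream even if conversion fails.
            try (OutputStream out = new FileOutputStream(pdfOut)) {
                converter.convertToPdf(
                        htmlContents,
                        IHtmlToPdfTransformer.A4P,
                        headerFooterList,
                        baseUrl,
                        out,
                        properties);
            }
        }
    }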
allcolor commented 9 years ago

Hi, for non-Latin characters, you must embed a TTF font containing the glyphs you want, such as MS Arial Unicode.

Regards

On 15 Jul 2015 at 16:39, "Sam" <notifications@github.com> wrote:

There were missing characters when I converted an HTML file containing Chinese and Japanese characters into a PDF: the Chinese and Japanese characters do not show up in the PDF. Could you give me a hint?
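allcolor's advice applies to any text outside the Latin script. As a rough pre-flight check, the following sketch (plain JDK, no YaHP calls; the class and method names are hypothetical) lists the non-Latin Unicode scripts an HTML string uses, which indicates which glyphs the embedded TTF font has to cover. It is only a heuristic: it ignores script-neutral characters (digits, punctuation, spaces) and does not tell you whether a particular font actually contains the glyphs.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumSet;
import java.util.Set;

public class ScriptCheck {

    /**
     * Collects the Unicode scripts used in the text, ignoring Latin and the
     * script-neutral COMMON/INHERITED categories (digits, punctuation, spaces).
     * A non-empty result suggests the PDF needs an embedded font with those glyphs.
     */
    static Set<UnicodeScript> nonLatinScripts(String text) {
        Set<UnicodeScript> scripts = EnumSet.noneOf(UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            if (script != UnicodeScript.LATIN
                    && script != UnicodeScript.COMMON
                    && script != UnicodeScript.INHERITED) {
                scripts.add(script);
            }
        });
        return scripts;
    }

    public static void main(String[] args) {
        System.out.println(nonLatinScripts("Hello, world")); // prints []
        System.out.println(nonLatinScripts("中文と日本語のテキスト"));
    }
}
```

For the page in this issue, the result would include HAN, HIRAGANA and KATAKANA, so a font such as Arial Unicode MS that covers CJK is needed.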

milkdeliver commented 9 years ago

I included the Arial Unicode TTF and it works. Perfect, allcolor!

Java:

    properties.put(IHtmlToPdfTransformer.FOP_TTF_FONT_PATH, "C:/Users/Desktop/work files/Fonts/");

CSS:

    @font-face
    {
        font-family: Arial Unicode MS;
        src: url(Fonts/arialuni.ttf) format("truetype");
    }
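The @font-face rule above only declares the font; text is rendered with it only where some CSS rule's font-family names it. A minimal sketch of a complete page that does both, assuming the same family name and relative font URL as above. The helper wrapWithFont is hypothetical, and pairing it with FOP_TTF_FONT_PATH follows milkdeliver's setup rather than anything confirmed in YaHP's documentation.

```java
public class FontFaceHtml {

    /**
     * Wraps an HTML body fragment in a full UTF-8 page that declares a TrueType
     * font with @font-face and applies it to the whole body.
     * fontUrl may be relative; it is presumably resolved against the base URL
     * passed to the converter.
     */
    static String wrapWithFont(String bodyHtml, String familyName, String fontUrl) {
        return "<html>\n"
                + "<head>\n"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\"/>\n"
                + "<style type=\"text/css\">\n"
                + "@font-face {\n"
                + "  font-family: \"" + familyName + "\";\n"
                + "  src: url(" + fontUrl + ") format(\"truetype\");\n"
                + "}\n"
                // Without this rule the declared font is never used.
                + "body { font-family: \"" + familyName + "\"; }\n"
                + "</style>\n"
                + "</head>\n"
                + "<body>" + bodyHtml + "</body>\n"
                + "</html>\n";
    }

    public static void main(String[] args) {
        System.out.print(wrapWithFont("<p>中文 / 日本語</p>", "Arial Unicode MS", "Fonts/arialuni.ttf"));
    }
}
```

The returned string would then be passed as the first argument of convertToPdf in place of the raw file contents.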
hamalaja commented 7 years ago

@milkdeliver, can you show the HTML file where you use the @font-face CSS? I have the same problem, but I don't know how to use @font-face.

oct24th commented 5 years ago

I don't know if it's the same problem as yours, but I had a similar one. The font-family name declared in my HTML was written in Korean, and when JTidy cleaned up the HTML it changed that font name to "Unicode". I solved the problem by using a font name that contains no Korean characters.
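Following oct24th's report, one way to catch this before conversion is to scan the CSS for font-family names that contain non-ASCII characters and rename them. A rough sketch, regex-based rather than a real CSS parser (it targets `<style>` blocks and simple inline styles; the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FontNameCheck {

    // Captures the value of each font-family declaration up to ';', '}', '<' or '>'.
    private static final Pattern FONT_FAMILY =
            Pattern.compile("font-family\\s*:\\s*([^;}<>]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the font-family names found in the HTML/CSS that contain non-ASCII characters. */
    static List<String> nonAsciiFontFamilies(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = FONT_FAMILY.matcher(html);
        while (m.find()) {
            for (String name : m.group(1).split(",")) {
                // Trim whitespace and any surrounding single or double quotes.
                String cleaned = name.trim().replaceAll("^[\"']+|[\"']+$", "");
                if (!cleaned.chars().allMatch(c -> c < 128)) {
                    result.add(cleaned);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<style>@font-face { font-family: \"한글\"; }"
                + " body { font-family: \"한글\", Arial Unicode MS; }</style>";
        // Each reported name is a candidate for renaming to an ASCII-only alias.
        System.out.println(nonAsciiFontFamilies(html));
    }
}
```

Both the @font-face declaration and every rule that refers to the font would need the same ASCII-only name.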