ArthurHub / HTML-Renderer

Cross framework (WinForms/WPF/PDF/Metro/Mono/etc.), Multipurpose (UI Controls / Image generation / PDF generation / etc.), 100% managed (C#), High performance HTML Rendering library.
https://htmlrenderer.codeplex.com/
BSD 3-Clause "New" or "Revised" License

Why is Chinese not supported when converting HTML to PDF? (Chinese support?) #99

Open away888 opened 6 years ago

away888 commented 6 years ago

Can you tell me how to support Chinese?

    var htmlStr = "Support Chinese,支持中文 PDF.";

    PdfDocument pdf = PdfGenerator.GeneratePdf(htmlStr, PageSize.A4);
    pdf.Save("zh.pdf");

superboy1984 commented 6 years ago
    public FileResult ToPdfChinese(string Words)
    {
        if (string.IsNullOrWhiteSpace(Words))
        {
            Words = "你好,世界!";
        }
        //
        Byte[] res = null;
        using (MemoryStream ms = new MemoryStream())
        {
            // Create a new PDF document
            PdfDocument document = new PdfDocument();
            document.Info.Title = "PDFSHARP测试";

            // Create an empty page
            PdfPage page = document.AddPage();

            // Get an XGraphics object for drawing
            XGraphics gfx = XGraphics.FromPdfPage(page);

            //XPdfFontOptions options = new XPdfFontOptions(PdfFontEncoding.Unicode, PdfFontEmbedding.Always);

            System.Drawing.Text.PrivateFontCollection pfcFonts = new System.Drawing.Text.PrivateFontCollection();
            string strFontPath = @"C:/Windows/Fonts/msyh.ttf"; // use the Microsoft YaHei font
            pfcFonts.AddFontFile(strFontPath);

            XPdfFontOptions options = new XPdfFontOptions(PdfFontEncoding.Unicode, PdfFontEmbedding.Always);
            XFont font = new XFont(pfcFonts.Families[0], 15, XFontStyle.Regular, options);

            // Create a font
            //XFont font = new XFont("Times New Roman", 20, XFontStyle.BoldItalic);

            // Draw the text
            gfx.DrawString(Words, font, XBrushes.Black,
              new XRect(0, 0, page.Width, page.Height),
              XStringFormats.Center);

            // Save the document...
            document.Save(ms);
            res = ms.ToArray();
        }
        return File(res, "application/pdf", "helloworld.pdf");
    }
rtalcSharpProgrammer commented 3 years ago

Hi,

Maybe it's a little late for @away888, but I had the same problem three weeks ago, and I hope my solution saves other people a lot of time.

My task was to generate a PDF from HTML, and I used HTML-Renderer from ArthurHub for it. The goal was to generate reports, and the reports must be translatable into many different languages. Everything seemed fine and I could generate PDFs. But then the problem appeared: when an HTML file contained Chinese characters (or Japanese, or others), the HTML file displayed them correctly, but after conversion to PDF there were only rectangles in their place.

I tried to solve the problem and found this page. The problem with @superboy1984's solution is that you draw the characters yourself, so you can't use HTML-Renderer as intended (give it an HTML page with Chinese characters as input and get a PDF as output). So I googled a lot and found the perfect solution for me:

In summary, it comes down to the following:

That's it! Because that is easier said than done, I will share my code:

First you need to download the font Noto Serif SC (link: https://fonts.google.com/specimen/Noto+Serif+SC ). You will get a .zip file with variants of the Noto Serif SC font. They have the file extension ".otf". This is very important because PdfSharp has problems with some fonts stored in ".ttf" files (not with every file, but some files cause an exception). In this sample I do not install the font manually; I register it programmatically (for deployment reasons). You can install it manually instead if you prefer.

using System.Drawing.Text;
using System.IO;
using System.Runtime.InteropServices;
using PdfSharp;
using PdfSharp.Pdf;
using TheArtOfDev.HtmlRenderer.Core;
using TheArtOfDev.HtmlRenderer.PdfSharp;

class YourClass {

    // Import the GDI functions for adding/removing fonts at runtime, so the font does not have to be installed manually
    [DllImport("gdi32.dll", EntryPoint = "AddFontResourceW", SetLastError = true)]
    public static extern int AddFontResource([In][MarshalAs(UnmanagedType.LPWStr)] string lpFileName);
    [DllImport("gdi32.dll", EntryPoint = "RemoveFontResourceW", SetLastError = true)]
    public static extern int RemoveFontResource([In][MarshalAs(UnmanagedType.LPWStr)] string lpFileName);

    static void Main(string[] args)
    {
        // path to your .otf file
        string fontPath = @"F:\Noto_Serif_SC\NotoSerifSC-ExtraLight.otf";

        // create the font collection and add the font
        PrivateFontCollection fontCollection = new PrivateFontCollection();
        AddFontResource(fontPath);
        fontCollection.AddFontFile(fontPath);

        // your CSS must specify Noto Serif SC ExtraLight as the font used by your HTML
        string cssContent = File.ReadAllText(@"E:\HtmlToPdfProject\LanguageProblem\test.css");
        string htmlContent = File.ReadAllText(@"E:\HtmlToPdfProject\LanguageProblem\TempReport.html");
        CssData cssData = PdfGenerator.ParseStyleSheet(cssContent);

        PdfGenerateConfig config = new PdfGenerateConfig()
        {
            PageSize = PageSize.A4,
            MarginBottom = 70,
            MarginLeft = 20,
            MarginRight = 20,
            MarginTop = 20,
        };

        PdfDocument pdf = PdfGenerator.GeneratePdf(htmlContent, config, cssData);
        pdf.Save(@"E:\HtmlToPdfProject\LanguageProblem\document11.pdf");

        // remove the font registration again
        RemoveFontResource(fontPath);
    }

}

Your PDF can now show Chinese (or whatever you want) characters. The font is embedded in the PDF document. However, some PDF viewers, for example Adobe Acrobat, can't show the characters (I don't know why). PDF24 Reader and Microsoft Edge show the PDF correctly; Mozilla Firefox, for example, does not.

I hope I could help someone...
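The comment above says the CSS must select the registered font, but test.css itself is not shown. As a rough sketch only: the stylesheet could be as small as a single `font-family` rule. The helper name `BuildFontCss` and the family name "Noto Serif SC ExtraLight" are assumptions for illustration; the name must match whatever family name the font actually registers under on your machine.

```csharp
using System;

public static class FontCssSketch
{
    // Hypothetical helper (not part of HTML-Renderer): builds a one-rule
    // stylesheet that sets the given font family on the body element.
    public static string BuildFontCss(string fontFamily)
    {
        // Family names containing spaces are quoted in CSS.
        return "body { font-family: '" + fontFamily + "'; }";
    }

    public static void Main()
    {
        // Assumed family name; check the name your font file really exposes.
        string css = BuildFontCss("Noto Serif SC ExtraLight");
        Console.WriteLine(css);
    }
}
```

The resulting string could then be handed to `PdfGenerator.ParseStyleSheet` in place of the file contents read from test.css.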

iemsoft commented 2 years ago

Use a CSS class style like the one below. I used this approach to resolve the Chinese issue.

[Image: 微信图片_20211209094101]

wanglong commented 1 year ago

GitHub - j-petty/HtmlRendererCore: HtmlRendererCore is a partial port of HtmlRenderer for .NET Core.

Modified code:

    /// <summary>
    /// Check if the given char is of Asian range.
    /// </summary>
    /// <param name="ch">the character to check</param>
    /// <returns>true - Asian char, false - otherwise</returns>
    public static bool IsAsianCharecter(char ch)
    {
        //return ch >= 0x4e00 && ch <= 0xFA2D; // not enough for Japanese

        return ('\u4E00' <= ch && ch <= '\u9FCF')   // CJK Unified Ideographs
            || ('\uF900' <= ch && ch <= '\uFAFF')   // CJK Compatibility Ideographs
            || ('\u3400' <= ch && ch <= '\u4DBF')   // CJK Unified Ideographs Extension A
            || ('\u3041' <= ch && ch <= '\u309F')   // Hiragana
            || ('\u30A0' <= ch && ch <= '\u30FF')   // Katakana (full-width), incl. middle dot and prolonged sound mark
            || ('\u31F0' <= ch && ch <= '\u31FF')   // Katakana Phonetic Extensions
            || ('\u3099' <= ch && ch <= '\u309C')   // Voiced and semi-voiced sound marks (already inside the Hiragana range)
            || ('\u3000' <= ch && ch <= '\u303f')   // CJK Symbols and Punctuation
            ;

    }

    /// <summary>
    /// Default font used for the generic 'serif' family
    /// </summary>
    public const string DefaultFont = "STZhongsong";//"Segoe UI";
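To see what the widened range check accepts, here is a small standalone sketch that copies the same ranges into a test class and runs a few sample characters through it. The class name `AsianRangeDemo` and the method spelling `IsAsianCharacter` are made up for this example; the real method lives inside the renderer's source. The listed ranges do not include Hangul syllables (U+AC00 to U+D7A3), so the Korean sample character is reported as not Asian.

```csharp
using System;

public static class AsianRangeDemo
{
    // Copy of the ranges from the modified method, for experimenting only.
    public static bool IsAsianCharacter(char ch)
    {
        return ('\u4E00' <= ch && ch <= '\u9FCF')   // CJK Unified Ideographs
            || ('\uF900' <= ch && ch <= '\uFAFF')   // CJK Compatibility Ideographs
            || ('\u3400' <= ch && ch <= '\u4DBF')   // CJK Unified Ideographs Extension A
            || ('\u3041' <= ch && ch <= '\u309F')   // Hiragana
            || ('\u30A0' <= ch && ch <= '\u30FF')   // Katakana
            || ('\u31F0' <= ch && ch <= '\u31FF')   // Katakana Phonetic Extensions
            || ('\u3099' <= ch && ch <= '\u309C')   // Sound marks (inside Hiragana range)
            || ('\u3000' <= ch && ch <= '\u303F');  // CJK Symbols and Punctuation
    }

    public static void Main()
    {
        // Latin A, a Han ideograph, Katakana HA, Hiragana A,
        // an ideographic full stop, and a Hangul syllable.
        char[] samples = { 'A', '\u4E2D', '\u30CF', '\u3042', '\u3002', '\uD574' };
        foreach (char ch in samples)
        {
            Console.WriteLine("U+{0:X4} -> {1}", (int)ch, IsAsianCharacter(ch));
        }
    }
}
```

If Korean text needs the same treatment, the Hangul range would have to be added as a further condition; whether that is appropriate for the renderer's line-breaking logic is not something this thread confirms.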

    static void Main(string[] args)
    {
        // Load the HTML content and render it into a PDF document
        string htmlContent = "<!DOCTYPE html><html lang=\"en-US\"><meta charset=\"utf-8\"/><meta name=\"viewport\"content=\"width=device-width\"/>Web Font SampleThis is Bitstream Vera Serif Bold.测试 CWT酒店 PDF 中文 五一快乐!Happy May Day! Happy Labor Day!ハッピーメーデー!해피 메이 데이!";

        // Act
        using (var stream = new MemoryStream())
        {
            var pdf = PdfGenerator.GeneratePdf(htmlContent, PdfSharpCore.PageSize.A4);

            pdf.Save(stream);

            var pdfBytes = stream.ToArray();
            // Save the PDF document to a file
            // by writing the PDF byte array to disk
            string outputPath = "output.pdf";
            File.WriteAllBytes(outputPath, pdfBytes);
        }
    }
