dompdf / dompdf

HTML to PDF converter for PHP
https://dompdf.github.io/
GNU Lesser General Public License v2.1

Issue with line breaks in Chinese text. #3195

Open singleseeker opened 1 year ago

singleseeker commented 1 year ago

Here's the HTML I use to generate the PDF:

```html
<!DOCTYPE html>
<html>

<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>发刊词|什么驱动着科技的发展?</title>
    <style>
        @font-face {
            font-family: 'SimSun';
            font-style: normal;
            font-weight: normal;
            src: url('/var/www/html/storage/fonts/STsong.ttf') format('truetype');
        }

        body {
            font-family: 'SimSun', sans-serif;
            font-size: 16px;
            line-height: 2;
        }

        p {
            text-indent: 2em;
        }

        h1,
        h2,
        h3,
        h4,
        h5,
        h6 {
            font-family: 'SimSun', sans-serif;
            font-weight: normal;
        }
    </style>
</head>

<body>
    <h1 style="text-align: center;">中文发刊词|什么驱动着科技的发展?</h1>
    <p>我是卓克。欢迎来到《科技参考》第二季的入口。</p>
    <p>老用户都知道,咱们专栏的定位是:展现科技世界值得关注的新变化,带你分清机会与陷阱、风口与泡沫。</p>
    <p>对于每一个重大的科技变化,我都会从三个角度为你拆解:首先,补背景,呈现一般人看不到的行业和技术背景;其次,撇泡沫,撇开商家故意制造的噱头和幌子;最后,做还原,从科技发展的底层逻辑分析它的演化趋势。从而让你跟全球科技变化保持同步,抓住每一个科技机会。</p>
    <p>后台留言中,经常有人反馈,专栏帮他看清了科技发展的脉络;培养了科学思维,少交了智商税;甚至还有一些金融操盘手说,这些内容对他的投资决策很有帮助。</p>
</body>

</html>
```

This is the result I'm hoping to achieve:

[Screenshot: expected output]

Unfortunately, this is what I actually get. The last two paragraphs are missing their text indentation, and their spacing is off.

[Screenshot: actual Dompdf output]

bsweeney commented 1 year ago

The sample you posted contains no break opportunities that Dompdf recognizes (e.g., space characters). Unfortunately, those are the only points at which Dompdf decides where to wrap text. The library does not yet implement the logic needed to determine where lines may break in CJK text, which normally allows breaks between most characters even though it contains no spaces.

You can tell Dompdf to allow line breaks anywhere in the text with the following style: `overflow-wrap: anywhere;`. That will mostly work, but without proper CJK line-breaking logic some line-breaking rules may be violated (for example, a line could end up starting with closing punctuation such as "。").
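Applied to the original sample, that might look like the sketch below. It is a trimmed-down version of the reporter's document in which the only substantive change is the added `overflow-wrap` declaration on `p`; the `@font-face` path is copied from the report and assumes the same font file exists on your server.

```html
<!DOCTYPE html>
<html>

<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <style>
        /* Font declaration copied from the original report; the path must
           point to a CJK-capable TrueType font that Dompdf can read. */
        @font-face {
            font-family: 'SimSun';
            font-style: normal;
            font-weight: normal;
            src: url('/var/www/html/storage/fonts/STsong.ttf') format('truetype');
        }

        body {
            font-family: 'SimSun', sans-serif;
            font-size: 16px;
            line-height: 2;
        }

        p {
            text-indent: 2em;
            /* The Chinese text has no spaces, so allow Dompdf to wrap
               between any two characters instead. */
            overflow-wrap: anywhere;
        }
    </style>
</head>

<body>
    <p>对于每一个重大的科技变化,我都会从三个角度为你拆解:首先,补背景,呈现一般人看不到的行业和技术背景;其次,撇泡沫,撇开商家故意制造的噱头和幌子;最后,做还原,从科技发展的底层逻辑分析它的演化趋势。从而让你跟全球科技变化保持同步,抓住每一个科技机会。</p>
</body>

</html>
```

Scoping the rule to `p` leaves headings unaffected; it could equally be placed on `body` if long headings also need to wrap.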

As for the missing text indentation, the lack of breakable text appears to have exposed a bug in how Dompdf handles indents: when placing a run of unbreakable text, Dompdf seems to ignore (or override) the indent. This happens even when you've told Dompdf it can break the text anywhere.

You can see the indentation issue with Latin text as well:

```html
<!DOCTYPE html>
<html>

<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <style>
    p {
      text-indent: 2em;
    }
  </style>
</head>

<body>
  <p>loremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsumloremipsum</p>
  <p>lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum</p>
</body>

</html>
```

singleseeker commented 1 year ago

Wow, thank you for your prompt response. I hope this issue can be fixed in a future version.