Kozea / WeasyPrint

The awesome document factory
https://weasyprint.org
BSD 3-Clause "New" or "Revised" License
7.16k stars 680 forks source link

content gets sized zero on page break #659

Closed JohannesMunk closed 6 years ago

JohannesMunk commented 6 years ago

Hello there!

First of all: thanks a lot for this great project and all the efforts you guys are putting into it. We have been using WeasyPrint for about half a year now and thus far we found workarounds for most of the issues we ran into. But this one proves harder to circumvent, because it only happens in specific page break situations.

I have boiled this down to a minimal reproducible scenario:

<html>
    <body>
        <div style="height:59em; border: 2px solid blue;">
            A (before missing content)</div>
        <div style="column-count:2; border: 2px solid red;">
            <h3>B (missing content)</h3>
            <div style="height:10em; border: 2px solid red;">1</div>
            <div style="height:10em; border: 2px solid red;">2</div>
        </div>
        <div style="height:10em; border: 2px solid green;">
            C (after missing content)</div>
    </body>
</html>

missingContent_notWorking.html.pdf The second div gets sized to zero height in the output, unless the first div has a height of less than 59em.

<html>
    <body>
        <div style="height:30em; border: 2px solid blue;">
            A (before missing content)</div>
        <div style="column-count:2; border: 2px solid red;">
            <h3>B (missing content)</h3>
            <div style="height:10em; border: 2px solid red;">1</div>
            <div style="height:10em; border: 2px solid red;">2</div>
        </div>
        <div style="height:10em; border: 2px solid green;">
            C (after missing content)</div>
    </body>
</html>

missingContent_working.html.pdf

This is also related to the column-count attribute of the second div even with column-count=1. When removed the content shows up on the next page as expected.

Can I do anything further to help you fix this? My python is very basic, but let me know if I can assist in any way.

Side note: The size calculation of the second div seems to ignore the margins of the h3 element. Thus the overlapping content in the working situation. Shall I open another issue for this?

Thx a lot in advance!

Johannes

Tontyna commented 6 years ago

Failed to debug and catch the root of evil -- the layour-code is pain for my brain.

There ist definitely something wrong with styles having a height exceeding max_position_y and the column_layout() is buggy, too. Somehow a column_layouted box, reaching the bottom margin, never makes it onto the next page, instead its children are discarded and it receives a height of 0. Really strange.

The overlapping in the working case is probably a side-effect of the faulty handling of exeeding height.

JohannesMunk commented 6 years ago

Thanks for trying, though. My observation is that the height of div1 itself does not have to exceed max_position_y, but div1 has to be only so tall, that div2 does not fit on the same page anymore. Thus failing when: div1.height + div2.height > max_position_y. missingContent_notReallyWorking_58em.html.pdf missingContent_justWorking_50em.html.pdf

What also seems strange to me is, that once div2 is the first element on a page (div1 has a page-break-after), then div2 is able to span multiple pages, albeit disregarding some element heights...

<html>
    <body>
        <div style="height:30em; border: 2px solid blue; page-break-after: always;">
            A (before missing content)</div>
        <div style="column-count:2; border: 2px solid red;">
            <h3>B (missing content)</h3>
            <div style="height:40em; border: 2px solid red;">1</div>
            <div style="height:40em; border: 2px solid red;">2</div>
            <div style="height:40em; border: 2px solid red;">3</div>
            <div style="height:40em; border: 2px solid red;">4</div>
            <div style="height:40em; border: 2px solid red;">5</div>
            <div style="height:40em; border: 2px solid red;">6</div>
        </div>
        <div style="height:10em; border: 2px solid green;">
            C (after missing content)</div>
    </body>
</html>

missingContent_working_withBreak.html.pdf

How can we make the layout-code less of a pain?

Tontyna commented 6 years ago

Same strange observations here. Thats why I think there are two bugs, one related to fixed heights and another one related to column_layout.

I assume you had a look at the source? Maybe it's my limited mental capacity, but I'm unable to assimilate this proliferated layout-code. I suspect, at the moment @liZe is the only one who fully grasps it.

@liZe : Any hints available?

Tontyna commented 6 years ago

How can we make the layout-code less of a pain?

A first step (at least for me) would be excessive comments in the source. Each time I wanna explore an issue in the code I'm painstakingly recollecting what those functions are doing.

liZe commented 6 years ago

@liZe : Any hints available?

I have to check that, I'm currently working on something different (CairoCFFI (for WeasyPrint too!)) but I'll take a look on open tickets and pull requests ASAP.

I assume you had a look at the source? Maybe it's my limited mental capacity, but I'm unable to assimilate this proliferated layout-code. I suspect, at the moment @liZe is the only one who fully grasps it.

Each time I read the code, it's really hard for me to understand the layout even I wrote big parts of this code.

A first step (at least for me) would be excessive comments in the source. Each time I wanna explore an issue in the code I'm painstakingly recollecting what those functions are doing.

Sure.

It would also be great to improve the documentation in order to share the way layout globally works without forcing people to read W3C specs.

Tontyna commented 6 years ago

I'm currently working on something different (CairoCFFI (for WeasyPrint too!)) but I'll take a look on open tickets and pull requests ASAP.

Ah, good to know -- I almost started becoming depressed and demotivated... while you're at it: please consinder to revise cairocffi.surfaces._encode_filename, see #587, the patch for unicode filenames on Windows.

Concerning comments in source code: I'd volunteer to do that. But. Not sure. Whether I'm predestinated to do that in a conforming way -- my impression until now was that WeasyPrint's master rule for comments is: "Avoid them!". I'm not talking about docstrings only, to fathom the layout-part I'd need comments all over the place, enhanced with flowcharts :smirk:

JohannesMunk commented 6 years ago

While trying to boil down another take on the endless loop.. i found a variation on this issue:

<html lang="de">
  <head>
    <meta charset="utf-8" />
    <style>
        body {
            margin: 1px;
            font-size: 9pt !important;
            font-family: DejaVu Sans, sans-serif;
        }
        .options {
            column-count: 2;
            column-gap: 3.5em;
            margin-left: 1cm;
            margin-right: 2cm;
        }
        @media print {
            @page {
                size: A4;
                margin: 0;
            }
        }
    </style>
  </head>
  <body>
    <h1>working content</h1>
    <div class="options" style="border: 2px solid red;">
      <h2>missing content</h2>
      b
    </div>
  </body>
</html>

notWorking2_de.pdf_b_5_3_1_b_4_5_a_2_1_l.html.pdf

The content gets shown as soon as I put some content before <h2>.

notWorking2_de.pdf_b_5_3_1_b_4_5_a_2_1_m.html.pdf

liZe commented 6 years ago

please consinder to revise cairocffi.surfaces._encode_filename, see #587, the patch for unicode filenames on Windows.

@Tontyna Thanks for pointing this out, I've committed b5c1840 (slightly adapted from your patch).

Concerning comments in source code: I'd volunteer to do that. But. Not sure. Whether I'm predestinated to do that in a conforming way -- my impression until now was that WeasyPrint's master rule for comments is: "Avoid them!". I'm not talking about docstrings only, to fathom the layout-part I'd need comments all over the place, enhanced with flowcharts

Flowcharts are not far from what I had in mind when I was talking about improving the documentation. There are so many corner cases in the spec that it's impossible to get a really accurate flow, but it may be great to have a wrong-but-useful chart helping users to get the big picture.

(For example, there's an amazing article explaining how CSS engines work, I'd love to have something similar but deeper for the layout part.)

liZe commented 6 years ago

Side note: The size calculation of the second div seems to ignore the margins of the h3 element. Thus the overlapping content in the working situation. Shall I open another issue for this?

@JohannesMunk It's definitely caused by the h3's margin. I'll check that and tell you if another issue is needed.

liZe commented 6 years ago

The main problem should be fixed by 9e80926, but I have to check ignored margins and heights. If I can't find an easy solution, I'll open new issues.

liZe commented 6 years ago

Oh, and we need more tests :heart_eyes:.

Tontyna commented 6 years ago

OMG pytest already runs forever!

BTW: Have no idea how all possible critical conditions, sometimes depending on special font and box sizes, could be tested. Not mentioning style combinations having unexpected weird side-effects. Not mentioning weird combinations of OSses and libraries...

(For example, there's an amazing article explaining how CSS engines work, I'd love to have something similar but deeper for the layout part.)

The article is great. And I like the Layout chapter in the documentaion, too. Though it's not a big help when one tries to solve an issue. The story of Layout -- should be written in RST, I guess?

liZe commented 6 years ago

Have no idea how all possible critical conditions, sometimes depending on special font and box sizes, could be tested. Not mentioning style combinations having unexpected weird side-effects. Not mentioning weird combinations of OSses and libraries...

My pattern for tests is:

We currently have 87% of coverage (with an awful 1% for flex, see #601). I'm trying to find minimal cases when possible, because it's easier to reproduce, to debug and to include in tests. And we have tests launched on Linux for each supported version of Python with quite old libraries, plus tests launched on macOS with the very recent versions of various libraries (thanks homebrew). Unfortunately, Windows's not supported yet by Travis, but that would be a really good thing.

(Oh, long time I didn't try try PyPy and PyPy3.)

There's also a platform with the W3C tests, it's a good way to see WeasyPrint's improvements and to find regressions or crashes.

About fonts, there's a font specially designed for tests called Ahem. It's really useful to create visual tests that can be reproduced, and avoiding most of the hinting and kerning problems.

liZe commented 6 years ago

@JohannesMunk Original examples have been fixed. The main problems were:

  1. Margins were collapsing because columns were not creating stacking contexts, causing the overlapping problem.
  2. The height balancing algorithm was broken when the first child of a column was higher than the ideal/minimum column height (total height of children ÷ number of columns).

There are some bugs left, including

<div style="height:30em; border: 2px solid blue; page-break-after: always;">
  A (before missing content)</div>
<div style="column-count:2; border: 2px solid red;">
  <h3>B (missing content)</h3>
  <div style="height:40em; border: 2px solid red;">1</div>
  <div style="height:40em; border: 2px solid red;">2</div>
  <div style="height:40em; border: 2px solid red;">3</div>
  <div style="height:40em; border: 2px solid red;">4</div>
  <div style="height:40em; border: 2px solid red;">5</div>
  <div style="height:40em; border: 2px solid red;">6</div>
</div>
<div style="height:10em; border: 2px solid green;">
  C (after missing content)</div>

and

<div style="height:19em; border: 2px solid blue;">
  A (before missing content)</div>
<div style="column-count:2; border: 2px solid red;">
  <h3 style="background: yellow;">B (missing content)</h3>
  <div style="font-size:10px; border: 2px solid red;">1</div>
  <div style="font-size:10px; border: 2px solid red;">2</div>
</div>

that's why I keep this issue open (and for tests).

The story of Layout -- should be written in RST, I guess?

@Tontyna of course!

liZe commented 6 years ago

The first example of the previous comment is fixed by efd139b. The second one is caused by break-* values being ignored during the balance and but not during the real layout (see #489).

I'm adding tests and closing this issue after.