Kozea / WeasyPrint

The awesome document factory
https://weasyprint.org
BSD 3-Clause "New" or "Revised" License

endless loop in table layout #660

Closed JohannesMunk closed 6 years ago

JohannesMunk commented 6 years ago

Hello again!

Over the weekend I managed to boil down another problem that has haunted us for a couple of weeks and prevents us from outputting some files. Because these files are pretty big, such a constellation, random as it might seem, occurs with stubborn regularity.

This bug requires a specific layout situation with given widths and content. I hope it is reproducible on your side. On my side the following HTML does not convert; instead WeasyPrint keeps allocating memory until it runs out or the process is aborted.

<html lang="de">
<head>
    <meta charset="utf-8" />
    <style>
        body {
            margin: 1px;
            font-size: 9pt !important;
            font-family: Helvetica, Arial, sans-serif;
        }
        .options {
            column-count: 2;
            column-gap: 3.5em;
            margin-left: 1cm;
            margin-right: 2cm;
        }
        table {
            width: 100%;
            border-spacing: 0;
            border-collapse: collapse; 
            font-size: 0.9em !important;
        }
        table, td {
            padding: 0; 
            margin: 0;
        }
        @media print {
            @page {
                size: A4;
                margin: 0;
            }
        }
    </style>
</head>
<body>
  <div class="options">
    <table><tr>
      <td style="width:1.5em;"></td>
      <td>(<span><span>Standardeinstellung für Antriebs-/Lenk- Befehlscode - vorwärts,
        rückwärts, links, rechts</span></span>)</td>
      <td style="width: 7em;"></td>
    </tr></table>
  </div>
</body>
</html>

I let this run in cProfile:

 Ordered by: cumulative time

ncalls tottime cumtime filename:lineno(function)
   274   0.009  48.899 {built-in method builtins.exec}
     1   0.000  46.891 ..\weasyprint\__init__.py:148(write_pdf)
     1   0.000  46.891 ..\weasyprint\__init__.py:116(render)
     1   0.000  46.891 ..\weasyprint\document.py:306(_render)
     1   0.000  46.871 ..\weasyprint\document.py:334(<listcomp>)
     1   0.000  46.871 ..\weasyprint\layout\__init__.py:39(layout_document)
     1   0.000  46.871 ..\weasyprint\layout\pages.py:606(make_all_pages)
     1   0.000  46.868 ..\weasyprint\layout\pages.py:512(make_page)
     5   0.000  46.868 ..\weasyprint\layout\blocks.py:27(block_level_layout)
     5   0.000  46.868 ..\weasyprint\layout\blocks.py:80(block_box_layout)
     6   0.032  46.868 ..\weasyprint\layout\blocks.py:404(block_container_layout)
     1   0.000  46.867 ..\weasyprint\layout\blocks.py:123(columns_layout)
     1   0.000  46.827 ..\weasyprint\layout\tables.py:18(table_layout)
     1   0.000  46.827 ..\weasyprint\layout\tables.py:273(all_groups_layout)
     1   0.000  46.827 ..\weasyprint\layout\tables.py:241(body_groups_layout)
     1   0.000  46.827 ..\weasyprint\layout\tables.py:62(group_layout)
 28101   0.044  46.791 ..\weasyprint\layout\inlines.py:29(iter_line_boxes)
 28101   0.601  46.747 ..\weasyprint\layout\inlines.py:62(get_next_linebox)
112419   2.258  29.844 ..\weasyprint\text.py:910(split_first_line)
 28100   1.767  25.276 ..\weasyprint\layout\inlines.py:634(split_inline_box)
289541   0.500  24.180 {built-in method builtins.next}
 56201   0.353  22.892 ..\weasyprint\layout\inlines.py:555(split_inline_level)
 28102   0.069  17.947 ..\weasyprint\layout\preferred.py:179(inline_min_content_width)
 56214   0.878  17.648 ..\weasyprint\layout\preferred.py:220(inline_line_widths)
 56201   0.503  14.811 ..\weasyprint\layout\inlines.py:931(split_text_box)
758827  11.978  12.425 ..\weasyprint\text.py:665(iter_lines)
182684   1.038   9.204 ..\weasyprint\text.py:826(create_layout)
182685   3.935   6.636 ..\weasyprint\text.py:618(__init__)
112419   0.946   4.435 ..\weasyprint\text.py:580(first_line_metrics)
168616   1.271   4.023 ..\weasyprint\layout\percentages.py:59(resolve_percentages)
112403   0.154   3.299 ..\weasyprint\formatting_structure\boxes.py:322(copy_with_children)
...

Ordered by: call count

 ncalls tottime cumtime filename:lineno(function)
3738901   0.575   0.575 {built-in method builtins.isinstance}
3668880   0.642   0.643 {built-in method builtins.setattr}
2234161   0.568   0.568 ..\weasyprint\layout\percentages.py:15(_percentage)
2234161   1.288   2.208 ..\weasyprint\layout\percentages.py:32(resolve_one_percentage)
1447384   0.469   0.495 ..\cffi\api.py:171(_typeof)
1264699   1.089   2.350 ..\cffi\api.py:233(new)
1264699   0.687   0.687 {built-in method _cffi_backend.newp}
1060589   0.131   0.131 {built-in method builtins.len}
1029475   0.466   0.466 {method 'encode' of 'str' objects}
 815049   0.390   0.819 ..\cffi\api.py:404(gc)
 815049   0.428   0.428 {built-in method _cffi_backend.gcp}
 758828   0.343   0.343 {method 'replace' of 'bytes' objects}
 758828   0.719   2.740 ..\weasyprint\text.py:554(unicode_to_char_p)
 758827  11.978  12.425 ..\weasyprint\text.py:665(iter_lines)
 289541   0.500  24.180 {built-in method builtins.next}
 675484   0.336   0.336 {method 'format' of 'str' objects}
 477770   1.288   2.219 ..\weasyprint\text.py:564(get_size)
 451777   1.232   1.232 {built-in method __new__ of type object at 0x000000005797C430}
 449648   0.329   0.351 ..\weasyprint\formatting_structure\boxes.py:296(enumerate_skip)
 407501   0.058   0.058 {method 'append' of 'list' objects}
 365344   0.107   0.107 ..\weasyprint\formatting_structure\boxes.py:272(is_absolutely_positioned)
 337257   0.980   2.173 ..\weasyprint\text.py:674(set_text)
 309125   0.074   0.074 ..\weasyprint\formatting_structure\boxes.py:268(is_floated)
 301427   0.085   0.085 {method 'replace' of 'str' objects}
 289713   0.099   0.100 {method 'join' of 'str' objects}
 284618   0.195   0.195 {method 'decode' of 'bytes' objects}
 281001   0.083   0.083 ..\weasyprint\formatting_structure\boxes.py:136(padding_height)
 281001   0.138   0.222 ..\weasyprint\formatting_structure\boxes.py:145(border_height)
 224804   0.103   0.103 ..\weasyprint\formatting_structure\boxes.py:132(padding_width)
 224804   0.136   0.238 ..\weasyprint\formatting_structure\boxes.py:140(border_width)
 210752   0.047   0.047 ..\weasyprint\text.py:1194(<genexpr>)
 197870   0.128   0.128 {method 'split' of 'str' objects}
 196701   0.415   0.415 ..\weasyprint\css\computed_values.py:664(strut_layout)
 112400   0.155   0.171 ..\weasyprint\formatting_structure\boxes.py:115(translate)
 186049   0.258   0.403 {built-in method builtins.hasattr}
 184778   0.069   0.069 {method 'rpartition' of 'str' objects}
 183026   0.092   1.675 <frozen importlib._bootstrap>:997(_handle_fromlist)
 183378   0.146   0.215 <frozen importlib._bootstrap>:416(parent)
 182686   0.196   0.388 ..\cffi\api.py:284(cast)
 182686   0.085   0.085 {built-in method _cffi_backend.cast}
 182685   3.935   6.636 ..\weasyprint\text.py:618(__init__)
 182684   1.038   9.204 ..\weasyprint\text.py:826(create_layout)
 182683   0.523   0.523 ..\weasyprint\text.py:724(get_font_features)
 169267   0.277   0.277 {method 'update' of 'dict' objects}
  56214   0.878  17.648 ..\weasyprint\layout\preferred.py:220(inline_line_widths)
 168616   1.271   4.023 ..\weasyprint\layout\percentages.py:59(resolve_percentages)
 168609   0.071   0.071 ..\weasyprint\formatting_structure\boxes.py:160(content_box_x)
 168605   0.262   0.590 ..\weasyprint\formatting_structure\boxes.py:104(copy)
 156652   0.067   0.067 {method 'endswith' of 'str' objects}
 155313   0.025   0.025 {method 'extend' of 'list' objects}
 140501   0.091   0.243 ..\weasyprint\formatting_structure\boxes.py:150(margin_width)
  56201   0.353  22.892 ..\weasyprint\layout\inlines.py:555(split_inline_level)
 140499   0.575   1.991 ..\weasyprint\formatting_structure\boxes.py:305(_reset_spacing)
 126467   0.088   0.151 ..\weasyprint\formatting_structure\boxes.py:276(is_in_normal_flow)
 113112   0.046   0.046 {built-in method builtins.max}
 112419   0.946   4.435 ..\weasyprint\text.py:580(first_line_metrics)
 112419   2.258  29.844 ..\weasyprint\text.py:910(split_first_line)
 112410   0.096   0.255 ..\weasyprint\text.py:550(utf8_slice)
 112403   0.154   3.299 ..\weasyprint\formatting_structure\boxes.py:322(copy_with_children)
 112402   0.669   0.669 {method 'copy' of 'dict' objects}
  28101   0.157   0.188 ..\weasyprint\layout\inlines.py:179(skip_first_whitespace)
 112401   0.054   0.149 ..\weasyprint\formatting_structure\boxes.py:154(margin_height)
 112400   0.037   0.037 ..\weasyprint\layout\inlines.py:898(<listcomp>)
  28100   1.767  25.276 ..\weasyprint\layout\inlines.py:634(split_inline_box)
  98364   0.141   0.248 ..\pyphen\__init__.py:56(language_fallback)
  90402   0.042   0.042 {method 'rstrip' of 'str' objects}
  88412   0.046   0.046 {built-in method builtins.min}
  84550   0.029   0.029 {method 'strip' of 'str' objects}
  84320   0.271   0.271 ..\weasyprint\layout\preferred.py:125(margin_width)
  84318   0.150   0.223 ..\weasyprint\layout\preferred.py:110(min_max)
  84318   0.075   0.569 ..\weasyprint\layout\preferred.py:153(adjust)
  84300   0.028   0.028 ..\weasyprint\formatting_structure\boxes.py:182(border_box_y)
  84300   0.012   0.012 ..\weasyprint\formatting_structure\boxes.py:293(all_children)
  84300   0.102   2.213 ..\weasyprint\formatting_structure\boxes.py:418(_remove_decoration)
  84300   0.016   0.016 ..\weasyprint\layout\float.py:145(<listcomp>)
  84300   0.010   0.010 ..\weasyprint\layout\float.py:155(<listcomp>)
  84300   0.009   0.009 ..\weasyprint\layout\float.py:159(<listcomp>)
  84300   0.464   0.825 ..\weasyprint\layout\float.py:133(avoid_collisions)
  28100   0.272   0.518 ..\weasyprint\layout\inlines.py:1072(inline_box_verticality)
  28100   0.084   0.142 ..\weasyprint\layout\inlines.py:1229(is_phantom_linebox)
  29517   0.044   0.466 {built-in method builtins.any}
  56508   0.068   0.156 I:\x3rdParty\Python\lib\re.py:286(_compile)
  56237   0.008   0.008 {method 'reverse' of 'list' objects}
  56220   0.100   0.100 {method 'finditer' of '_sre.SRE_Pattern' objects}
  56208   0.044   0.211 I:\x3rdParty\Python\lib\re.py:224(finditer)
  56208   0.045   0.045 ..\weasyprint\text.py:1046(<listcomp>)
  56208   0.009   0.009 ..\weasyprint\text.py:1050(<listcomp>)
  56201   0.042   0.284 ..\weasyprint\formatting_structure\boxes.py:474(copy_with_text)
  56201   0.503  14.811 ..\weasyprint\layout\inlines.py:931(split_text_box)
  28100   0.018   0.439 ..\weasyprint\layout\inlines.py:1257(<genexpr>)
  14050   0.078   0.482 ..\weasyprint\layout\inlines.py:1245(can_break_inside)
  31266   0.006   0.006 {method 'startswith' of 'str' objects}
  28103   0.036   0.585 ..\weasyprint\formatting_structure\boxes.py:314(_remove_decoration)
  28102   0.069  17.947 ..\weasyprint\layout\preferred.py:179(inline_min_content_width)
  28101   0.293   0.661 ..\weasyprint\text.py:1176(can_break_text)
  28101   0.005   0.005 ..\weasyprint\formatting_structure\boxes.py:87(all_children)
  28101   0.044  46.791 ..\weasyprint\layout\inlines.py:29(iter_line_boxes)
  28101   0.601  46.747 ..\weasyprint\layout\inlines.py:62(get_next_linebox)
  28101   0.006   0.006 ..\weasyprint\layout\inlines.py:261(first_letter_to_box)
  28100   0.116   0.167 ..\weasyprint\layout\inlines.py:218(remove_last_whitespace)
  28100   0.005   0.005 ..\weasyprint\layout\inlines.py:1018(<listcomp>)
  28100   0.043   0.636 ..\weasyprint\layout\inlines.py:1003(line_box_verticality)
  28100   0.036   0.588 ..\weasyprint\layout\inlines.py:1059(aligned_subtree_verticality)
  28100   0.032   0.032 ..\weasyprint\layout\inlines.py:1149(text_align)
  21131   0.007   0.007 {method 'pop' of 'list' objects}
  20270   0.003   0.003 {method 'get' of 'dict' objects}
  19322   0.007   0.007 I:\x3rdParty\Python\lib\sre_parse.py:232(__next)

I don't know how to debug or step through Python code. But what I read from the profile is that iter_line_boxes is called a few times too often!
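For readers who want to produce a comparable profile, here is a minimal sketch using the standard library's cProfile and pstats modules. The helper name `profile_call` and the file names in the usage comment are hypothetical, and the commented WeasyPrint call assumes the usual `HTML(...).write_pdf(...)` entry point. Because the render never finishes in this bug, the helper keeps the partial profile when the run is interrupted with Ctrl+C.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, sort_by="cumulative", limit=25, **kwargs):
    """Run func under cProfile and return the formatted report as text.

    A KeyboardInterrupt is swallowed on purpose, so an endless loop can be
    stopped by hand and the data collected so far is still reported.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    except KeyboardInterrupt:
        pass  # keep the partial profile of the interrupted run
    finally:
        profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # sort_by="cumulative" and sort_by="ncalls" give the two orderings
    # shown in the listings above.
    stats.sort_stats(sort_by).print_stats(limit)
    return buffer.getvalue()


# Assumed usage with WeasyPrint (hypothetical file names, not executed here):
# from weasyprint import HTML
# print(profile_call(HTML("endless.html").write_pdf, "out.pdf"))
```

A function that sits near the top of both orderings, as iter_line_boxes does here, is a reasonable first suspect for a loop that does not terminate.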

Let me know, if I can be of more assistance and thanks a lot in advance!

Johannes

liZe commented 6 years ago

This bug requires a specific layout situation with given widths and content. I hope it is reproducible on your side.

Unfortunately, it's not for me. Could you try to reproduce with a free font instead of Helvetica/Arial?

There's probably a problem with the columns, this feature is young and not widely used.

I don't know how to debug or step through Python code. But what I read from the profile is that iter_line_boxes is called a few times too often!

Using pdb may be useful, but it's hard to know where to put breakpoints when there's an endless loop. I'll try to explain how I would debug as soon as I can reproduce this error, I hope it'll help everyone to find a way to know what's going on…
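One way to find out where to put a breakpoint when the program never returns is to let Python dump its own stack after a timeout. The sketch below uses the standard library's faulthandler module; the wrapper name `run_with_watchdog` and the file names in the usage comment are hypothetical, and this is only one possible approach, not necessarily the one meant above.

```python
import faulthandler


def run_with_watchdog(func, *args, timeout=30.0, **kwargs):
    """Call func; if it is still running after `timeout` seconds, dump the
    traceback of every thread to stderr and terminate the process."""
    faulthandler.dump_traceback_later(timeout, exit=True)
    try:
        return func(*args, **kwargs)
    finally:
        # The call returned (or raised) in time: disarm the watchdog.
        faulthandler.cancel_dump_traceback_later()


# Assumed usage with WeasyPrint (hypothetical file names, not executed here):
# from weasyprint import HTML
# run_with_watchdog(HTML("endless.html").write_pdf, "out.pdf", timeout=20)
```

The dumped traceback shows the call chain that was active when the timer fired; the innermost WeasyPrint frames are candidate locations for a pdb breakpoint.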

liZe commented 6 years ago

Another possibility: since you're using Windows, this bug may be a duplicate of #614. If you use Pango < 1.40.13, that is almost certainly the case.
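As a small illustration of the version check suggested here, the sketch below compares a dotted version string against the 1.40.13 threshold. The function names are hypothetical, and how the installed Pango version string is obtained is left open, since that depends on the platform and the installation.

```python
FIXED_IN = (1, 40, 13)  # threshold quoted in the comment above


def parse_version(text):
    """Turn a dotted version such as '1.42.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def predates_fix(version_text, fixed_in=FIXED_IN):
    """Return True if the version is older than `fixed_in`.

    Only as many components as `fixed_in` has are compared, so a fourth
    component such as the trailing '.0' in '1.42.1.0' is ignored.
    """
    return parse_version(version_text)[:len(fixed_in)] < fixed_in
```

With this check, a version of 1.40.12 would count as affected, while 1.40.13 and later would not.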

JohannesMunk commented 6 years ago

Hey liZe! Thanks for looking into this. You are right, I'm running Windows. But I just updated my GTK3 to the newest runtime dist and now have Pango 1.42.1.0. Sadly with the same endless loop. The same happens under Mac OS X with all the latest dists through Homebrew. So this seems to be something new. I will try to create a situation with another font!

JohannesMunk commented 6 years ago

.. trying to figure out a font that would be available for you. Would "Verdana" work?

liZe commented 6 years ago

But I just updated my GTK3 to the newest runtime dist and now have Pango 1.42.1.0. Sadly with the same endless loop.

:cry:

trying to figure out a font that would be available for you. Would "Verdana" work?

Any font that's free and that I can easily download anywhere. Using Google Fonts' @import rules is also a solution.

Tontyna commented 6 years ago

Can reproduce the issue. Seems to be another windowish font problem: No endless loop when using font-family: DejaVu Sans, sans-serif;

Will try to debug and catch...

Tontyna commented 6 years ago

OMG! It's weird! The endless loop is triggered by the (seemingly ineffective, completely useless, HA!) doubled <span><span>. When it is reduced to a single span, the document renders fine.

Is this Cairo again? Akin to #628?

Tontyna commented 6 years ago

It's not the column-count and it's not the table. It's definitely the double span, followed by the closing bracket (!), in combination with those windowish fonts and a specific width of the containing box, where the inline splitting runs into an infinite loop. I constructed a simple div with the required width and the double span and WHOOM! Only difference: the table and column-count version gets stuck in the layout of page 1, while the simple div doesn't stop producing pages.

Probably another discrepancy in the calculation of text-widths between ??? and ??? -- dunno yet, yes, looks like #614 and #585 , but that bug has been fixed...

JohannesMunk commented 6 years ago

Hey Tontyna! Thanks for your digging! Cool, that you could reduce it further to the outside div.

Concerning the double spans: in the non-reduced file the spans of course have different classes and attributes. Yes, I could combine them programmatically. But as the double spans work with other content in between, it must be the combination of things, like you pointed out.

Tontyna commented 6 years ago

That's the output when I break the make_all_pages-loop:

[Image: endless]

Interestingly the opening bracket isn't repeated.

JohannesMunk commented 6 years ago

Binary search of first problem occurrence in document after switching to DejaVu:

<html lang="de">
 <head>
  <meta charset="utf-8" />
  <title>All Truckgroups</title>
  <style>
        body {
            margin: 1px;
            font-size: 9pt !important;
            font-family: DejaVu Sans, sans-serif;
        }
        .options {
            column-count: 2;
            column-gap: 3.5em;
            margin-left: 1cm;
            margin-right: 2cm;
        }
        table {
            width: 100%;
            border-spacing: 0;
            border-collapse: collapse; 
            font-size: 0.9em !important;
        }
        table, td {
            padding: 0; 
            margin: 0;
        }
        @media print {
            @page {
                size: A4;
                margin: 0;
            }
        }
    </style>
 </head>
 <body><div id="full">
   <div class="options">
    <div><span><span>Wenn der Schlauch angebracht ist, ist kein Herausheben der Batterie am NT-Mast
       und für Hubhöhen ≤ 2.600 mm an TL/TF-Masten möglich</span></span><span><span>nicht verfügbar
       bei NT Mast</span></span></div>
   </div>
  </div></body>
</html>

Hope this is now reproducible everywhere?

I downloaded and installed 2.37 of the ttf fonts from https://dejavu-fonts.github.io/Download.html

Good night and good luck!

And thank you!
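The binary search mentioned at the top of this comment can be automated along the following lines. This is a sketch under stated assumptions: the document can be cut into an ordered list of fragments, a prefix hangs if and only if it contains the offending fragment, and a hang is detected by a timeout on a child process. The function names and the `weasyprint` command line in the closing comment are illustrative, not taken from the thread.

```python
import subprocess
import sys


def times_out(command, timeout):
    """Return True if `command` is still running after `timeout` seconds."""
    try:
        subprocess.run(command, timeout=timeout,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return True  # subprocess.run has already killed the child process
    return False


def first_failing_index(items, fails):
    """Return the smallest i such that fails(items[:i + 1]) is true.

    Assumes monotonicity: once a prefix fails, every longer prefix fails.
    """
    if not fails(items):
        raise ValueError("the complete input does not fail")
    low, high = 0, len(items) - 1
    while low < high:
        middle = (low + high) // 2
        if fails(items[:middle + 1]):
            high = middle      # culprit is at middle or earlier
        else:
            low = middle + 1   # culprit is after middle
    return low


if __name__ == "__main__":
    # Stand-in for a hanging renderer: a child process that sleeps.
    sleeper = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(times_out(sleeper, timeout=0.5))
```

In a real run, `fails` would write the prefix to a temporary HTML file and call something like `times_out(["weasyprint", "candidate.html", "candidate.pdf"], 60)`; both file names are hypothetical.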

Tontyna commented 6 years ago

Eliminating the second double-span still results in an endless loop when the immediately following word is too long to fit into the same line:

<div class="options">
  <div><span><span>Wenn der Schlauch angebracht ist, ist kein Herausheben der Batterie am NT-Mast
       und für Hubhöhen ≤ 2.600 mm an TL/TF-Masten möglich</span></span>xxx
  </div>
</div>

Separating the "xxx" from the double-span -- e.g. with a LF or a SPACE - prevents the infinite loop:

<div class="options">
  <div><span><span>Wenn der Schlauch angebracht ist, ist kein Herausheben der Batterie am NT-Mast
       und für Hubhöhen ≤ 2.600 mm an TL/TF-Masten möglich</span></span>
xxx
  </div>
</div>

Definitely a mutation of #614

JohannesMunk commented 6 years ago

@Tontyna : cool! I really like your clear cut approach in #614.

Am I correct in saying that, by HTML standards, this additional LF or SPACE should not matter?

I will try to introduce some in my output and see if my files work.

Thanks for your support!
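A stopgap of the kind described here could look like the sketch below, which inserts a space after a closing double span when text follows it directly. The function name is hypothetical and the regular expression only covers the exact `</span></span>text` shape from the reduced example; it does not touch a double span that is followed directly by another tag. Note that the inserted space is rendered and creates a line-break opportunity, so the output does change.

```python
import re

# Two closing span tags followed directly by a character that is neither
# whitespace nor the start of another tag.
_NESTED_CLOSE = re.compile(r"(</span>\s*</span>)(?=[^\s<])")


def add_space_after_nested_spans(html):
    """Insert one space between '</span></span>' and directly following text."""
    return _NESTED_CLOSE.sub(r"\1 ", html)
```

For example, `<span><span>a b</span></span>xxx` becomes `<span><span>a b</span></span> xxx`, while input that already has whitespace at that position is returned unchanged.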

Tontyna commented 6 years ago

@JohannesMunk : Thx.

But this issue must be caught one level higher. I almost understand where it happens but am not (yet) able to fix it.

That's the situation:

 <LineBox div>
   <InlineBox span>
     <InlineBox span>
       <TextBox span> 
         Text that should be spread over 2 lines. Having a bit of 
         space at the end. But not enough for the following xxx
   <TextBox div>xxx 

Since there is no SPACE or LF after the <span>, WeasyPrint tries to put the "xxx" on the second line and detects that there isn't enough room. Now, I think, it SHOULD skip-stack back to the start of the already correctly broken second line. But instead, it skip-stacks back to the start of the InlineBox.

Watching resume_at/skip_stack in get_next_linebox (calculated by split_inline_box) reveals that it oscillates endlessly:

(0, (0, (0, (30, None))))
(0, (0, (0, None)))
(0, (0, (0, (30, None))))
(0, (0, (0, None)))
(0, (0, (0, (30, None))))
(0, (0, (0, None)))
....

The above "30" is (as far as I understand) the first letter of the (successfully split) second text line, but the next call to split_inline_box jumps back again (not sure, but I think to the start of the <InlineBox span>).
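The oscillation in the values above can be spotted mechanically. The sketch below is not WeasyPrint code: both helpers are hypothetical, and it assumes that each skip stack is a nested `(index, sub_stack)` tuple ending in `None`, as in the printed values, and that a repeated resume position means the layout is making no progress.

```python
def flatten_skip_stack(skip_stack):
    """Turn (0, (0, (0, (30, None)))) into [0, 0, 0, 30]; None gives []."""
    path = []
    while skip_stack is not None:
        index, skip_stack = skip_stack
        path.append(index)
    return path


def find_cycle(states):
    """Return (first_index, period) of the first repeated state, or None."""
    seen = {}
    for position, state in enumerate(states):
        if state in seen:
            return seen[state], position - seen[state]
        seen[state] = position
    return None


# The values logged above, in order.
observed = [
    (0, (0, (0, (30, None)))),
    (0, (0, (0, None))),
    (0, (0, (0, (30, None)))),
    (0, (0, (0, None))),
]
print(find_cycle(observed))
print([flatten_skip_stack(state) for state in observed[:2]])
```

A check like this, applied to the resume positions as they are produced, would let a debugging session stop as soon as a position recurs instead of running until memory is exhausted.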

liZe commented 6 years ago

Minimal case:

<div style="font-family: Ahem; width: 3.5em">
<span><span>xxx x x</span></span><span>x

liZe commented 6 years ago

Good news: I've been saved by a comment :wink:. Fix coming soon.

liZe commented 6 years ago

@JohannesMunk @Tontyna Thanks a lot for your bug report, examples and hard work.

The commit message and diff should be self-explanatory. This function is really tricky and quite recent (as it was modified to fix #163), that's why a lot of comments had been added. I would have spent days to fix this without this comment, that's another reason to add more (but not too many) as discussed in #659.

Tontyna commented 6 years ago

Oh yes, I remember having seen those grandchildren before. I didn't like them at all :wink:

JohannesMunk commented 6 years ago

Hey you two!

I just successfully generated 6 x 125 pages of PDFs!! Great stuff! Thanks a lot for your super nice and quick responses and fixes. Everything working now! I am impressed by the thorough regression tests!

If you are interested I have another open issue concerning multiple columns and horizontal sizing of a table inside. I probably will be able to work around it, or shall I extract the problem and submit another issue for it?

All the best and thank you again!

Johannes

liZe commented 6 years ago

I just successfully generated 6 x 125 pages of PDFs!! Great stuff! Thanks a lot for your super nice and quick responses and fixes. Everything working now! I am impressed by the thorough regression tests!

:smiley:

If you are interested I have another open issue concerning multiple columns and horizontal sizing of a table inside. I probably will be able to work around it, or shall I extract the problem and submit another issue for it?

It looks like an awful problem, but I'd be happy to have a separate issue for that.

All the best and thank you again!

No problem! We currently have a tiny opinion survey open in #635, would you be interested in writing a little message? I'm curious about your 125-page documents :wink:.

Tontyna commented 6 years ago

Ah, the survey - isn't there a way to promote it? It's already buried in the open issues. Maybe via an issue template?