Kozea / WeasyPrint

The awesome document factory
https://weasyprint.org
BSD 3-Clause "New" or "Revised" License

WeasyPrint 0.42 gets stuck #560

Closed dhimmel closed 6 years ago

dhimmel commented 6 years ago

We are calling WeasyPrint via pandoc. With 0.42 and a certain input, the command never completes and never throws an error, causing builds to time out.

Downgrading from weasyprint 0.42 to 0.41 solves the issue: https://github.com/greenelab/scihub-manuscript/commit/5cb1245ca3e68c425b1364317e2496c17e20353c produced a passing build.

The issue doesn't happen for all inputs (pandoc manuscripts). See for example, this passing build with WeasyPrint 0.42.

The command that fails is:

pandoc \
  --from=markdown \
  --to=html5 \
  --pdf-engine=weasyprint \
  --pdf-engine-opt=--presentational-hints \
  --filter=pandoc-fignos \
  --filter=pandoc-eqnos \
  --filter=pandoc-tablenos \
  --bibliography=$BIBLIOGRAPHY_PATH \
  --csl=$CSL_PATH \
  --metadata link-citations=true \
  --webtex=https://latex.codecogs.com/svg.latex? \
  --css=webpage/github-pandoc.css \
  --output=output/manuscript.pdf \
  $INPUT_PATH

Any ideas on what the problem could be or how to better diagnose the issue?
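
One way to keep a hang like this from stalling a whole build is to run the command under a hard time limit. The following is a minimal sketch, assuming Python 3.5 or later; the `run_with_timeout` helper and the time limits are hypothetical and are not part of pandoc or WeasyPrint.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run command; return its exit code, or None if the time limit was hit."""
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before re-raising.
        return None
    return completed.returncode


# Hypothetical usage, with the pandoc arguments from the command above:
#   code = run_with_timeout(["pandoc", "--from=markdown", ...], 600)
#   if code is None:
#       sys.exit("pandoc/WeasyPrint did not finish within 10 minutes")
```

Note that only the direct child process is killed on timeout. If pandoc has spawned WeasyPrint as its own child, that process may need separate cleanup.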

liZe commented 6 years ago

It may be a duplicate of #557. Could you please try to reproduce with master?

dhimmel commented 6 years ago

Will comment again when the build for https://github.com/greenelab/scihub-manuscript/commit/cfe2a05dfa9355d601a438b450b3c5980e2299a1 is complete. So far not looking good (running for over 80 minutes), although it's not clear yet exactly where it's getting stuck. Interesting that it's not timing out after inactivity like before (that could be a Travis issue, perhaps the root cause of this backlog).

Update: build has been "Running for 5 hrs 46 min 25 sec". I think we broke Travis hehe.

dhimmel commented 6 years ago

The build finally timed out after running for 8 hours. It appears the build got stuck while running WeasyPrint. So I don't think the issue has been fixed as of Kozea/WeasyPrint@ea9ffc9a3fcd6b45cc0172af4f9913abe1cb49e5.

liZe commented 6 years ago

Is it possible to get an HTML+CSS file generated by pandoc that makes the bug happen?

dhimmel commented 6 years ago

We also have pandoc export an HTML page, which has a few additional JavaScript elements. This HTML page also triggers the issue, so we can switch to it for debugging (removing the need to deal with pandoc).

Currently I get the hangup locally when running the following with Python 3.6.4 and WeasyPrint 0.42

weasyprint https://greenelab.github.io/scihub-manuscript/ weasyprint.pdf

Note that the content at https://greenelab.github.io/scihub-manuscript/ will change, so that URL may no longer trigger the issue in the future. If so, the versioned source for that webpage is preserved here.

liZe commented 6 years ago

I think we broke Travis hehe.

:smile:

Currently I get the hangup locally when running the following with Python 3.6.4 and WeasyPrint 0.42

weasyprint https://greenelab.github.io/scihub-manuscript/ weasyprint.pdf

I can't reproduce, even with the preserved version, probably because we don't have the same default font. Could you please tell me what launching fc-match sans-serif in a terminal gives on your system?

dhimmel commented 6 years ago

I'm on Ubuntu 17.10:

$ fc-match sans-serif
DejaVuSans.ttf: "DejaVu Sans" "Book"
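
When comparing font setups across several machines, it can help to capture this output programmatically. The following is a minimal sketch, assuming `fc-match` is on the PATH and prints its default `file: "family" "style"` format as shown above; the `parse_fc_match` and `fc_match` helper names are hypothetical.

```python
import re
import subprocess

# Default fc-match output looks like: DejaVuSans.ttf: "DejaVu Sans" "Book"
FC_MATCH_LINE = re.compile(
    r'^(?P<file>[^:]+): "(?P<family>[^"]*)" "(?P<style>[^"]*)"$')


def parse_fc_match(line):
    """Split one line of default fc-match output into file, family and style."""
    match = FC_MATCH_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized fc-match output: %r" % line)
    return match.groupdict()


def fc_match(pattern="sans-serif"):
    """Run fc-match (assumed to be installed) and parse its first output line."""
    output = subprocess.check_output(
        ["fc-match", pattern], universal_newlines=True)
    return parse_fc_match(output.splitlines()[0])
```

Two systems that report the same family can still differ in font version or fontconfig settings, so a matching result does not rule out a font-related cause.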

liZe commented 6 years ago

$ fc-match sans-serif
DejaVuSans.ttf: "DejaVu Sans" "Book"

I've got the same default font, so I don't know why it works for me. I've tried with both 0.42 and master, and the PDF is correctly generated in both cases. I don't know how I could find what's going on…

dhimmel commented 6 years ago

I created a docker container, hoping that it would exhibit the error:

docker run \
  --name=weasyprint-560 \
  --interactive --tty \
  --entrypoint=bash \
  python:3.6

Then inside the container's bash shell, I ran:

pip install weasyprint==0.42
cd home
git clone --single-branch --branch gh-pages https://github.com/greenelab/scihub-manuscript.git
cd scihub-manuscript
git checkout 0f1a35706985507ab12ad7a8c3c97d99d6e4aaa0
weasyprint index.html weasyprint.pdf

This weasyprint command completed... i.e. no bug. This image is based on Debian 8. After lunch I will see if the Ubuntu image gets the error.

dhimmel commented 6 years ago

I switched to using conda to manage the environment in the Docker container, so that we can potentially better replicate our error:

docker run \
  --name=weasyprint-560 \
  --interactive --tty \
  --entrypoint=bash \
  continuumio/miniconda3:4.3.27

Then I ran:

apt-get install --yes gcc
cd /home
wget https://github.com/greenelab/manubot-rootstock/raw/59af0a2bdc23bbf48fae0acdcb8183888f12880e/build/environment.yml
conda env create --file=environment.yml
source activate manubot
git clone --single-branch --branch gh-pages https://github.com/greenelab/scihub-manuscript.git
cd scihub-manuscript
git checkout 0f1a35706985507ab12ad7a8c3c97d99d6e4aaa0
weasyprint index.html weasyprint.pdf

Unfortunately, weasyprint gives the following error:

OSError: dlopen() failed to load a library: cairo / cairo-2

This seems to indicate that cairo is not installed (as per https://github.com/Kozea/CairoSVG/issues/84), although it seems that conda should be installing it. Anyway, I'll update if I make any more progress.

mengyyy commented 6 years ago

After I updated WeasyPrint from 0.41 to 0.42, I found that it could not generate a PDF or PNG, even after using 90% CPU for 30 minutes. I'm on Python 3.5.2 and Ubuntu 16.04.

liZe commented 6 years ago

After I updated WeasyPrint from 0.41 to 0.42, I found that it could not generate a PDF or PNG, even after using 90% CPU for 30 minutes.

@mengyyy Did you try the current master branch? You may have hit #557.

charno6 commented 6 years ago

Hi,

I am running into some issues with WeasyPrint getting stuck when creating PDFs from certain HTML files. I have now managed to (kind of) narrow down when this error occurs. My coding skills are insufficient to go into the WeasyPrint code to find out why this is happening; I hope this is helpful anyway. The files are attached.

wp-table-demonstration.zip

Basically, it's quite strange: On my system, the file "triangulate-error.html" will cause WeasyPrint to go into a kind of infinite loop when printing a page that contains a table; while "triangulate-noerror.html" will create a PDF within a matter of seconds. The only difference between them is in one additional "strong" tag in one of the cells. No error is logged by WeasyPrint. Interestingly, one other way to make the issue disappear is to change the font in the CSS to "Times New Roman". (I have also tried Georgia, Palatino, and my preferred font Iowan Old Style.)

My system is macOS Sierra 10.12.6 (16G1212), Python 3.6.4, WeasyPrint version 0.42.1. I hope someone is able to reproduce this problem. If there is any way I can help, let me know.

EDIT: I have looked in-depth at another article that was causing the problem. I now have the suspicion that it is probably something to do with using tags that affect formatting such as "strong" and "em" within brackets, both round and square.

liZe commented 6 years ago

I hope someone is able to reproduce this problem.

I can reproduce, thanks a lot for the example.

dhimmel commented 6 years ago

I updated the test PR (https://github.com/greenelab/scihub-manuscript/pull/39) to use the latest WeasyPrint commit. This changed the build from timing out to failing. See the new error here:

  File "/home/travis/miniconda/envs/manubot/lib/python3.6/site-packages/weasyprint/layout/inlines.py", line 201, in skip_first_whitespace
    result = skip_first_whitespace(box.children[index], next_skip_stack)
IndexError: list index out of range
Error producing PDF.

Haven't looked at this in detail but wanted to give a heads up.

liZe commented 6 years ago

This changed the build from timing out to failing.

I'm really sorry :disappointed:. Your original URL raises the error, I'll fix that and add another non-regression test.

liZe commented 6 years ago

I've checked the tricky part of the new line-breaking algorithm that causes these bugs (see #301 and #528). I've added some comments to help us in the future and corrected a couple of problems. I've also added a test to make sure this case won't happen again.

I really appreciate the time you take to report the issues and provide examples. If you find other crashes, please report them as well, I'll do my best to fix them as soon as possible!

dhimmel commented 6 years ago

I really appreciate the time you take to report the issues and provide examples.

No worries! Thanks for the fixes. I updated https://github.com/greenelab/scihub-manuscript/pull/39 to use WeasyPrint https://github.com/Kozea/WeasyPrint/commit/79e2b426a4a3a701d054a7973db19ce1ac956be7 and the build succeeded. So I think we're finally good!

charno6 commented 6 years ago

I concur, thank you for the (rapid) fixes! I've just run WeasyPrint 0.42.2 against my list of articles that were previously causing problems and they all passed without a hitch. Thank you so much!

samdmarshall commented 6 years ago

Hi, I am running into the problem described above when using weasyprint 0.42.2. This is the html page that is causing the problem for me (https://pewpewthespells.com/blog/sparse_sdks.html). I am also generating the page via pandoc with the following command:

pandoc \
  --from markdown+grid_tables \
  --to html5 \
  --include-in-header "header.html" \
  --highlight-style pygments \
  --email-obfuscation references \
  sparse_sdks.md \
  --output "sparse_sdks.html"

When I run weasyprint, it hangs. After I kill it with a local interrupt, this is the traceback I get:

Traceback (most recent call last):
  File "/usr/local/bin/weasyprint", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/__main__.py", line 177, in main
    getattr(html, 'write_' + format_)(output, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/__init__.py", line 182, in write_pdf
    font_config=font_config).write_pdf(
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/__init__.py", line 143, in render
    font_config)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/document.py", line 326, in _render
    [Page(p, enable_hinting) for p in page_boxes],
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/document.py", line 326, in <listcomp>
    [Page(p, enable_hinting) for p in page_boxes],
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/__init__.py", line 55, in layout_document
    context, root_box, html, cascaded_styles, computed_styles))
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/pages.py", line 601, in make_all_pages
    context, root_box, page_type, resume_at, page_number)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/pages.py", line 520, in make_page
    positioned_boxes, positioned_boxes, adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 83, in block_level_layout
    adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 111, in block_box_layout
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 638, in block_container_layout
    adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 83, in block_level_layout
    adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 111, in block_box_layout
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 638, in block_container_layout
    adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 83, in block_level_layout
    adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 111, in block_box_layout
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 510, in block_container_layout
    for line, resume_at in lines_iterator:
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/inlines.py", line 51, in iter_line_boxes
    device_size, absolute_boxes, fixed_boxes, first_letter_style)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/inlines.py", line 108, in get_next_linebox
    waiting_floats, line_children=[])
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/inlines.py", line 727, in split_inline_box
    line_placeholders, waiting_floats, line_children)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/inlines.py", line 600, in split_inline_level
    waiting_floats, line_children)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/inlines.py", line 727, in split_inline_box
    line_placeholders, waiting_floats, line_children)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/inlines.py", line 577, in split_inline_level
    context, box, max_x - position_x, skip)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/inlines.py", line 915, in split_text_box
    text, box.style, context, available_width, box.justification_spacing)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/text.py", line 959, in split_first_line
    text, style, context, max_width, justification_spacing)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/text.py", line 849, in create_layout
    layout = Layout(context, style.font_size, style)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/text.py", line 637, in __init__
    'cairo_t *', cairo_dummy_context._pointer)),
KeyboardInterrupt
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 47, in apport_excepthook
    try:
KeyboardInterrupt

Original exception was:
Traceback (most recent call last):
  File "/usr/local/bin/weasyprint", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/__main__.py", line 177, in main
    getattr(html, 'write_' + format_)(output, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/__init__.py", line 182, in write_pdf
    font_config=font_config).write_pdf(
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/__init__.py", line 143, in render
    font_config)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/document.py", line 326, in _render
    [Page(p, enable_hinting) for p in page_boxes],
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/document.py", line 326, in <listcomp>
    [Page(p, enable_hinting) for p in page_boxes],
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/__init__.py", line 55, in layout_document
    context, root_box, html, cascaded_styles, computed_styles))
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/pages.py", line 601, in make_all_pages
    context, root_box, page_type, resume_at, page_number)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/pages.py", line 520, in make_page
    positioned_boxes, positioned_boxes, adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 83, in block_level_layout
    adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 111, in block_box_layout
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 638, in block_container_layout
    adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 83, in block_level_layout
    adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 111, in block_box_layout
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 638, in block_container_layout
    adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 83, in block_level_layout
    adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 111, in block_box_layout
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/blocks.py", line 510, in block_container_layout
    for line, resume_at in lines_iterator:
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/inlines.py", line 51, in iter_line_boxes
    device_size, absolute_boxes, fixed_boxes, first_letter_style)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/inlines.py", line 108, in get_next_linebox
    waiting_floats, line_children=[])
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/inlines.py", line 727, in split_inline_box
    line_placeholders, waiting_floats, line_children)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/inlines.py", line 600, in split_inline_level
    waiting_floats, line_children)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/inlines.py", line 727, in split_inline_box
    line_placeholders, waiting_floats, line_children)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/inlines.py", line 577, in split_inline_level
    context, box, max_x - position_x, skip)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/layout/inlines.py", line 915, in split_text_box
    text, box.style, context, available_width, box.justification_spacing)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/text.py", line 959, in split_first_line
    text, style, context, max_width, justification_spacing)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/text.py", line 849, in create_layout
    layout = Layout(context, style.font_size, style)
  File "/usr/local/lib/python3.5/dist-packages/weasyprint/text.py", line 637, in __init__
    'cairo_t *', cairo_dummy_context._pointer)),
KeyboardInterrupt

If you need any other information, please let me know; I would like to get this resolved as quickly as possible.
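
A traceback like the one above can also be captured without a manual interrupt, which is useful when the hang only shows up on a CI server. The following is a minimal sketch using Python's standard `faulthandler` module; the `arm_hang_watchdog` and `disarm_hang_watchdog` names and the timeout values are hypothetical, and the WeasyPrint call in the comment only illustrates where the watchdog would go.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_seconds, file=None, exit_after_dump=False):
    """Dump all thread tracebacks to file if still running after the timeout."""
    faulthandler.dump_traceback_later(
        timeout_seconds,
        file=file if file is not None else sys.stderr,
        exit=exit_after_dump)


def disarm_hang_watchdog():
    """Cancel the pending dump once the work has finished in time."""
    faulthandler.cancel_dump_traceback_later()


# Hypothetical usage around WeasyPrint's Python API:
#   arm_hang_watchdog(600, exit_after_dump=True)
#   weasyprint.HTML(filename="sparse_sdks.html").write_pdf("sparse_sdks.pdf")
#   disarm_hang_watchdog()
```

With `exit_after_dump=True` the process exits right after writing the dump, so a stuck layout loop fails the build quickly and leaves a stack trace in the log.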

liZe commented 6 years ago

Hi, I am running into the problem described above when using weasyprint 0.42.2.

Thank you for this report.

This issue has been closed with a commit fixing the original bug, so your problem is different even if it leads to the same consequences. Could you please open a separate issue?

I would like to get this resolved as quickly as possible.

So do I!