ftlabs / ftcolumnflow

A polyfill that fixes the inadequacies of CSS column layouts
MIT License
633 stars, 51 forks

FTColumnFlow to PDF #29

Closed: topherreynoso closed this 9 years ago

topherreynoso commented 10 years ago

I have been exploring ways to generate PDFs from HTML in a Rails application. To make the documents look a lot sharper, I decided to use FTColumnFlow. I am using Wicked PDF to generate the documents from a template laid out with FTColumnFlow, with a JavaScript delay to make sure FTColumnFlow has time to do its work. Here is the code:

av = ActionView::Base.new()
av.view_paths = ActionController::Base.view_paths
av.class_eval do
    include Rails.application.routes.url_helpers
    include ApplicationHelper
end
body_html = av.render :pdf => "#{@iter.name}", :template => "iterations/show.pdf.erb", :locals => {:iteration_id => @iter.id}, :javascript_delay => 3000
pdf = WickedPdf.new.pdf_from_string(body_html, :page_size => 'Letter', :margin => {:top => 10, :bottom => 10, :left => 15, :right => 15}, :javascript_delay => 3000)
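To derive the FTColumnflow viewportWidth and viewportHeight from the same page size and margins passed to pdf_from_string, the printable area can be converted to pixels. This is a minimal sketch, assuming wkhtmltopdf margins are in millimetres (its default unit) and that the page renders at `dpi` pixels per inch. In practice the effective scale also depends on wkhtmltopdf's dpi, zoom and smart-shrinking settings, so treat `dpi` as a value to calibrate. `printableViewport` and `PAPER_SIZES_MM` are hypothetical, not part of FTColumnflow or wicked_pdf, and the code is ES5 because wkhtmltopdf's older WebKit engine may not support newer syntax.

```javascript
// Hypothetical helper: convert a paper size and wkhtmltopdf-style margins
// (all in millimetres) into a pixel viewport for FTColumnflow.
var MM_PER_INCH = 25.4;

// Portrait paper sizes in millimetres.
var PAPER_SIZES_MM = {
  Letter: { width: 215.9, height: 279.4 },
  A4: { width: 210, height: 297 }
};

function printableViewport(pageSize, marginsMm, dpi) {
  var paper = PAPER_SIZES_MM[pageSize];
  if (!paper) {
    throw new Error('Unknown page size: ' + pageSize);
  }
  var pxPerMm = dpi / MM_PER_INCH;
  var widthMm = paper.width - marginsMm.left - marginsMm.right;
  var heightMm = paper.height - marginsMm.top - marginsMm.bottom;
  // Round down so the columns never exceed the printable area.
  return {
    viewportWidth: Math.floor(widthMm * pxPerMm),
    viewportHeight: Math.floor(heightMm * pxPerMm)
  };
}

// Example: Letter paper with the margins from the pdf_from_string call.
var size = printableViewport('Letter', { top: 10, bottom: 10, left: 15, right: 15 }, 96);
```

The two returned values would then be passed as viewportWidth and viewportHeight in the FTColumnflow options.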

That's the Rails and Wicked PDF side of things. Here is the referenced iterations/show.pdf.erb template, where the content for the documents is added and FTColumnFlow is set up:

<html>
  <head>
    <meta charset="utf-8" />
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta http-equiv="Content-Language" content="en-us" />
    <script src="//ajax.googleapis.com/ajax/libs/webfont/1.4.7/webfont.js"></script>
    <script>
      WebFont.load({
        google: {
          families: ['Droid Sans:n4,n7']
        }
      });
    </script>
    <%= wicked_pdf_javascript_include_tag "FTColumnflow" %>
    <%= wicked_pdf_stylesheet_link_tag 'pdf/iterations' %>
    <style>
      body { margin: 0; padding: 0; }
      #flowedContent, #fixedContent { opacity: 0; }
    </style>
  </head>
  <body>
    <section id="viewport">
      <article id="target"></article>
    </section>
    <div id="flowedContent">
      <%= @iteration.verbiage.html_safe %>
    </div>

    <% if !@iteration.fixed_verbiage.nil? && @iteration.fixed_verbiage != "" %>
      <div id="fixedContent">
        <%= @iteration.fixed_verbiage.html_safe %>
      </div>
    <% end %>
    <script type="text/javascript">
      var targetEl  = document.getElementById('target');
      var viewportEl = document.getElementById('viewport');
      var cf = new FTColumnflow(targetEl, viewportEl, {
        columnCount:             2,
        pageArrangement:         'vertical',
        standardiseLineHeight:   true,
        columnFragmentMinHeight: 30,
        pagePadding:             1,
        columnGap:               20,
        viewportWidth:           1095,
        viewportHeight:          1532,
        allowReflow:             false
      });
      if(<%= @iteration.fixed_verbiage.nil? %> == true || <%= @iteration.fixed_verbiage == "" %> == true){
        var flowedContent = document.getElementById('flowedContent');
        cf.flow(flowedContent, null);
      }else{
        var flowedContent = document.getElementById('flowedContent'),
            fixedContent = document.getElementById('fixedContent');
        cf.flow(flowedContent, fixedContent);
      }
    </script>
  </body>
</html>

Basically, I make viewportWidth and viewportHeight match the size of a Letter page, which makes Wicked PDF render each FTColumnFlow page on a new page of the PDF. You can also see that the template renders slightly differently depending on whether the document requires fixedContent.

This has almost worked, but I am running into lines being repeated or dropped when the text breaks from one column or page to the next. I cannot find the cause: sometimes the column and page breaks work out fine, but a significant percentage of the time they result in a duplicated or lost line. The pictures below show a line repeated at a column break and a line deleted at a page break.

[Images: "column repeat", "page delete"]
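Because document.getElementById returns null when an element is absent, and the template already passes null to cf.flow when there is no fixed content, the ERB-generated if/else could probably be collapsed into a single call. A minimal ES5 sketch: `flowContent` is a hypothetical helper, and its `cf` and `doc` parameters stand in for the FTColumnflow instance and the page's document.

```javascript
// Hypothetical helper: flow the content, passing the fixed-content
// element only when the template actually rendered it.
function flowContent(cf, doc) {
  var flowedContent = doc.getElementById('flowedContent');
  // getElementById returns null when the #fixedContent div was not
  // rendered, which matches the null the template passes explicitly.
  var fixedContent = doc.getElementById('fixedContent');
  cf.flow(flowedContent, fixedContent);
}

// In the template: flowContent(cf, document);
```

This keeps the "is there fixed content?" decision in one place (the ERB conditional around the div) instead of repeating it inside the script.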

georgecrawford commented 10 years ago

Hi @topherreynoso. Thanks for the comment; what an interesting use-case! It's certainly one I hadn't considered.

Your problems with missing/duplicated lines of text are certainly something I've seen before. This is nearly always a symptom of ColumnFlow measuring the text, laying out the columns, and then the rendered font changing in some way after the columns have been laid out. I see you've found issue #7 (https://github.com/ftlabs/ftcolumnflow/issues/7), which discusses the same thing. Other things to look for would be any resource which could change its dimensions asynchronously after ColumnFlow has run: for example, an image which hasn't yet loaded when the measuring is done, but then loads into place, shifting the rendered content below it.
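One way to rule out late-loading images is to delay the flow until every image has either loaded or failed, so its final size is known before measuring. A minimal ES5 sketch: `whenImagesSettled` is a hypothetical helper that relies only on the standard `img.complete` property and the `load`/`error` events.

```javascript
// Hypothetical helper: call `done` once every image in `images` has
// finished loading or failed, so its dimensions are final before layout.
function whenImagesSettled(images, done) {
  var pending = 0;
  var finished = false;

  function settle() {
    pending -= 1;
    // Guard so `done` runs exactly once.
    if (pending === 0 && !finished) {
      finished = true;
      done();
    }
  }

  for (var i = 0; i < images.length; i++) {
    var img = images[i];
    if (!img.complete) {
      pending += 1;
      img.addEventListener('load', settle);
      img.addEventListener('error', settle);
    }
  }

  // Everything was already complete (or there were no images).
  if (pending === 0) {
    finished = true;
    done();
  }
}

// Usage: whenImagesSettled(document.images, function () {
//   cf.flow(document.getElementById('flowedContent'), null);
// });
```

Failed images are treated as settled too, since a broken image also has a fixed size by then and waiting longer would not change the layout.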

In my uses of ColumnFlow, I'm very careful to ensure that the font is fully loaded before laying out the columns. But in your comment you suggest that you're not using custom fonts, so it's probably not that.

@jessedijkstra may still have a point with the issue about selectors, however. Initially, ColumnFlow lays out the text in a single, long column, so that it can measure each text block. If there's a CSS rule which matches when the text is in one column (for example, p + p { text-indent: 30px; }), there's a chance it might not match once ColumnFlow has chopped the blocks into column chunks: the second paragraph, which is meant to be indented, may now be the first and only paragraph in column two, so it will no longer match the rule.

One bit of advice for you would be to reveal the hidden column overflows. If you add .cf-column { overflow: visible !important; } in devtools, you'll see the overflowing text at the top and bottom of each column. Many paragraphs will appear twice, once at the bottom of a column and once at the top of the next. You may find that there's an issue caused by something like text-indent which is applied in two different ways for the two instances, causing a line to be inserted/removed.
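When the page is only ever rendered by wkhtmltopdf, there are no devtools to type into, but the same debugging rule could be injected from script. A minimal ES5 sketch: `revealColumnOverflow` is a hypothetical helper, and the dashed outline is an addition of this sketch to make each column's box visible alongside the overflowing text.

```javascript
// Hypothetical debugging helper: inject the .cf-column overflow rule so
// text that spills past each column's edge stays visible in the output.
function revealColumnOverflow(doc) {
  var style = doc.createElement('style');
  style.setAttribute('data-debug', 'cf-overflow');
  // The outline marks each column's box so the overflow is easy to spot.
  style.appendChild(doc.createTextNode(
    '.cf-column { overflow: visible !important; outline: 1px dashed red; }'
  ));
  doc.head.appendChild(style);
  return style;
}

// Usage (debug builds only): revealColumnOverflow(document);
```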

I'm not sure whether you mentioned it, but do you see the same issues whether or not you render as a PDF? That is, if you render the same content in a web page, do you see the same issues? Another thing to try would be to increase the JavaScript delay: if you bump it up a few seconds, is the layout more likely to be correct? If so, you're certainly facing a problem with lazy loading of a resource like a font or image.
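A fixed delay is a guess. wkhtmltopdf also has a --window-status option, exposed by wicked_pdf as window_status (assuming your versions support it), which waits until window.status equals a given string before rendering, so the page itself can announce when the columns are finished. A minimal ES5 sketch: the 'columnflow-done' string and `flowThenSignal` are hypothetical, the string must match the value passed to wicked_pdf, and the code assumes cf.flow() completes its layout synchronously.

```javascript
// Hypothetical status string; must match the window_status option
// given to wicked_pdf (wkhtmltopdf --window-status).
var FLOW_DONE_STATUS = 'columnflow-done';

// Hypothetical helper: run the flow, then tell wkhtmltopdf it may render.
function flowThenSignal(cf, flowedEl, fixedEl, win) {
  // Assumes flow() lays out synchronously; if it were deferred, the
  // status would have to be set from that later callback instead.
  cf.flow(flowedEl, fixedEl);
  win.status = FLOW_DONE_STATUS;
}

// In the template: flowThenSignal(cf, flowedContent, fixedContent, window);
```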

topherreynoso commented 10 years ago

Thank you so much for taking the time to evaluate my use-case and for the thorough suggestions. I will sit down tonight and see if I can’t get this figured out using your suggestions and I’ll let you know what was helpful in resolving the issue.

Thank you again for your time.

(Sent by email in reply to George Crawford's comment of Nov 3, 2014, 2:42 AM.)

jessedijkstra commented 10 years ago

@topherreynoso What we did in our own implementation (a basic version of FTColumnFlow, called Kolom) is copy the computed offsets from an element's style.

So if you use p + p, we get the computed style of the original container through window.getComputedStyle() and set margin-top, etc., on the first attached element of a column.

Luckily, the first attached elements are currently the only ones that suffer from the problem, but we might have to do this for all columnized elements if we use more complex selectors in our CSS.
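Copying computed values onto the first element of a column could look roughly like the following. This is a minimal ES5 sketch, not Kolom's actual code: `copyComputedSpacing` and the property list are assumptions, and pairing each column's first element with its original source element is left to the caller.

```javascript
// Properties whose computed value may come from sibling selectors such as
// p + p, and so can change once a block becomes the first in a column.
var SPACING_PROPS = ['margin-top', 'padding-top', 'text-indent'];

// Hypothetical helper: copy the computed spacing of the element as it was
// measured (sourceEl) onto its fragment at the top of a column (targetEl).
function copyComputedSpacing(sourceEl, targetEl, getStyle) {
  // Default to the browser's getComputedStyle; tests can inject a stub.
  var computed = getStyle ? getStyle(sourceEl) : window.getComputedStyle(sourceEl);
  for (var i = 0; i < SPACING_PROPS.length; i++) {
    var prop = SPACING_PROPS[i];
    // Inline styles override selector-based rules (unless those rules use
    // !important), so the fragment keeps its measured spacing.
    targetEl.style.setProperty(prop, computed.getPropertyValue(prop));
  }
  return targetEl;
}
```

Inline styles were chosen because they beat normal selector rules regardless of specificity, which is exactly what is needed when the selector stops matching after the split.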

jessedijkstra commented 10 years ago

@topherreynoso Like I've said in a different thread: try using a font loader such as the Google/Typekit Web Font Loader or https://github.com/smnh/FontLoader to load the fonts before calling the columnize function, so that the fonts are loaded before the calculation is done.

@rickpastoor came up with the second solution, which we modified to support weight and style. We use this version because it supports disabling timeouts, etc., and is more lightweight. The one by @smnh doesn't use setTimeout; instead it uses the scroll event.
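With the Google/Typekit Web Font Loader that the template already includes, the flow can be triggered from the loader's `active` callback (fonts rendered) and `inactive` callback (fonts could not be loaded) instead of running as soon as the script is parsed. A minimal ES5 sketch: `active` and `inactive` are Web Font Loader configuration callbacks, while `loadFontsThenFlow` and its once-only guard are hypothetical. The WebFont object is passed in so the function can be exercised with a stub.

```javascript
// Hypothetical helper: lay out columns only after Web Font Loader reports
// that the fonts are active, or have failed (so layout still happens).
function loadFontsThenFlow(webFont, families, flow) {
  var flowed = false;
  function flowOnce() {
    if (!flowed) {
      flowed = true;
      flow();
    }
  }
  webFont.load({
    google: { families: families },
    active: flowOnce,   // requested fonts have rendered
    inactive: flowOnce  // fonts unavailable: lay out with fallback fonts
  });
}

// In the template, replacing the separate WebFont.load call:
// loadFontsThenFlow(WebFont, ['Droid Sans:n4,n7'], function () {
//   cf.flow(document.getElementById('flowedContent'),
//           document.getElementById('fixedContent'));
// });
```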

topherreynoso commented 10 years ago

Sorry it took me a little while to test this out. Here is where I'm at:

Whether I run it with or without custom fonts (using the Google font loader when I do use them) seems to make no difference. The documents have no images; they are only text with CSS. There are many ordered and unordered lists throughout the documents, and although the column or page often breaks properly on these items, I have noticed that the problem only occurs on columns that start or end with a list. Revealing the overflow text is a mess because I'm using a vertical layout, which makes the text really hard to decipher most of the time, but what I could see was not very revealing. JavaScript delays do not seem to make any difference, which makes me believe the problem has more to do with the lists than with any lazy loading. Finally, the issue persists whether it is rendered as a PDF or as a web page.

Here is the CSS for my lists. Please let me know if you see anything in particular that may cause issues for FTColumnFlow; I will be going over this and using the suggestions from @jessedijkstra to see if I can deal with these. Any help is greatly appreciated.

ol, ul {
  font-size: 12pt;
  line-height: 15pt;
  hyphens: auto;
  -moz-hyphens: auto;
  -ms-hyphens: auto;
  -webkit-hyphens: auto;
  hyphenate-after: 2;
  -ms-hyphenate-after: 2;
  -moz-hyphenate-after: 2;
  -webkit-hyphenate-after: 2;
  hyphenate-before: 2;
  -ms-hyphenate-before: 2;
  -moz-hyphenate-before: 2;
  -webkit-hyphenate-before: 2;
  text-indent: 0;
  text-align: justify;
  padding: 0 1pt 0 15pt;
  margin: 0 0 0 4pt;
}

li {
  font-size: 12pt;
  line-height: 15pt;
  hyphens: auto;
  -moz-hyphens: auto;
  -ms-hyphens: auto;
  -webkit-hyphens: auto;
  hyphenate-after: 2;
  -ms-hyphenate-after: 2;
  -moz-hyphenate-after: 2;
  -webkit-hyphenate-after: 2;
  hyphenate-before: 2;
  -ms-hyphenate-before: 2;
  -moz-hyphenate-before: 2;
  -webkit-hyphenate-before: 2;
  text-indent: 0;
  text-align: justify;
  padding: 0 1pt 0 1pt;
  margin: 0 0 0 0;
}

georgecrawford commented 10 years ago

Hey Topher.

So, if you can rule out fonts, images and the PDF conversion, it's likely to be a bug in ColumnFlow which is triggered by some particular combination of CSS, as you suspect.

What we need is a reduced test case. Can you find an example document which breaks every time, and reduce the complexity (both of the markup and the CSS) as much as possible, until we have a case which we can examine? And can you then drop that into http://jsbin.com/ or similar, so we can see what's happening?
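Given the earlier observation that the problems appear on columns that start or end with a list, a quick script could flag those columns as candidates for the reduced test case. A minimal ES5 sketch: `listBoundaryColumns` is hypothetical, and it assumes each `.cf-column` element's direct children are the flowed blocks, which may not match FTColumnflow's exact wrapper markup.

```javascript
// Tags treated as list blocks (tagName is upper-case in HTML documents).
var LIST_TAGS = { OL: true, UL: true };

// Hypothetical helper: return the indexes of columns whose first or last
// block is a list, as candidates for a reduced test case.
function listBoundaryColumns(root) {
  var columns = root.querySelectorAll('.cf-column');
  var suspects = [];
  for (var i = 0; i < columns.length; i++) {
    var first = columns[i].firstElementChild;
    var last = columns[i].lastElementChild;
    if ((first && LIST_TAGS[first.tagName]) || (last && LIST_TAGS[last.tagName])) {
      suspects.push(i);
    }
  }
  return suspects;
}

// In the browser console (e.g. with show_as_html): listBoundaryColumns(document);
```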

topherreynoso commented 10 years ago

My last comment was inaccurate (mostly due to insufficient testing on my part, my apologies).

I tried removing all CSS completely and ran the test with just HTML, and the problem persisted. So I went back and tested further using the show_as_html option in wicked_pdf, in order to debug it as a web page, and found that rendered this way, FTColumnFlow did not repeat or delete any lines.

As soon as I leave debug mode and go back to PDF, not only do the pages break at different points in the text than in the web page, but the repeated and deleted lines return (all of this behaved the same with and without my CSS, and with and without custom fonts).

So now I believe that something is going on between when wicked_pdf renders this to PDF and when FTColumnFlow lays out these lines. However, I have tried JavaScript delays ranging from 1000 to 30000 ms, both in wicked_pdf and by adding a delay before the FTColumnFlow call, like so:

setTimeout(flowPage, 1000);
function flowPage(){
  cf.flow(document.getElementById('flowedContent'), null);
}

None of this has resulted in any change in the PDF or web page in debug mode.

georgecrawford commented 10 years ago

OK, you've narrowed it down to the ColumnFlow/PDF combination, but as I can't see the output I can't really help. Are you able to link to a reduced test-case as I described above?

topherreynoso commented 10 years ago

I'm not sure how JS Bin would work, since this involves wicked_pdf, which is a Ruby gem. Would a reduced-case Rails app that demonstrates the issue be helpful? Perhaps something on GitHub you could check out? I can also host it on Heroku so you can see it in action. Thanks again for the help.

topherreynoso commented 10 years ago

There's also a comment from @unixmonkey here that might be worth looking into. I posed the question of how to get this working in the wicked_pdf issues as well, in case someone over there has insight into what happens between FTColumnFlow doing its thing and wicked_pdf doing its part. I don't think his suggestion solves my particular issue, and I don't believe FTColumnFlow supports a way to add a class to each row anyhow.

georgecrawford commented 9 years ago

I'm afraid I know nothing of wicked_pdf, and don't really have the time to look into this for you. Sorry :frowning: