docbook / xslTNG

DocBook xslTNG Stylesheets
https://xsltng.docbook.org
MIT License

Fo transformations #121

Open rhwood opened 3 years ago

rhwood commented 3 years ago

Should the .fo transformations be ported from XSLT 1.0 to this, or is this XSLT set intentionally focused purely on HTML output?

ndw commented 3 years ago

That is a question that I wrestle with regularly. I'm inclined to say that the way forward is through an XML to HTML transformation (not the same transformation as for web pages, but still HTML with extra classes and such) and then print formatting via CSS. Given that development of FO has stopped, that seems like the right approach.

On the other hand, FOP is good enough for a lot of folks and there doesn't seem to be a similarly featured open source CSS print formatter. Which surprises me a little bit.

If lack of print output is holding you back from adopting these stylesheets, I guess I want to know that. Particularly if you're relying on FOP. Getting some kind of print output is on my todo list, but it's currently somewhere below "finish XML Calabash 3.x". ☹️

rhwood commented 3 years ago

Sorry for the delay in responding.

My requirement is a CLI asciidoc->PDF workflow; I am attempting to make it a workflow that requires minimal installation, consistency across platforms, and some complex formatting of the PDF output (i.e., number every fifth line and print some sections in a two column layout). I have examined the existing asciidoc-to-pdf tools and have determined that an asciidoc->docbook->fo->PDF process run using maven is my current best shot at minimal install (user needs to install a JDK and a recent maven and the workflow takes care of everything else) while ensuring consistency (maven downloads the same JAR versions for every user) and allowing complex outputs (using fop within maven).

tomschr commented 1 year ago

Just some comments/views from my side.

My company and I are also interested in an FO transformation for several reasons:

Some points in favor of a native XSLT 3.0 implementation of the FO stylesheets:

On the other hand, we should also consider the cons:

frank-steimke commented 1 year ago

I think that if we start to implement the FO stylesheets, they need to be maintained fully. Either you do it completely or not at all; halfheartedly is not an option. This means double the effort, unless you find a more intelligent solution. There are some options, and they all need volunteers:

  1. development of XSLT 3 FO stylesheets, independent but carefully aligned to the xslTNG HTML+CSS stylesheets. IMHO not realistic.
  2. development of an open source paged media processor. IMHO also not realistic.
  3. development of some sophisticated mechanism to generate FO and HTML+CSS stylesheets from some higher-level intermediate language. Interesting, but also not realistic. You would need a language with enough expressive power to generate XSLT 3 stylesheets; what language would that be?
  4. development of an open source engine that takes HTML+CSS as input and generates XSL-FO. Maybe this is what the guys from Syncro Soft (the company behind Oxygen) are doing with the Oxygen PDF Chemistry product? I tried to generate PDF from xslTNG output within the Oxygen suite with PDF Chemistry, but without success. Maybe it was just a minor issue? I can't tell, since it's not open source.

I would think that the last option is the only realistic one, but @tomschr, this would not help the people who still have an old XSLT 1 codebase, would it? Also, you have to deal with CSS combined with XML / (X)HTML as the base for the translation to FO. You can find this technique under the name CSSa (meaning CSS as attributes in XML) as part of the transpect open source framework. So maybe only one option remains, like this:

  1. Development of an xslTNG post-processing step which produces (X)HTML with CSSa, based on the existing CSS for xslTNG;
  2. Development of a translation engine which takes HTML+CSSa as input and generates XSL-FO for FOP.

Should be possible as an open source project based on transpect (maybe upgraded with a brand new Calabash processor for XProc 3?). I have absolutely no idea how much effort it would take.

Greetings, Frank

tomschr commented 1 year ago

Thanks @frank-steimke for your interesting perspective! You made some good points.

Well, developing an open source engine and developing XSL-FO stylesheets both need effort and time. The question is which one would be more useful.

I would like to bring up two other ideas which could mitigate the pain. I'm not sure whether they are completely insane or have some benefits.

  1. Why not use the existing XSL-FO 1.0 base stylesheets and transform them to get FO stylesheets that are more compatible with XSLT 3.0?

    Presumably it is not a complete replacement for manually written XSLT 3.0 stylesheets. Certainly there are some issues (extension functions, no test suite etc.). If all issues could be solved, we would at least have stylesheets that could be used by Saxon >10.

  2. In regards to an open source engine that takes HTML+CSS and generates PDF, there are some possible solutions.

    We don't need to develop a new engine, we can use already existing tools:

    • Using Google Chrome's headless mode: you can run Google Chrome from the command line.

      $ chrome --headless --print-to-pdf="output.pdf" URL

    • Using wkhtmltopdf: an open-source command-line tool which uses the WebKit rendering engine.

      $ wkhtmltopdf --page-size A4 URL output.pdf

    • Using chromehtml2pdf: a JavaScript command-line tool that uses Chrome's headless mode.

      $ chromehtml2pdf --out=file.pdf --landscape=1 URL

    I haven't tested all of them (only wkhtmltopdf a bit). Maybe it helps.
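The three invocations above can be folded into one small wrapper that uses whichever tool happens to be installed. This is only a sketch: `html_to_pdf` is a hypothetical helper name, the flags are copied as-is from the commands listed above, and the binary name `chrome` is an assumption (on many systems it is `google-chrome` or `chromium` instead).

```shell
#!/bin/sh
# html_to_pdf SRC OUT
# Hypothetical helper: convert SRC (a URL or HTML file) to the PDF file OUT
# with the first supported tool found on PATH.
# Flags are taken unchanged from the commands quoted in this thread.
html_to_pdf() {
  src=$1
  out=$2
  if command -v wkhtmltopdf >/dev/null 2>&1; then
    wkhtmltopdf --page-size A4 "$src" "$out"
  elif command -v chrome >/dev/null 2>&1; then
    # Assumption: the Chrome binary is called "chrome" on this system.
    chrome --headless --print-to-pdf="$out" "$src"
  elif command -v chromehtml2pdf >/dev/null 2>&1; then
    chromehtml2pdf --out="$out" --landscape=1 "$src"
  else
    echo "html_to_pdf: no supported HTML-to-PDF tool found on PATH" >&2
    return 1
  fi
}
```

After sourcing the file, a call such as `html_to_pdf book.html book.pdf` would run the first tool it finds. The order of the checks is arbitrary and can be rearranged to prefer a different renderer.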

frank-steimke commented 1 year ago

I think we are talking about different scenarios for using DocBook stylesheets. I am sure that in the future there will be more ways to transform HTML+CSS to PDF. But I don't think they will be suitable to produce large and long-lasting documents in high quality. To give a few examples:

The big advantage of DocBook 1.x stylesheets is that they are very mature and implement all the above requirements (and many more). And the biggest advantage: an active community, which feels responsible for the stylesheets and patiently handles any kind of question.

xslTNG together with paged media CSS will provide at least the same features. The open question is whether an open-source solution is conceivable that provides this performance even if FO is generated instead of (or in a second step from) HTML+CSS.

"Why not use the existing XSL-FO 1.0 base stylesheets and transform it to get FO stylesheets that are more compatible with XSLT 3.0?"

Well, you can use the XSLT 1.0 stylesheets with the latest Saxon version. I did, and the only change that was absolutely necessary was a patch regarding an ancient node-set() function. So it is of course possible to take that as the baseline for further development in the XSLT 3 direction.

But in doing so you would open a new line of XSL stylesheets for DocBook, which would be in competition with the xslTNG line. This is not what I would like to support. My goal would be to support xslTNG as much as I can, and maybe contribute documents (migration guide, best practice) or maybe an add-on for a translation of (HTML+CSS) to XSL-FO.

But before doing so, one should be convinced that there is a real need. Maybe everyone who really needs high-quality output with the features named above already has a licence for a commercial product like Prince or Antenna House and is totally fine with HTML paged media.

Cheers, Frank

tomschr commented 1 year ago

"I think we are talking about different scenarios for using DocBook stylesheets."

Perhaps. 😉

"I am sure that in the future there will be more ways to transform HTML+CSS to PDF."

I hope so, really.

"But I don't think they will be suitable to produce large and long-lasting documents in high quality."

I'm aware that the ideas I've suggested are probably not a solution for high-quality docs. But for some it would be enough to get at least a "decent" PDF. What's really possible needs to be tested.

"xslTNG together with paged media CSS will provide at least the same features. The open question is whether an open-source solution is conceivable that provides this performance even if FO is generated instead of (or in a second step from) HTML+CSS."

And this is the crucial point. It's all nice and dandy, but without an open-source solution I fear this is difficult. I don't know of an open-source implementation.

Who will use it when you have to pay for a license? Wouldn't that divide the community?

"But in doing so you would open a new line of XSL Stylesheets for docbook, which would be in competition to the xslTNG line."

Would it? By that logic, DocBook and its stylesheets also compete with other documentation formats (AsciiDoc, Sphinx, Markdown, to name the best known). I don't see it as something bad. 🙂

I see it more as offering an alternative. If HTML+CSS paged media cannot be used, or people don't want to use it, you can also view it as an intermediate step to see if XSL-FO is really needed these days. If you get some feedback or statistics, then there is probably some need for it. If not, not much development time was wasted.

"Maybe everyone who really needs high-quality output with the features named above already has a licence for a commercial product like Prince or Antenna House and is totally fine with HTML paged media."

This is a far-fetched assumption. 😉 Many open source projects would also like to get high-quality output. In former times it was possible with XSL-FO and it did the job well. FOP is now quite stable.

The formatting landscape today is different from when XSL-FO was created and when the 1.0 stylesheets were implemented. Now we have not only DocBook, but different formats all competing with each other.

If projects cannot jump on the HTML+CSS paged-media wagon, they will either use the limited options (as I showed) or switch to something else.

frank-steimke commented 1 year ago

I do not intend to promote commercial products, quite the contrary. And yet the question remains: how much interest is there in a solution for high-quality FO for DocBook?

If most users are satisfied with free, but not so high quality solutions, there are already products. You have pointed that out.

If the others who need high quality already have or are willing to buy commercial products for HTML+CSS, they have no need for an additional FO solution.

My only point is that a vibrant community needs to come about. Right now there are exactly two people participating in the discussion about FO for xslTNG. Add the original poster and there are three of us. That's too few in any case.

Maybe Antenna House knows more. Their product supports FO as well as HTML+CSS. They should know their customers' needs.

tomschr commented 1 year ago

"And yet the question remains, how much interest is there in a solution for high-quality FO for docbook?"

"If most users are satisfied with free, but not so high quality solutions, there are already products. You have pointed that out."

True.

"Right now there are exactly two people participating in the discussion about FO for xslTNG. Add the original poster and there are three of us. That's too few in any case."

Well, to some degree that's true, but I'm not sure that a hidden issue in a GitHub repo that not many people know or are aware of is a good measure. Perhaps not many have it on their radar or think it's production ready.

It would probably be more efficient and reach many more users if we asked on the docbook-apps mailing list and see what the response is.

fsteimke commented 6 months ago

I found print-css.rocks. Various tools are presented and tested there, commercial as well as open source. This led me to WeasyPrint. Results are promising; see the attached file.

xnachweis-w.pdf

There is an issue with references to page numbers, which come out as 0 (zero) in the table of contents; see issue #497. But I think it's an issue with the CSS from xslTNG, not the rendering engine.

Maybe WeasyPrint is the open source rendering engine we were looking for?
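For anyone who wants to try this, a minimal sketch of the rendering step might look like the following. It assumes the HTML has already been produced by xslTNG and that the `weasyprint` command-line tool is installed; `render_pdf` and the file names are hypothetical.

```shell
#!/bin/sh
# render_pdf HTML [PDF]
# Hypothetical helper: render an already-generated HTML file to PDF with the
# WeasyPrint command-line tool (usage: weasyprint INPUT OUTPUT).
# If PDF is omitted, the output name is derived from the HTML file name.
render_pdf() {
  html=$1
  pdf=${2:-${html%.html}.pdf}
  if ! command -v weasyprint >/dev/null 2>&1; then
    echo "render_pdf: weasyprint not found on PATH" >&2
    return 127
  fi
  weasyprint "$html" "$pdf"
}
```

Calling `render_pdf book.html` would write `book.pdf` next to the input. Any CSS referenced from the HTML is picked up by WeasyPrint itself, so no extra options are passed in this sketch.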