alan-if / alan-docs

Alan IF Documentation Project
https://git.io/alan-docs

Shell Scripts Produce Corrupt PDFs under Bash for Windows #66

Open tajmone opened 5 years ago

tajmone commented 5 years ago

The .sh scripts that build the PDF docs via asciidoctor-fopub don't work as expected under Bash for Windows: they seem unable to locate the default template images, so the resulting PDF has no admonition icons:

SEVERE: Image not found. URI: D:/local/path/to/asciidoctor-fopub/build/fopub/docbook/images/note.svg. (No context info available)

whereas the image mentioned in the error is actually there, with the correct path and filename!

It must be something related to how paths are handled in Bash for Windows vs real *nix bash, and how these interact with FOP.

We need to:

tajmone commented 5 years ago

This is a rather annoying problem: if asciidoctor-fopub ran without problems in Bash for Windows, we could get rid of the batch scripts and use only Bash scripts, on the assumption that all contributors to the project have Git, and therefore Bash, on Windows too.

Having to maintain all scripts in two versions (Bash + batch) is not only a burden, it could also easily lead to scripts getting out of sync, so I'm not happy about it. Also, Bash offers many useful tools which are not available in CMD, many of them handy for working on cross-platform repos (e.g. unix2dos, dos2unix, iconv).

The current solution, where Bash scripts abort when they detect that they are running under MinGW*, is far from ideal; it's just a safeguard.
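For readers unfamiliar with that kind of safeguard, here is a minimal sketch of how such a check might look. It is not the project's actual code: the function name `is_windows_bash` is hypothetical, and matching MSYS and Cygwin in addition to MinGW is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the environment from a `uname -s` string.
# Git Bash reports MINGW64_NT-..., MSYS2 reports MSYS_NT-...,
# and Cygwin reports CYGWIN_NT-... as the kernel name.
is_windows_bash() {
  case "${1:-$(uname -s)}" in
    MINGW*|MSYS*|CYGWIN*) return 0 ;;
    *) return 1 ;;
  esac
}

if is_windows_bash; then
  echo "Bash for Windows detected: the fopub PDF build is unsupported here." >&2
  # A real build script would `exit 1` at this point.
fi
```

Passing the `uname -s` string as an optional argument keeps the check testable on any platform.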

@thoni56, any idea why asciidoctor-fopub fails under Bash for Win?

thoni56 commented 4 years ago

I've done some experimentation with this in Cygwin and Msys2 and get the same problem in both. Once I've run pdf_build.sh once and have a manual.xml I can explore fopub problems. I'm dumping some experiments and observations here.

Running fopub from asciidoctor-fopub directory

When I run the following command from the directory of asciidoctor-fopub

$ ./fopub ../Alan/alan-docs/manual/manual.xml

I get some missing images, but there is also

 Cannot read configuration file:///home/Thomas/Utveckling/asciidoctor-fopub/build/fopub/docbook-xsl/xslthl-config.xml: \home\Thomas\Utveckling\asciidoctor-fopub\build\fopub\docbook-xsl\xslthl-config.xml (Path cannot be found)
java.io.FileNotFoundException: \home\Thomas\Utveckling\asciidoctor-fopub\build\fopub\docbook-xsl\xslthl-config.xml (Path cannot be found)
...

My guess is that that means that fopub uses the Windows Java (whatever you have) with Cygwin/Msys2 (and I presume Git Bash for Windows) paths. (\home\Thomas\Utveckling\asciidoctor-fopub\build\fopub\docbook-xsl\xslthl-config.xml has been re-formatted as a Windows path but not remapped.)

But I can't work out what refers to that config file, or where. If I knew, I could at least investigate what happens when the config file is actually read.

You have probably already seen that there is actually a manual.pdf generated, but without images.
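The difference between a path that is only "re-formatted" and one that is actually "remapped" can be illustrated in plain Bash. This is a sketch with hypothetical function names, and it only handles the MSYS2/Git Bash style `/d/...` drive prefix (Cygwin uses `/cygdrive/d/...` by default). A path such as `/home/...` has no drive letter to recover, so only the environment's own `cygpath` utility can map it to a real Windows location.

```shell
#!/usr/bin/env bash
# Swap slashes only: this reproduces the shape of the failing path
# in the error message (Windows separators, but no drive letter).
reformat_only() {
  printf '%s\n' "${1//\//\\}"
}

# Map a leading /x/ drive prefix to X:\ and then swap the remaining slashes.
# Paths without a single-letter drive prefix are merely reformatted.
remap_drive() {
  local p=$1 drive rest
  if [[ $p =~ ^/([a-zA-Z])/(.*)$ ]]; then
    drive=$(printf '%s' "${BASH_REMATCH[1]}" | tr '[:lower:]' '[:upper:]')
    rest=${BASH_REMATCH[2]}
    printf '%s:\\%s\n' "$drive" "${rest//\//\\}"
  else
    reformat_only "$p"
  fi
}

reformat_only '/d/docs/manual.xml'   # prints \d\docs\manual.xml
remap_drive   '/d/docs/manual.xml'   # prints D:\docs\manual.xml
```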

Running fopub from directory of manual

Running

../../../asciidoctor-fopub/fopub manual.xml

I get the same Java exception, but the images are included (which is no big surprise).

Running pdf_build.sh on Cygwin

Running

./pdf_build.sh

from the manual directory on Cygwin produces a completely different error:

USAGE
fop [options] [-fo|-xml] infile [-xsl file] [-awt|-pdf|-mif|-rtf|-tiff|-png|-pcl|-ps|-txt|-at [mime]|-print] <outfile>

[OPTIONS]
...
java.io.FileNotFoundException: Error: xml file C:\home\Thomas\Utveckling\Alan\alan-docs\manual\manual.xml not found

This led me to try to change the invocation of fopub in the script to not move into the assets directory but instead reference the xsl config with a path, thus

 fopub -t ../_assets/alan-xsl-fopub/xsl-fopub manual.xml

which kind-of improved things. Instead of the "USAGE" message I got another "File not found" exception:

org.apache.fop.apps.FOPException: javax.xml.transform.TransformerException: java.io.FileNotFoundException: C:\Users\Thomas\Utveckling\Alan\alan-docs\manual\db5.ent

for which the path looks ok (Windowsy enough) but the db5.ent file is nowhere to be found. Ideas where to look?

(For Msys2 I get the same behaviour, provided I comment out the Bash for Windows check. I have not tried with actual Git Bash for Windows)

Summary of findings

  1. fopub does seem to work under Cygwin/Msys2
  2. There is a path problem when the fopub\docbook-xsl\xslthl-config.xml is to be read (by some Java code/program)
  3. The pdf_build.sh seems to have a problem setting up for execution of fopub in Cygwin/Msys2 environments

tajmone commented 4 years ago

Mhhhh. I suspected this was the problem. I've experienced something similar in the StdLib repo, with other dependencies. The best approach IMO is to:

  1. Extract all assets' absolute paths and store into Shell env-variables.
  2. Convert them to correct Bash or Windows paths as required.
  3. Invoke fopub with all parameters as absolute paths.
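The three steps above could be sketched roughly as follows. This is an untested outline, not the project's script: the function names are hypothetical, `cygpath -m` is assumed to be available (it ships with Cygwin, MSYS2 and Git Bash), and whether fopub accepts the converted paths has not been verified. The command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Step 1: resolve a possibly relative path to an absolute one.
# Fails if the parent directory does not exist.
abs_path() {
  local dir base
  dir=$(cd "$(dirname "$1")" && pwd) || return 1
  base=$(basename "$1")
  printf '%s/%s\n' "$dir" "$base"
}

# Step 2: convert to a form a Windows Java can open.
# `cygpath -m` yields the "mixed" form, e.g. D:/path/to/file;
# where cygpath is missing (real *nix), the path is kept as-is.
native_path() {
  if command -v cygpath >/dev/null 2>&1; then
    cygpath -m "$1"
  else
    printf '%s\n' "$1"
  fi
}

# Step 3: build the fopub invocation from absolute, converted paths.
# Arguments: fopub executable, XSL template dir, DocBook XML file.
build_fopub_cmd() {
  local fopub=$1 xsl_dir manual
  xsl_dir=$(abs_path "$2") || return 1
  manual=$(abs_path "$3") || return 1
  printf '%q -t %q %q\n' "$fopub" \
    "$(native_path "$xsl_dir")" "$(native_path "$manual")"
}

# Example, from the manual directory:
#   build_fopub_cmd fopub ../_assets/alan-xsl-fopub/xsl-fopub manual.xml
```

Printing the command first makes it easy to compare what each environment would actually hand to Java before running anything.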

But, from what I remember, the main problem is the DocBook settings files, which don't work well with relative paths across multiple OSs. I remember having some problems locating the fonts, and that it could only be done using Windows paths for some reason.

So, we'll have to look at the settings too, because images and fonts are controlled by the template settings.

This is the reason why I was considering looking into Asciidoctor's native PDF backend, and checking whether the previous issues have been solved in the meantime (there were some problems with footnotes at the time, but they might be solved now). Because it's a Gem, it's going to be much easier to use in the toolchain without headaches. Also, Java has given lots of problems with fopub so far, with Gradle incompatibility issues (which were finally solved) and more.

tajmone commented 3 years ago

Might Need to Sub-Module FoPub

Having compared your error reports with mine, and your comments on the different behaviours under Cygwin and MSYS2, I think that the problem we have here is twofold:

  1. Bash vs Windows paths formatting.
  2. Assets look-up paths for:
       - Configuration files

The former can probably be fixed somehow, whereas the latter might require adding asciidoctor-fopub as a Git submodule to the repository, so that we can either pass some custom relative paths via command line options, or add some paths to the env $PATH (which we can't do if everyone has placed asciidoctor-fopub in an arbitrary folder on their local machine). But then, it might just be a problem with Bash paths.

Adding asciidoctor-fopub as a Git submodule has some other advantages too, i.e. we can ensure that everyone is using the same exact version, in case the repository is updated (which doesn't happen often though, with the latest commit being from 2018).

The "Image not found" errors seem to be due to Bash vs Windows paths, since I get this error for an image that is actually there:

SEVERE: Image not found. URI: D:/absolute-path-to/asciidoctor-fopub/build/fopub/docbook/images/tip.svg. (No context info available)

As for the config file error, which relates to the Git submodule of our template, the paths are also resolved correctly, but one is being passed (or just reported?) using the file:// protocol and the other as a Bash-formatted path, the latter formatting being most likely the culprit of the error (the two paths in the message point to the same file):

SEVERE: Cannot read configuration file:///d/absolute-path-to/AlanDocs/alan-docs/_assets/alan-xsl-fopub/xsl-fopub/xslthl-config.xml: \d\absolute-path-to\AlanDocs\alan-docs\_assets\alan-xsl-fopub\xsl-fopub\xslthl-config.xml (The system cannot find the path specified)

It's strange that the file:// protocol fails, since it's universal. Probably the protocol is used only in the error report, whereas Java is trying to locate the config file using the Bash path it receives from Bash for Windows.

> for which the path looks ok (Windowsy enough) but the db5.ent file is nowhere to be found. Ideas where to look?

It's looking in the wrong path: the db5.ent file is part of the asciidoctor-fopub package, not the Manual source folder. It's located in:

asciidoctor-fopub\build\fopub\db5.ent

Yes, I'm aware that Cygwin and MSYS2 do some path sanitization in the background, on the fly, to ensure that paths are handled properly; but Java isn't so smart (another reason in the long list of reasons why I hate Java and its portability myths).

Having to keep dual scripts (batch and shell) is really an unnecessary pain.

I should really check whether the official Asciidoctor PDF backend has been updated and solved all those old problems that were preventing us from using it (a couple of years have passed since). The main problem with switching to Asciidoctor PDF is that we'll need to implement an ALAN syntax for the Rouge highlighter in order to support syntax highlighting. But then, this would solve other issues too, since a Rouge syntax would support call-outs, and can be used in the HTML backend too.

Rouge is definitely the best highlighter choice since it's in Ruby and supported by all the official Asciidoctor backends. It's similar to Pygments (Python), and it's powerful because it uses a state stack that allows contextual operations on the syntax. I'll look into it; it doesn't seem too hard to do. It's just that I don't know Ruby that well, and I'm not sure if and how a custom syntax can be integrated into Rouge without submitting it to the upstream repository.