daisy / pipeline-mod-braille

!! NOTE: This project is now part of the pipeline-modules project !! | Braille Production Modules for the DAISY Pipeline 2

Image src attribute with backslash crashes #162

Open PaulRambags opened 7 years ago

PaulRambags commented 7 years ago

The DTBook to PEF conversion crashes when an image's src attribute contains a backslash. For example, suppose the DTBook contains:

<img src="img\image003.jpg" alt="afbeelding"/>

The following error occurs on the server, where we run pipeline-assembly and pipeline-webui:

net.sf.saxon.trans.XPathException: Invalid replacement string in replace(): \ character must be followed by \ or $
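
For illustration only: the Saxon message means a raw string was passed as the replacement argument of replace(), where `\` and `$` are special characters. Which stylesheet does this is not shown in this thread. The same class of failure can be reproduced in Python's `re.sub`, used here purely as an analogy and not as the pipeline's code; the `PLACEHOLDER` pattern and variable names are hypothetical.

```python
import re

# The attribute value from the report, as a raw string.
src = r"img\image003.jpg"

# Passing the raw value as a replacement string fails, because "\i"
# is not a valid escape sequence in a replacement template.
failure = None
try:
    re.sub(r"PLACEHOLDER", src, "href=PLACEHOLDER")
except re.error as exc:
    failure = str(exc)

# Doubling each backslash first makes the replacement literal.
safe = src.replace("\\", "\\\\")
result = re.sub(r"PLACEHOLDER", safe, "href=PLACEHOLDER")
print(failure)
print(result)
```

A fix in the XSLT would presumably follow the same idea: escape `\` (and `$`) in the value before using it as a replacement string.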

The following error occurs in our unit test "Image test 1":

Message: [INFO] loading "file:/D:/Repositories/pipeline-mod-dedicon/src/test/resources/xml/images-test-1.xml"
Message: [INFO] loading "file:/D:/Repositories/pipeline-mod-dedicon/target/xprocspec/tmp/xprocspec-20170817095225677/images/output-dir/222222.pef"
err:XC0011:Could not load file:/D:/Repositories/pipeline-mod-dedicon/target/xprocspec/tmp/xprocspec-20170817095225677/images/output-dir/222222.pef (bundle://86.0:1/content/xml/xproc/utils/load.xpl) dtd-validate=false
err:XC0011:Could not load file:/D:/Repositories/pipeline-mod-dedicon/target/xprocspec/tmp/xprocspec-20170817095225677/images/output-dir/222222.pef (bundle://86.0:1/content/xml/xproc/utils/load.xpl) dtd-validate=false

It appears that the xprocspec-20170817095225677/images folder contains only a temp-dir folder, which in turn contains a single empty folder, temp6280585861570927296.

bertfrees commented 7 years ago

Correct me if I'm wrong, but strictly speaking I think backslashes are not allowed in a valid URL. Have you tried running the DTBook through the dtbook-validator script?

Even if they are not really allowed, supporting backslashes sounds like a good idea because they seem to appear in the wild. For the moment I don't consider this a critical issue though.

By the way, it would be great if the dtbook-to-pef script had input validation. See https://github.com/daisy/pipeline-scripts/issues/112.

josteinaj commented 7 years ago

I think a backslash is technically allowed, just not as a path delimiter, so in the example above the filename is img\image003.jpg rather than image003.jpg. On Linux I can create such filenames:

➜  touch 'foo\bar'
➜  ls
foo\bar
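
For illustration only: a small Python sketch of the two readings discussed above, using only standard-library modules. It is not pipeline code and the variable names are hypothetical. Under POSIX rules the backslash is an ordinary filename character; under Windows rules it is a directory separator. If the backslash is kept as part of the name, it would need percent-encoding to appear in a URL.

```python
import ntpath
import posixpath
from urllib.parse import quote

# The attribute value from the report, as a raw string.
src = r"img\image003.jpg"

# POSIX rules: no "/" in the string, so the whole value is the file name.
posix_name = posixpath.basename(src)

# Windows rules: the backslash separates directory and file name.
windows_name = ntpath.basename(src)

# Keeping the backslash as a literal character in a URL requires
# percent-encoding it as %5C.
encoded = quote(src)

print(posix_name)
print(windows_name)
print(encoded)
```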

bertfrees commented 2 years ago

I've checked that a book with such backslashes does not pass the DTBook validator of Pipeline 1. This makes me a bit more confident in the belief that it is indeed invalid in DTBook.