fatty- / daisy-pipeline

Automatically exported from code.google.com/p/daisy-pipeline

extend p:load with the ability to ignore DTDs #362

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Use p:load to load an XML file that references a DTD that is unavailable
(not bundled in the framework and not available at the given URI)

What is the expected output? What do you see instead?
In almost all cases, the end user doesn't care whether the DTD exists at the
given URI. p:load doesn't try to validate the document; it just checks that the
DTD exists (which it is required to do according to the XML spec). Could we
display a warning message instead of throwing a fatal error?

Original issue reported on code.google.com by josteinaj@gmail.com on 19 Aug 2013 at 9:32

GoogleCodeExporter commented 9 years ago
Mmm, loading DTDs is part of the XML parsing rules, so I'm not sure we can do
much here.

Maybe we could set a "dynamic" URI resolver that loads a dummy (empty) DTD, or
add a pre-processor that removes the doctype from the input stream, but that
breaks XML compliance (e.g. what if the DTD defines entities that you refer to
in the XML doc?).
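
To make the "dummy DTD" idea concrete, here is a minimal sketch using the standard JAXP/SAX `EntityResolver` hook. It is not the p:load implementation; the class name `IgnoreDtdLoader` and the method `parseIgnoringDtd` are hypothetical, and how such a resolver would be wired into the XProc engine is left open.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
public class IgnoreDtdLoader {

    // Parses the XML, substituting an empty DTD for every external entity.
    public static Document parseIgnoringDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Hand the parser an empty stream instead of fetching the system ID.
        builder.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The DTD URI is made up and is never fetched.
        String xml = "<!DOCTYPE ncx SYSTEM \"http://example.invalid/missing.dtd\">"
                + "<ncx><title>ok</title></ncx>";
        Document doc = parseIgnoringDtd(xml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The entity caveat still applies: a document that references an entity declared only in the real DTD will fail to parse with the empty substitute.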

I'm not sure it's worth the price though (both developing it and asking script
authors to use the extended step). We only have to deal with a handful of
external DTDs, and we can correct misses rather quickly (like you did for the
NCX DTD). Thoughts?

Original comment by rdeltour@gmail.com on 19 Aug 2013 at 9:49

GoogleCodeExporter commented 9 years ago
Yeah, I don't like that part of the spec :/.

Stripping the doctype is already done in html-load if I remember right, since
"<!DOCTYPE html>" doesn't work, but that's just a single use case where we can
probably live with the current regex that removes the doctype.
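
For reference, a regex-based doctype stripper could look roughly like the sketch below. The pattern is an assumption made for illustration, not the regex actually used in html-load, and `DoctypeStripper` is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
public class DoctypeStripper {

    // Matches a doctype declaration that has no internal subset, such as
    // <!DOCTYPE html> or <!DOCTYPE x PUBLIC "..." "...">.
    // Declarations containing '[' are deliberately left untouched.
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE[^>\\[]*>", Pattern.CASE_INSENSITIVE);

    // Removes the first matching doctype declaration, if any.
    public static String stripDoctype(String xml) {
        return DOCTYPE.matcher(xml).replaceFirst("");
    }
}
```

A pattern this simple breaks if a quoted identifier contains ">", and it skips doctypes with an internal subset, which is one reason it only seems acceptable for a narrow case like HTML.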

If we're not overruling the XML spec (in the name of usability), then we should
at least provide good error messages when the engine is not able to retrieve
the DTD referenced in the doctype. By default, p:load throws an XD0011 (not
well-formed) exception when the DTD is not available; ideally I think we should
be more specific about the missing DTD.

As for px:fileset-load, it can't tell from XD0011 whether the file it tried to
load was actually a text file or whether just the DTD is missing (the document
otherwise being well-formed). So I guess it will have to output a
not-well-formed warning every time it encounters XD0011.
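
One way to tell the two cases apart, sketched here at the plain JAXP level rather than inside XProc, is to retry a failed parse with an empty DTD substituted: if the retry succeeds, the DTD was the likely culprit. The names `LoadFailureClassifier`, `Result` and `classify` are hypothetical, and nothing here claims that px:fileset-load can hook into the parser this way.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
public class LoadFailureClassifier {

    public enum Result { OK, DTD_UNAVAILABLE, NOT_WELL_FORMED }

    // Returns true if the XML parses; optionally stubs out external DTDs.
    private static boolean parses(String xml, boolean stubDtd)
            throws ParserConfigurationException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them.
        builder.setErrorHandler(new DefaultHandler());
        if (stubDtd) {
            builder.setEntityResolver((publicId, systemId) ->
                    new InputSource(new StringReader("")));
        }
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    // Strict parse first; on failure, retry with the DTD stubbed out.
    public static Result classify(String xml)
            throws ParserConfigurationException {
        if (parses(xml, false)) {
            return Result.OK;
        }
        return parses(xml, true)
                ? Result.DTD_UNAVAILABLE
                : Result.NOT_WELL_FORMED;
    }
}
```

A document that depends on entities declared in the missing DTD would still be reported as NOT_WELL_FORMED by this sketch, so the warning text would need to allow for that case.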

Original comment by josteinaj@gmail.com on 19 Aug 2013 at 10:15