eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1

parse-xml() and util:parse() do not seem to use catalogs #1975

Open xatapult opened 6 years ago

xatapult commented 6 years ago

What is the problem

We're trying to load an XML file that contains a PUBLIC DTD reference using file:read and then turn it into XML using parse-xml() or util:parse(). Both report that they can't find the DTD (sorry for the Dutch in the message; it says "The system cannot find the file specified"):

exerr:EXXQDY0002 Error while parsing XML: c:\eXist-db-4.2.1\dummy.dtd (Het systeem kan het opgegeven bestand niet vinden) [at line 7, column 3]

To fix this, I tried adding a catalog file to eXist in conf.xml under /exist/validation/entity-resolver: <catalog uri="file:///C:/xdata2/jb-temp/dtd-catalog-test/dummy-catalog.xml"/>

This does not work.

To check whether the catalog was ok, I added it to my oXygen preferences. Without the catalog oXygen couldn't validate the file, with the catalog oXygen could. So I assume the catalog itself is ok.

What did you expect

That parse-xml()/util:parse() would use the catalog files configured in conf.xml, OR that there was some other way to make them catalog-aware, OR that you could turn off the DTD validation altogether (although I know that is controversial due to entity resolving).
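
For illustration, the third option (not loading the DTD at all) looks roughly like this at the plain JAXP level. This is a minimal sketch, not eXist's own code and not something parse-xml() exposes: the class name SkipExternalDtd and the public identifier are hypothetical, and the feature URI is the Xerces-specific switch that the JDK's built-in parser also honours.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: parse a document without fetching its external DTD.
class SkipExternalDtd {

    // Xerces-specific feature; supported by the JDK's bundled parser as well.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);                 // no DTD validation
        factory.setFeature(LOAD_EXTERNAL_DTD, false); // do not even open the DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Parses a document whose DTD does not exist and returns the root element name.
    static String demo() {
        // The public identifier is made up for this sketch.
        String xml = "<!DOCTYPE doc PUBLIC \"-//EXAMPLE//DTD Dummy//EN\" \"dummy.dtd\">"
                   + "<doc>hello</doc>";
        try {
            return parse(xml).getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The drawback mentioned above still applies: with the DTD skipped, entities and attribute defaults declared in it are unavailable, so a document that references such an entity will fail to parse.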

Describe how to reproduce or add a test

I added a zip file.

Context information

Zip file:

dtd-catalog-test.zip

dizzzz commented 6 years ago

Actually we only implemented the validation with catalog support on two levels:

  1. during XML parsing when the XML is inserted into the database
  2. using the rich set of functions of the validation extensions

So for the short term, you could use the jaxp-parse() extension function.

For the longer term: util:parse() is deprecated, and adding the catalog feature to parse-xml() would probably be a good idea. BUT, as far as I remember, you need to restart eXist-db every time you update the catalog.xml file on the filesystem. jaxp-parse() is more flexible here.
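
To make concrete what "catalog support" means for a parser, here is a minimal Java sketch of catalog-aware parsing. It uses the JDK's own javax.xml.catalog API (Java 9 or later), which is an assumption made for illustration: it is not the resolver eXist uses, nor how jaxp-parse() is implemented. The class name, the public identifier, and the DTD contents are all hypothetical.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: resolve a PUBLIC DTD reference through an OASIS XML catalog.
class CatalogAwareParse {

    // Made-up public identifier, used in both the catalog and the document.
    static final String PUBLIC_ID = "-//EXAMPLE//DTD Dummy//EN";

    static Document parse(String xml, Path catalogFile) throws Exception {
        // CatalogResolver implements org.xml.sax.EntityResolver.
        CatalogResolver resolver =
            CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogFile.toUri());
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Writes a DTD and a catalog to a temp directory, parses a document that
    // refers to the DTD only by public id, and returns an attribute value
    // that can only come from a default declared in that DTD.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("dtd-catalog-sketch");
            Path dtd = dir.resolve("local-copy.dtd");
            Files.write(dtd, ("<!ELEMENT doc (#PCDATA)>\n"
                + "<!ATTLIST doc status CDATA \"draft\">\n").getBytes(StandardCharsets.UTF_8));

            Path catalog = dir.resolve("catalog.xml");
            Files.write(catalog, ("<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
                + "  <public publicId=\"" + PUBLIC_ID + "\" uri=\"" + dtd.toUri() + "\"/>\n"
                + "</catalog>\n").getBytes(StandardCharsets.UTF_8));

            // The system identifier "dummy.dtd" does not exist on disk;
            // the catalog maps the public identifier to the local copy.
            String xml = "<!DOCTYPE doc PUBLIC \"" + PUBLIC_ID + "\" \"dummy.dtd\">"
                       + "<doc>hello</doc>";
            return parse(xml, catalog).getDocumentElement().getAttribute("status");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the resolver is built per call from a catalog path, a changed catalog file is picked up on the next parse without restarting anything, which is the kind of flexibility described above for passing catalogs as an argument.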

xatapult commented 6 years ago

All right, that might be a workaround, and it's nice that you can pass catalogs as an argument. Yes, restarting eXist is necessary; it reports the catalogs in the log. BUT: when such a file isn't there, it doesn't complain :-(

adamretter commented 6 years ago

@dizzzz I think we should add validation to fn:parse-xml (and possibly fn:parse-xml-fragment) so that they are in line with storing into the db. Any idea how much work that would be?

dizzzz commented 6 years ago

I will have a look at the current code, but I think I remember there is quite some plumbing involved with the catalogs, and I do not like the end result. A long time ago I started a kind of re-implementation of the catalog resolver, but it never got finished. Ideally I'd like to store a catalog document inside the database.

@ndw's work on xmlresolver might be a good start.

All in all, I guess it will take some time.

ndw commented 6 years ago

I'll help if I can. I'd certainly prefer to see the xmlresolver work I've done extended or improved so that it's useful to eXist rather than having different implementations.

P.S. https://github.com/ndw/xmlresolver