emory-libraries / eulxml

Utilities for using XPath to map XML data to Python objects and Django forms
http://eulxml.readthedocs.org

Use xml catalog for loading schemas #18

Open · rlskoeser opened this issue 8 years ago

rlskoeser commented 8 years ago

We should update the schema loading in eulxml so that it doesn't depend on remote resources that aren't always available.

lxml supports XML catalogs via libxml2; see http://lxml.de/resolvers.html and the instructions it links to for setting up an XML catalog: http://xmlsoft.org/catalog.html

I've already tested this with eulxml in proof-of-concept spike code, and it works great. Here's what I suggest we do:

To test that the resolver is working properly, you can modify the local schema files, load them through eulxml, and confirm that your modification is present. You could probably also test by validating local XML with no network connection.

It would probably also be a good idea to automatically add a comment to the local copies of the schemas that we download and save when generating the catalog, e.g., "downloaded by eulxml on [date]".
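A minimal sketch of that provenance comment, assuming the downloaded schema is available as text before it is saved. The function name `add_download_comment` is hypothetical (not part of eulxml). The comment has to go after the XML declaration when there is one, because nothing, not even a comment, may precede the declaration.

```python
import re
from datetime import date

# Matches an XML declaration at the very start of the text, e.g. <?xml version="1.0"?>
XML_DECL = re.compile(r"<\?xml\s[^>]*\?>")


def add_download_comment(schema_text, downloaded_on=None):
    """Return schema_text with a "downloaded by eulxml on <date>" comment added.

    Hypothetical helper: the comment is placed right after the XML
    declaration if present, otherwise at the top of the document.
    """
    if downloaded_on is None:
        downloaded_on = date.today()
    comment = "<!-- downloaded by eulxml on %s -->" % downloaded_on.isoformat()
    match = XML_DECL.match(schema_text)
    if match:
        end = match.end()
        return schema_text[:end] + "\n" + comment + schema_text[end:]
    return comment + "\n" + schema_text
```

For example, `add_download_comment('<?xml version="1.0"?>\n<schema/>', date(2016, 1, 5))` keeps the declaration on the first line and puts the comment on the second.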

Here's a sample XML catalog file that I created while testing, in case it's useful:

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri
    name="http://www.loc.gov/standards/mods/v3/mods-3-4.xsd"
    uri="file:///tmp/mods-3-4.xsd.xml" />
  <uri
    name="http://www.loc.gov/standards/mods/mods.xsd"
    uri="file:///tmp/mods-3-4.xsd.xml" />
  <uri
    name="http://www.loc.gov/standards/xlink/xlink.xsd"
    uri="file:///tmp/xlink.xsd" />
</catalog>
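To check which local file a given schema URL will map to, a catalog like the one above can be read with the standard library. This is a hypothetical inspection sketch (`load_uri_entries` and `resolve` are not part of eulxml or lxml): it only handles `<uri>` entries with absolute targets and ignores the other entry types and `nextCatalog` chaining that libxml2 supports. The real lookup happens inside libxml2, which finds catalog files through the `XML_CATALOG_FILES` environment variable.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def load_uri_entries(catalog_text):
    """Map each <uri name="..."> entry in an OASIS XML catalog to its uri target."""
    # ElementTree does not fetch the external catalog DTD named in the DOCTYPE.
    root = ET.fromstring(catalog_text)
    return {
        entry.get("name"): entry.get("uri")
        for entry in root.iter("{%s}uri" % CATALOG_NS)
    }


def resolve(url, entries):
    """Return the local replacement for url, or url unchanged if it is not catalogued."""
    return entries.get(url, url)
```

With the sample catalog, `resolve("http://www.loc.gov/standards/xlink/xlink.xsd", entries)` gives `file:///tmp/xlink.xsd`, while an uncatalogued URL comes back as-is, which is the case that would still need the network.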

alexBLR commented 8 years ago

We opened a new feature branch to address this issue: https://github.com/emory-libraries/eulxml/tree/feature/loading_schemas