XML Resolution Service

Current Status

This service has been replaced by the resolver service at https://github.com/daitss/resolver. It's now deprecated.

OverView

Consider a collection of XML documents. You would like to gather up all of the schemas necessary to understand those documents for preservation purposes. This web service helps you do that, in three RESTful steps:

Create a collection resource.
POST some XML documents to the collection.
GET the collection, retrieving a tar file of the XML schemas and a manifest of what was found.

The original XML documents are not returned. It is recommended that you employ a caching proxy such as squid when used in a production environment.

Environment

In your web server you should set up some environment variables:

SetEnv DATA_ROOT - where you'll save information about the schemas and document collections.
SetEnv RESOLVER_PROXY squid.example.com:3128 - an optional squid caching proxy.
SetEnv LOG_FACILITY LOG_LOCAL1 - optionally set a facility code if you use syslog for logging; STDERR will be used otherwise.

Requirements

ruby 1.9.3 - use master branch
ruby 1.8.7 - use ruby1.8.7 branch
sinatra & rack
nokogiri, libxml-ruby & builder
rake, rspec, cucumber & ci/reporter for testing.
log4r
capistrano & railsless-deploy.

Quickstart

Retrieve a copy of the xmlresolution service.
Test the installation: % rake spec
Run from rackup, specifying your environment: % RESOLVER_PROXY=squid.example.com:3128 rackup config.ru or run under a web server. I'm using passenger phusion under apache:

ServerName xmlresolution.example.com DocumentRoot "/.../xmlresolution/public" SetEnv DATA_ROOT /var/resolutions SetEnv RESOLVER_PROXY squid.example.com:3128 SetEnv LOG_FACILITY LOG_LOCAL2 Order allow,deny Allow from all ` Directory Structure ------------------- You can use the supplied Capfile to set up. Adjust the top few lines in that file to match your installation. * config.ru & app.rb - the Sinatra setup * public/ - programming docs will land in public/internals here via % rake yard; otherwise empty * views/ - instructional erb pages and forms * lib/app/ - root of the sinatra stuff - helpers and routes * lib/xmlresolution/ - root of the xmlresolution libraries * spec/ - tests * data/ - example DATA_ROOT which must have the directories: * data/schemas - where cached schemas live * data/collections - where collections, and information about submitted documents for a collection, live * tmp/ - phusion checks the restart.txt file here. Rake has a restart target for this, capistrano uses it Usage ----- The following assumes you've a running server at xmlresolution.example.com. There are built-in test forms for exploring the system; see http://xmlresolution.example.com/ for instructions. The following models how your RESTful clients should access the service. * Create a collection (some versions of curl require you to use an empty document here): `curl --upload-file /dev/null -X PUT http://xmlresolution.example.com/ieids/collection-001` * Submit some XML documents to it (note trailing slash): `curl -F xmlfile=@myfile.xml http://xmlresolution.example.com/ieids/collection-001/` `curl -F xmlfile=@myotherfile.xml http://xmlresolution.example.com/ieids/collection-001/` * Get the tarfile of the associated schemas and a manifest `curl http://xmlresolution.example.com/ieids/collection-001/` Documentation ------------- See the root of the running service for a web page of instructions on use and testing; there is a Rake task that will install the application documentation under public/internals.

fcla / xmlresolution