fcla / xmlresolution

A web service that recursively finds schemas associated with an XML file
daitss/xmlresolution
GNU General Public License v3.0

namespace in manifest for SYSTEM dtd #29

Closed lydiam closed 11 years ago

lydiam commented 11 years ago

I see that XMLresolution creates the manifest.xml as follows for a SYSTEM dtd, including a "namespace", when in fact a SYSTEM dtd is only required to have a URI/"schemalocation":

 <resolution time="2012-08-30T10:28:26-04:00" name="file://ripple.fcla.edu/E8LE8ZVFB_SR56RR/redir-rss-system.xml" md5="3f7cbf823a1dca73180378fca02880c3">
   <schema md5="f15a7d50eebe0ccffce8f56d78f2aaff" last_modified="2012-07-19T17:46:47-04:00" namespace="DOCTYPE.rss.SYSTEM.http://iterman.fcla.edu/rss-0.91.dtd" location="http://schema.fcla.edu/xml/rss-0.91.dtd" status="success"/>
   <schema namespace="DOCTYPE.rss.SYSTEM.http://iterman.fcla.edu/rss-0.91.dtd" location="http://iterman.fcla.edu/rss-0.91.dtd" status="redirect" actual="http://schema.fcla.edu/xml/rss-0.91.dtd"/>
 </resolution>
 </resolutions>

Wouldn't it be more proper for the manifest.xml to read as follows, removing the namespace:

 <resolution time="2012-08-30T10:28:26-04:00" name="file://ripple.fcla.edu/E8LE8ZVFB_SR56RR/redir-rss-system.xml" md5="3f7cbf823a1dca73180378fca02880c3">
   <schema md5="f15a7d50eebe0ccffce8f56d78f2aaff" last_modified="2012-07-19T17:46:47-04:00" location="http://schema.fcla.edu/xml/rss-0.91.dtd" status="success"/>
   <schema location="http://iterman.fcla.edu/rss-0.91.dtd" status="redirect" actual="http://schema.fcla.edu/xml/rss-0.91.dtd"/>
 </resolution>
 </resolutions>

This example is taken from TestCase1 of the xmlresolution test cases.

cchou commented 11 years ago

PC's decision: remove namespace in this case unless it creates another problem. We should also change "schema" to "dtd" or "stylesheet" where appropriate.

lydiam commented 11 years ago

This change works correctly now for TestCase1:

 <resolution md5="53f00d5023e7f7fa89ccc118414c89d5" name="file://ripple.fcla.edu/E8LP2040P_AMKKG3/redir-rss-system.xml" time="2012-11-01T10:43:41-04:00">
   <dtd status="failure" location="http://localhost/rss-0.91.dtd" message="404 &quot;Not Found&quot;"/>
 </resolution>
 </resolutions>

However, in TestCase11, which contains a reference to a PUBLIC dtd, the namespace has also been removed:

 <resolution md5="864b70f7b595fc2cbde7d09a0ea5f3b4" name="file://ripple.fcla.edu/E1LDJJAIX_L4EP39/redir-rss-public.xml" time="2012-11-01T14:21:26-04:00">
   <dtd status="success" location="http://schema.fcla.edu/xml/rss-0.91.dtd" md5="f15a7d50eebe0ccffce8f56d78f2aaff" last_modified="2012-07-19T17:46:47-04:00"/>
   <dtd status="redirect" location="http://iterman.fcla.edu/rss-0.91.dtd" actual="http://schema.fcla.edu/xml/rss-0.91.dtd"/>

Before this latest code change the manifest read:

  <resolution md5="93a0c52102185bf63e5f4271192da812" name="file://ripple.fcla.edu/EPBEREB7K_GKO2S0/redir-rss-public.xml" time="2012-10-16T16:36:20-04:00">
    <schema status="success" location="http://schema.fcla.edu/xml/rss-0.91.dtd" namespace="DOCTYPE.rss.PUBLIC.-//Netscape Communications//DTD RSS 0.91//EN" md5="f15a7d50eebe0ccffce8f56d78f2aaff" last_modified="2012-07-19T17:46:47-04:00"/>
    <schema status="redirect" location="http://iterman.fcla.edu/rss-0.91.dtd" namespace="DOCTYPE.rss.PUBLIC.-//Netscape Communications//DTD RSS 0.91//EN" actual="http://schema.fcla.edu/xml/rss-0.91.dtd"/>
  </resolution>
  </resolutions>

A snippet of the xml file containing the reference to this dtd:

 <?xml version="1.0"?>
     <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://iterman.fcla.edu/rss-0.91.dtd">
<rss version="0.91">
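
The difference between the two DOCTYPE forms discussed in this thread can be sketched in code. This is only an illustration in Python, not the XMLresolution implementation (which is Ruby); the names `DOCTYPE_RE` and `parse_doctype` are hypothetical, and the regular expression handles only simple SYSTEM and PUBLIC declarations without an internal subset.

```python
import re

# Matches the two external-identifier forms of a DOCTYPE declaration:
#   <!DOCTYPE root SYSTEM "location">
#   <!DOCTYPE root PUBLIC "public_id" "location">
# Hypothetical helper for illustration; not taken from XMLresolution.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>[^\s>\[]+)\s+'
    r'(?:SYSTEM\s+(?P<sq>["\'])(?P<sys_loc>.*?)(?P=sq)'
    r'|PUBLIC\s+(?P<pq>["\'])(?P<public_id>.*?)(?P=pq)\s+'
    r'(?P<lq>["\'])(?P<pub_loc>.*?)(?P=lq))'
)


def parse_doctype(xml_text):
    """Return the external identifier of the first DOCTYPE, or None."""
    m = DOCTYPE_RE.search(xml_text)
    if m is None:
        return None
    if m.group("public_id") is not None:
        # PUBLIC: a public identifier plus a fallback location.
        return {"root": m.group("root"), "kind": "PUBLIC",
                "public_id": m.group("public_id"),
                "location": m.group("pub_loc")}
    # SYSTEM: only a location, no public identifier.
    return {"root": m.group("root"), "kind": "SYSTEM",
            "public_id": None, "location": m.group("sys_loc")}
```

Applied to the DOCTYPE line above, `parse_doctype` would report kind `PUBLIC`, the Netscape public identifier, and the `iterman.fcla.edu` location; a SYSTEM declaration yields a location and no public identifier, which is why a manifest entry for it has nothing namespace-like to record.
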

iterman commented 11 years ago

A "PUBLIC" dtd has the form:

<!DOCTYPE root_element PUBLIC "DTD_name" "DTD_location">.

Here the "DTD_location" is meant to be used when the "DTD_name" cannot be found in the local catalog, or when no local catalog server is implemented. In our case we don't have a local catalog server; I think they are rarely implemented. The DTD_name is analogous to a namespace, but it is not the same thing. My suggestion is that instead of the attribute namespace= we use PUBLIC=, which is a bit more accurate. So for
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://iterman.fcla.edu/rss-0.91.dtd">

we create in the manifest.xml

iterman commented 11 years ago

My previous comment got truncated. So here it is: we create in the manifest.xml

lydiam commented 11 years ago

Did this one get truncated as well?


iterman commented 11 years ago

yes the second one got truncated as well. It w

Question: will the following also get truncated?

a="A" b="B"

iterman commented 11 years ago

Lydia,

Apparently the second one got truncated also!! It was supposed to be:

||

I ran a little test on GitHub. I think the truncation may have something to do with the string just above being processed as something other than a plain string. Maybe it should have been enclosed in single quotes.
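
If the cause was raw XML markup being interpreted as HTML rather than shown as text (an assumption; the thread does not confirm it), one workaround is to escape the markup characters before posting. A minimal Python sketch, where `escape_for_comment` and the `example` element are hypothetical:

```python
import html


def escape_for_comment(snippet):
    """Escape &, < and > so the snippet is displayed as literal text."""
    # quote=False leaves the attribute quotes untouched.
    return html.escape(snippet, quote=False)


print(escape_for_comment('<example a="A" b="B"/>'))
# prints: &lt;example a="A" b="B"/&gt;
```

Indenting the snippet as a code block, as the XML examples elsewhere in this thread are, is another way to keep the markup visible.
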

lydiam commented 11 years ago

One question/comment: while the content now appears to be correct for PUBLIC and SYSTEM dtds, the order of the attributes is not consistent with the attributes for schema entries. Note that PUBLIC is the last attribute, while namespace is always listed after location. (Different browsers also seem to modify the display order.)

 <?xml version="1.0" encoding="UTF-8"?>
 <resolutions collection="EW1J0B087_ZN7YM0">
   <resolution md5="059e32ed8430296454b7d9fe1b723deb" name="file://ripple.fcla.edu/EW1J0B087_ZN7YM0/rss-public.xml" time="2012-11-05T10:47:48-05:00">
     <dtd status="success" location="http://schema.fcla.edu/xml/rss-0.91.dtd" md5="b0669e2f96127caca02f804ed3c28b4e" last_modified="2012-07-19T17:46:47-04:00" PUBLIC="-//Netscape Communications//DTD RSS 0.91//EN"/>
   </resolution>

   <resolution md5="e96b9943d9209ae96f88bc71a66787f4" name="file://ripple.fcla.edu/EW1J0B087_ZN7YM0/TestCase14.xml" time="2012-11-05T10:47:48-05:00">
     <schema status="success" location="http://dublincore.org/schemas/xmls/simpledc20021212.xsd" namespace="http://purl.org/dc/elements/1.1/" md5="d90774b02fa694f3b358b4ed828295be" last_modified="2012-11-05T10:47:48-05:00"/>
     <schema status="success" location="http://www.fcla.edu/dls/md/daitss/daitss.xsd" namespace="http://www.fcla.edu/dls/md/daitss/" md5="2b2f6040cfc603d5873d7fa0bf976274" last_modified="2012-05-30T14:05:46-04:00"/>
     <schema status="success" location="http://www.loc.gov/standards/mets/mets.xsd" namespace="http://www.loc.gov/METS/" md5="42519c72a741cc30e256b99369f1d735" last_modified="2012-03-05T12:02:18-05:00"/>
     <schema status="success" location="http://www.loc.gov/standards/xlink/xlink.xsd" namespace="http://www.w3.org/1999/xlink" md5="01490ebdea13c1bc82a17e4783daeeaa" last_modified="2007-08-23T15:02:01-04:00"/>
     <schema status="success" location="http://www.w3.org/2001/03/xml.xsd" namespace="http://www.w3.org/XML/1998/namespace" md5="712fc5a7750e69f904f61086a997713c" last_modified="2004-03-31T12:57:18-05:00"/>
     <schema status="success" location="http://www.w3.org/2001/xml.xsd" namespace="http://www.w3.org/XML/1998/namespace" md5="5e0bd6f94ec78a3a88fca2275ab05f9e" last_modified="2009-01-21T17:06:40-05:00"/>
     <schema status="success" location="http://www.w3.org/2001/XMLSchema.xsd" namespace="http://www.w3.org/2001/XMLSchema" md5="1fadeaf88d4b93ab263f7c59917c26bc" last_modified="2004-03-20T07:53:09-05:00"/>
   </resolution>
 </resolutions>

It would be best to output all of the attributes in the same order for each document type.

The above example is from TestCase14
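
For illustration, emitting attributes in one fixed order could look like the sketch below. It is written in Python rather than the Ruby the service uses, and `ATTRIBUTE_ORDER` and `render_entry` are hypothetical names; the order follows the schema entries above, with PUBLIC placed in the slot that namespace occupies.

```python
from xml.sax.saxutils import quoteattr

# Assumed target order, modeled on the <schema> entries in the example above.
ATTRIBUTE_ORDER = ["status", "location", "namespace", "PUBLIC",
                   "md5", "last_modified"]


def render_entry(tag, attrs):
    """Serialize an empty element with its attributes in a fixed order."""
    ordered = [k for k in ATTRIBUTE_ORDER if k in attrs]
    # Attributes not in the list (e.g. actual, message) go last, alphabetically.
    ordered += sorted(k for k in attrs if k not in ATTRIBUTE_ORDER)
    body = " ".join(f"{k}={quoteattr(attrs[k])}" for k in ordered)
    return f"<{tag} {body}/>"
```

With this ordering, a dtd entry would list PUBLIC directly after location, matching where namespace appears in the schema entries.
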

iterman commented 11 years ago

The order in which our DAITSS application code writes the attributes is always the same: status, location, namespace or PUBLIC, md5, last_modified. The Ruby builder code is what determines the order of the attributes in the output. In XML documents the order of attributes within an element is not significant. The order would only come into play if a downstream program were written to expect a certain ordering, which is not our case.

The Ruby forum shows that another person raised this issue and there is some overriding code to a Ruby Gem that would sort the attributes. Please see: http://www.ruby-forum.com/topic/111043.

I think we should let this stay as is.
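
The point about attribute order can be checked with any XML parser. A small Python sketch using the standard library, with a hypothetical helper `same_element`: two entries that differ only in attribute order parse to equal tag and attribute sets.

```python
import xml.etree.ElementTree as ET

# The same entry written with two different attribute orders.
a = ET.fromstring(
    '<dtd status="success" location="http://schema.fcla.edu/xml/rss-0.91.dtd" '
    'PUBLIC="-//Netscape Communications//DTD RSS 0.91//EN"/>'
)
b = ET.fromstring(
    '<dtd PUBLIC="-//Netscape Communications//DTD RSS 0.91//EN" '
    'status="success" location="http://schema.fcla.edu/xml/rss-0.91.dtd"/>'
)


def same_element(x, y):
    """Compare tag and attributes; dict comparison ignores ordering."""
    return x.tag == y.tag and x.attrib == y.attrib


print(same_element(a, b))  # prints: True
```

A consumer that reads the manifest through a parser in this way is unaffected by the order in which the attributes were written.
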

pcaplan commented 11 years ago

I agree.

Priscilla



lydiam commented 11 years ago

I actually advised Ira not to make any further code changes and just to document the issue.