dizzzz / jing-trang

Automatically exported from code.google.com/p/jing-trang

xs:date is never inferred from a sample XML document #104

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
For example, trying to infer an XML Schema from

<?xml version="1.0" encoding="UTF-8"?>
<sample>  
  <date>2009-10-12</date>
  <date>2009-11-10</date>
</sample>

gives xs:NMTOKEN as the type for the date element instead of xs:date.

The problem is in the DatatypeRepertoire class, where "NMTOKEN" comes
before "date" in the types list, so a date value will always be matched by
NMTOKEN first.

  static private final String[] typeNames = {
    "boolean",
    // XXX add int?
    "integer",
    "decimal",
    "double",
    "NCName",
    "NMTOKEN",
    "time",
    "date",
    "dateTime",
    "duration",
    "hexBinary",
    "base64Binary",
    "anyURI"
  };

The DatatypeInferrer returns the first type in the list that accepts all the
values, so the date type never gets the chance to be returned.
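
To make the ordering problem concrete, here is a minimal, self-contained sketch of first-match inference. It is not the actual DatatypeInferrer code: the class name FirstMatchInference, the accepts and infer helpers, and the two regular expressions are assumptions made for illustration (an ASCII-only NMTOKEN check and a date check without time zones), standing in for the real datatype validators.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FirstMatchInference {
    // Simplified stand-ins for the real validators (assumption):
    // ASCII-only name characters, and dates without a time zone suffix.
    static final Pattern NMTOKEN = Pattern.compile("[A-Za-z0-9._:-]+");
    static final Pattern DATE = Pattern.compile("-?\\d{4,}-\\d{2}-\\d{2}");

    // Hypothetical helper: does the named type accept this lexical value?
    static boolean accepts(String typeName, String value) {
        switch (typeName) {
            case "NMTOKEN": return NMTOKEN.matcher(value).matches();
            case "date":    return DATE.matcher(value).matches();
            default:        return false;
        }
    }

    // Returns the first type in typeNames that accepts every value, or null.
    static String infer(List<String> typeNames, List<String> values) {
        for (String typeName : typeNames) {
            boolean acceptsAll = true;
            for (String value : values) {
                if (!accepts(typeName, value)) {
                    acceptsAll = false;
                    break;
                }
            }
            if (acceptsAll) {
                return typeName;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> dates = Arrays.asList("2009-10-12", "2009-11-10");
        // NMTOKEN listed first: digits and hyphens are name characters, so it wins.
        System.out.println(infer(Arrays.asList("NMTOKEN", "date"), dates)); // NMTOKEN
        // date listed first: the more specific type gets a chance to match.
        System.out.println(infer(Arrays.asList("date", "NMTOKEN"), dates)); // date
    }
}
```

In this sketch, trying the more specific type before NMTOKEN is enough to get date back, while a value such as x:a still falls through to NMTOKEN.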

Original issue reported on code.google.com by georgebina76 on 25 Feb 2010 at 12:31

GoogleCodeExporter commented 8 years ago
Fixed in r2339. A sample document like

<?xml version="1.0" encoding="UTF-8"?>
<sample>
  <boolean>true</boolean>
  <boolean>true</boolean>
  <boolean>false</boolean>

  <integer>1</integer>
  <integer>2</integer>
  <integer>3</integer>

  <decimal>1.1</decimal>
  <decimal>0.3</decimal>
  <decimal>90</decimal>

  <double>1.23E100</double>
  <double>2</double>
  <double>2.34</double>

  <NCName>test</NCName>
  <NCName>x</NCName>
  <NCName>y</NCName>

  <time>12:23:00</time>
  <time>10:01:11-05:00</time>
  <time>09:01:07Z</time>

  <date>2012-10-09</date>
  <date>2010-02-01</date>
  <date>2009-10-10</date>

  <dateTime>2010-01-20T08:00:10</dateTime>
  <dateTime>2012-02-01T10:00:03-05:00</dateTime>
  <dateTime>2010-01-20T12:00:00Z</dateTime>

  <duration>P364D</duration>
  <duration>P1347Y</duration>
  <duration>-P1347M</duration>

  <!-- we need at least 128 chars for binary types -->

<hexBinary>0FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70
FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70FB70
FB70FB70FB70FB70FB70FB70FB70FB7</hexBinary>
  <hexBinary>AFB7</hexBinary>
  <hexBinary>DF</hexBinary>

  <NMTOKEN>x:a</NMTOKEN>
  <NMTOKEN>_b-c.32</NMTOKEN>
  <NMTOKEN>test</NMTOKEN>

  <!-- we need at least 128 chars for binary types -->
  <base64Binary>
    AABBCCDDEEFFAABBCCDDEEFFAABBCCDDEEFFAABBCCDDEEFF
    AABBCCDDEEFFAABBCCDDEEFFAABBCCDDEEFFAABBCCDDEEFF
    AABBCCDDEEFFAABBCCDDEEFFAABBCCDDEEFFAABBCCDDEEFF
  </base64Binary>
  <base64Binary>AABBCCDDEEFFAA==</base64Binary>
  <base64Binary>AABBCCDDEEAAAAAA</base64Binary>

  <anyURI>http://www.example.com</anyURI>
  <anyURI>test#a10</anyURI>
  <anyURI>ftp://server/path/to/file.xml</anyURI>
</sample>

is now converted to the following RNC schema:

default namespace = ""

start =
  element sample {
    element boolean { xsd:boolean }+,
    element integer { xsd:integer }+,
    element decimal { xsd:decimal }+,
    element double { xsd:double }+,
    element NCName { xsd:NCName }+,
    element time { xsd:time }+,
    element date { xsd:date }+,
    element dateTime { xsd:dateTime }+,
    element duration { xsd:duration }+,
    element hexBinary { xsd:hexBinary }+,
    element NMTOKEN { xsd:NMTOKEN }+,
    element base64Binary { xsd:base64Binary }+,
    element anyURI { xsd:anyURI }+
  }

Original comment by georgebina76 on 25 Feb 2010 at 2:55

GoogleCodeExporter commented 8 years ago
Does this still work when you are generating DTDs? Something that looks like a
date should turn into an NMTOKEN when you are generating a DTD.

Please try to add a test case for every fix.

Original comment by jjc.jclark.com on 25 Feb 2010 at 5:16

GoogleCodeExporter commented 8 years ago
Yes. That depends on the conversion from xs:date to a DTD type in
com.thaiopensource.relaxng.output.dtd.Datatypes, and xs:date is mapped to
NMTOKEN there. For example

<sample>
  <date date="2012-10-09"/>
  <date date="2010-02-01"/>
  <date date="2009-10-10"/>
</sample>

gives

<?xml encoding="UTF-8"?>

<!ELEMENT sample (date)+>
<!ATTLIST sample
  xmlns CDATA #FIXED ''>

<!ELEMENT date EMPTY>
<!ATTLIST date
  xmlns CDATA #FIXED ''
  date NMTOKEN #REQUIRED>
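
The mapping step described in this comment can be pictured with a small lookup table. This is only a sketch: the class DtdTypeSketch, the toDtdType helper, and the CDATA fallback are hypothetical, and the real table in com.thaiopensource.relaxng.output.dtd.Datatypes may be organized quite differently. The only entry taken from this thread is date to NMTOKEN.

```java
import java.util.HashMap;
import java.util.Map;

public class DtdTypeSketch {
    // Hypothetical table from XSD type names to DTD attribute types.
    static final Map<String, String> XSD_TO_DTD = new HashMap<>();
    static {
        XSD_TO_DTD.put("date", "NMTOKEN");    // as stated in this thread
        XSD_TO_DTD.put("NMTOKEN", "NMTOKEN"); // DTDs have a native NMTOKEN type
    }

    // Falls back to CDATA for unmapped types (an assumption of this sketch).
    static String toDtdType(String xsdTypeName) {
        return XSD_TO_DTD.getOrDefault(xsdTypeName, "CDATA");
    }

    public static void main(String[] args) {
        System.out.println(toDtdType("date"));    // NMTOKEN
        System.out.println(toDtdType("NMTOKEN")); // NMTOKEN
    }
}
```

Under this view, inferring xs:date instead of xs:NMTOKEN does not change the DTD output, because both names resolve to the same DTD attribute type.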

Original comment by georgebina76 on 25 Feb 2010 at 7:17

GoogleCodeExporter commented 8 years ago
I added a test case in r2340. It would be great if you could review it.

Original comment by georgebina76 on 25 Feb 2010 at 10:33