dnewcome / jath

Jath is a simple template language for parsing XML using JSON markup.
MIT License

Having trouble getting started with namespaces #5

Closed: matthewrobertson closed this 12 years ago

matthewrobertson commented 12 years ago

Hi,

I want to thank you for taking the time to write such an awesome-looking library. Unfortunately, I am having a bit of trouble getting started. I am trying to build an ePUB viewer in JavaScript. ePUB documents contain a lot of XML manifest information, and a nice, standardized way to convert it to JSON would really help the project along. Below is an example of a manifest pulled from a real-world ePUB:

<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="ean" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://www.idpf.org/2007/opf">
      <dc:title>L'espagnol dans votre poche</dc:title>
      <dc:creator></dc:creator>
      <dc:publisher>Larousse</dc:publisher>
      <dc:rights>© Éditions Larousse, </dc:rights>
      <dc:identifier id="ean">9782035862464</dc:identifier>
      <dc:language>fr</dc:language>
  </metadata>
  <manifest>
    <item id="css" href="styles.css" media-type="text/css" />
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml" />
    <item id="cover" href="images/cover.jpg" media-type="image/jpeg"/>
    <item id="font2" href="Fonts/HighlanderStd-Medium.otf" media-type="application/x-font-otf"/>
    <item id="Page_1"  href="Page_1.html"  media-type="application/xhtml+xml"/>
    <item id="Page_1_jpg" href="images/Page_1.jpg"  media-type="image/jpeg"/>
    <item id="Page_2"  href="Page_2.html"  media-type="application/xhtml+xml"/>
    <item id="Page_2_jpg" href="images/Page_2.jpg"  media-type="image/jpeg"/>
  </manifest>

  <spine toc="ncx">
    <itemref idref="Page_1"/>
    <itemref idref="Page_2"/>
  </spine>
  <guide>
  </guide>
</package>

For now, let's say I am only interested in the itemref elements under the spine node. What would be the right way to pull them out? I have tried multiple things, but the only way I can get it to work is by ripping all the namespaces out of the document. Is this a bug in Jath, or am I missing something?

BTW, sorry for bugging you with such a n00b plea for help, but if you can help me I would be happy to return the favor by contributing some documentation to your wiki on how this namespace resolution works.

dnewcome commented 12 years ago

XPath (1.0) doesn't support the idea of a default namespace: an unprefixed name such as itemref only matches elements that are in no namespace at all. So you have to map the URL of your default namespace to some prefix binding, or use the local-name() function, in order to refer to elements within that namespace.
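To make the "map the URL to a binding" option concrete, here is a minimal sketch, assuming templates only use simple location paths such as `//item` or `/package/spine/itemref` (no predicates, function calls, or explicit axes). The helper `addPrefix` is hypothetical and not part of Jath; it puts a chosen prefix in front of every unprefixed element step, so the path can match default-namespace elements once that prefix is mapped to `http://www.idpf.org/2007/opf`.

```javascript
// Hypothetical helper (not part of Jath): prefix every plain element-name
// step in a simple XPath location path, e.g. "//item" -> "//def:item".
// Assumes the path has no predicates, function calls, or explicit axes.
function addPrefix(path, prefix) {
  // A bare name step: starts with a letter or "_", no colon allowed.
  var plainName = /^[A-Za-z_][A-Za-z0-9_.-]*$/;
  return path
    .split("/")
    .map(function (step) {
      // Empty segments (from "//"), "@attr", "*", ".", "..", "text()" and
      // already-prefixed steps such as "dc:title" are left untouched.
      return plainName.test(step) ? prefix + ":" + step : step;
    })
    .join("/");
}

console.log(addPrefix("//itemref", "def")); // prints //def:itemref
console.log(addPrefix("/package/spine/itemref", "def"));
```

The prefix name itself is arbitrary; what matters is that the resolver maps it to the OPF namespace URI.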

There are some examples of this if you look in samples.html. Probably the 'correct' way to approach it is to give Jath a resolver function:

// Map the prefix "def" to the OPF default namespace URI.
Jath.resolver = function( prefix ) {
    var mappings = { def: "http://www.idpf.org/2007/opf" };
    return mappings[ prefix ];
};
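One small caveat with a lookup like `mappings[ prefix ]`: it also finds properties inherited from `Object.prototype`, so a prefix such as `constructor` would return a function instead of a URI. A slightly more defensive sketch, assuming Jath only needs a function that takes a prefix and returns a namespace URI, and that `null` is an acceptable "unknown prefix" answer (it is what the DOM's `lookupNamespaceURI` returns for unbound prefixes). `makeResolver` is a hypothetical name, not a Jath API:

```javascript
// Hypothetical factory (not part of Jath): build a prefix -> URI resolver
// from a plain object, ignoring keys inherited from Object.prototype.
function makeResolver(mappings) {
  return function (prefix) {
    return Object.prototype.hasOwnProperty.call(mappings, prefix)
      ? mappings[prefix]
      : null; // unknown prefix: report "no namespace URI"
  };
}

var resolver = makeResolver({
  def: "http://www.idpf.org/2007/opf",
  dc: "http://purl.org/dc/elements/1.1/"
});

console.log(resolver("def"));         // prints http://www.idpf.org/2007/opf
console.log(resolver("constructor")); // prints null
```

In the setup discussed here it would be assigned the same way as the snippet above, e.g. `Jath.resolver = makeResolver({ def: "http://www.idpf.org/2007/opf" });`.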

Then use something like this as the template:

var template = [ "//def:itemref", { idref: "@idref" } ];

Or you could alter the xpath selectors to use local-name():

var template = [ "//*[local-name()='itemref']", { idref: "@idref" } ];

I haven't tested these snippets yet. I might be able to try them out a little later, though.

matthewrobertson commented 12 years ago

OK, so obviously the problem is not Jath but my poor knowledge of XPath. I managed to have a bit of luck using the following template:

var template0 = {
  manifest: [ "//*[namespace-uri()='http://www.idpf.org/2007/opf' and local-name()='item']", { 
    id: "@id",
    href: "@href",
    media_type: "@media-type"
  } ],                           
  spine: [ "//*[namespace-uri()='http://www.idpf.org/2007/opf' and local-name()='itemref']", { 
    idref: "@idref" 
  } ],
};
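Both selectors in `template0` repeat the same namespace test. A minimal sketch of a hypothetical helper `nsStep` (not part of Jath) that generates exactly the same strings, so the namespace URI is written only once; called without a URI it produces the plain `local-name()` form suggested earlier in the thread, which matches the name in any namespace. It assumes neither argument contains a single quote.

```javascript
var OPF_NS = "http://www.idpf.org/2007/opf";

// Hypothetical helper: build a "//*[...]" selector that matches an element
// by local name, optionally restricted to one namespace URI.
function nsStep(localName, uri) {
  var test = "local-name()='" + localName + "'";
  if (uri) {
    test = "namespace-uri()='" + uri + "' and " + test;
  }
  return "//*[" + test + "]";
}

var template0 = {
  manifest: [ nsStep("item", OPF_NS), {
    id: "@id",
    href: "@href",
    media_type: "@media-type"
  } ],
  spine: [ nsStep("itemref", OPF_NS), {
    idref: "@idref"
  } ]
};

console.log(template0.spine[0]);
```

Restricting on `namespace-uri()` as well as `local-name()` avoids accidentally matching an `item` element from some other vocabulary.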

Ideally I would like to tidy this up by passing Jath a resolver. I tried a few things but had no luck so far. Here is one thing I thought would work:

Jath.resolver = packageDoc.createNSResolver(packageDoc)

and then use this as my template:

var template0 = {
  manifest: [ "//item", { 
    id: "@id",
    href: "@href",
    media_type: "@media-type"
  } ],                           
  spine: [ "//itemref", { 
    idref: "@idref" 
  } ],
};

Does this not make sense?

matthewrobertson commented 12 years ago

By the way, thanks for your help!

dnewcome commented 12 years ago

No prob. Your createNSResolver approach isn't working because (ironically) only namespaces that are bound to a prefix will be resolved. So your "dc:" namespace will probably resolve this way, but the elements in the default namespace won't: an unprefixed step like //item never consults the resolver at all.
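Because the resolver is only asked about prefixes that actually appear in an expression, one way to combine both ideas is to keep an existing lookup for bound prefixes and add one explicit prefix for the default namespace. A minimal sketch, assuming Jath accepts a plain `prefix -> URI` function as in the earlier `Jath.resolver` example; `withDefaultPrefix`, `boundOnly`, and the `def` prefix are hypothetical, and `boundOnly` merely stands in for a real document-based lookup:

```javascript
// Hypothetical helper: wrap an existing prefix lookup so that one extra
// prefix (e.g. "def") maps to the document's default namespace URI.
function withDefaultPrefix(lookup, defaultPrefix, defaultUri) {
  return function (prefix) {
    if (prefix === defaultPrefix) {
      return defaultUri;
    }
    // Every other prefix (e.g. "dc") goes to the wrapped lookup;
    // an undefined answer is normalized to null ("not found").
    var uri = lookup(prefix);
    return uri === undefined ? null : uri;
  };
}

// Stand-in for a lookup that only knows prefix-bound namespaces.
var boundOnly = function (prefix) {
  return prefix === "dc" ? "http://purl.org/dc/elements/1.1/" : null;
};

var resolver = withDefaultPrefix(boundOnly, "def", "http://www.idpf.org/2007/opf");
console.log(resolver("def")); // prints http://www.idpf.org/2007/opf
console.log(resolver("dc"));  // prints http://purl.org/dc/elements/1.1/
```

In a browser, the wrapped lookup could be built on the standard DOM method `Node.lookupNamespaceURI`, keeping in mind that it only sees namespace declarations on the node it is called on and that node's ancestors. The templates would then use `//def:item` and `//def:itemref`.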

matthewrobertson commented 12 years ago

OK, looks like I am all sorted now. I get it: I missed the prefix part of the template you suggested. Once I added it, everything works perfectly :)

dnewcome commented 12 years ago

cool