Conal-Tuohy / TroveProxy

A transforming proxy and harvester for the National Library of Australia's Trove API
Apache License 2.0

`@next` URLs in multi-category search results are broken #10

Open Conal-Tuohy opened 1 year ago

Conal-Tuohy commented 1 year ago

At the moment I've written code in TroveProxy to work around these broken URLs. It seems to me that the Trove API itself could incorporate the same fix, so I'd like to move it "upstream" to Trove, so that people who aren't using TroveProxy don't run into these broken links.

The XML response includes <records> elements whose next attributes hold URLs containing a single category parameter. The value of that parameter is the full list of requested category names, separated by a (URL-encoded) comma. If I replace that category parameter with one whose value is taken from the code attribute of the parent (i.e. <category>) element of the <records> element, then the resulting URL does work.

For example, take the following query URL: https://api.trove.nla.gov.au/v3/result?category=book&category=newspaper&q=water%20dragon&s=*&n=1&bulkHarvest=true

The result looks like this:

<response>
  <query>water dragon</query>
  <category code="book" name="Books &amp; Libraries">
    <records s="*" n="1" total="3700" next="https://api.trove.nla.gov.au/v3/result?category=book%2Cnewspaper&amp;q=water+dragon&amp;n=1&amp;bulkHarvest=true&amp;s=AoEqc3UxMDAwMTE4Ng%3D%3D" nextStart="AoEqc3UxMDAwMTE4Ng==">
      <!-- omitted for brevity -->
    </records>
  </category>
  <category code="newspaper" name="Newspapers &amp; Gazettes">
    <records s="*" n="1" total="98345" next="https://api.trove.nla.gov.au/v3/result?category=book%2Cnewspaper&amp;q=water+dragon&amp;n=1&amp;bulkHarvest=true&amp;s=AoEpMTAwMDI0NDE5" nextStart="AoEpMTAwMDI0NDE5">
      <!-- omitted for brevity -->
    </records>
  </category>
</response>

Those "next" URLs are broken, but if I change them like so, they do appear to work correctly:

<response>
  <query>water dragon</query>
  <category code="book" name="Books &amp; Libraries">
    <records s="*" n="1" total="3700" next="https://api.trove.nla.gov.au/v3/result?category=book&amp;q=water+dragon&amp;n=1&amp;bulkHarvest=true&amp;s=AoEqc3UxMDAwMTE4Ng%3D%3D" nextStart="AoEqc3UxMDAwMTE4Ng==">
      <!-- omitted for brevity -->
    </records>
  </category>
  <category code="newspaper" name="Newspapers &amp; Gazettes">
    <records s="*" n="1" total="98345" next="https://api.trove.nla.gov.au/v3/result?category=newspaper&amp;q=water+dragon&amp;n=1&amp;bulkHarvest=true&amp;s=AoEpMTAwMDI0NDE5" nextStart="AoEpMTAwMDI0NDE5">
      <!-- omitted for brevity -->
    </records>
  </category>
</response>

The code I'm using to fix these broken URLs is here:

https://github.com/Conal-Tuohy/TroveProxy/blob/0778f71bd4bf7146023e0972d3c08b7d4c2d16cc/src/xslt/fix-trove-response.xsl#L4

https://github.com/Conal-Tuohy/TroveProxy/blob/0778f71bd4bf7146023e0972d3c08b7d4c2d16cc/src/xslt/fix-trove-response.xsl#L13-L23
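
For anyone who isn't using XSLT, the same rewrite can be sketched in a few lines of Python. This is only an illustration of the idea described above, not a port of the linked stylesheet: the function name fix_next_urls is hypothetical, and it assumes the response has no XML namespace and that every <records> element sits directly inside its <category> element, as in the sample.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def fix_next_urls(response_xml: str) -> str:
    """Rewrite each records/@next so its category parameter names only
    the parent category (hypothetical helper, for illustration)."""
    root = ET.fromstring(response_xml)
    for category in root.iter("category"):
        code = category.get("code")
        for records in category.findall("records"):
            next_url = records.get("next")
            if not code or not next_url:
                continue  # nothing to fix on this element
            parts = urlsplit(next_url)
            # Drop the comma-joined category parameter, keep the others in order.
            params = [
                (key, value)
                for key, value in parse_qsl(parts.query, keep_blank_values=True)
                if key != "category"
            ]
            # Put back a single category taken from the parent element's code.
            params.insert(0, ("category", code))
            fixed = urlunsplit(parts._replace(query=urlencode(params)))
            records.set("next", fixed)
    return ET.tostring(root, encoding="unicode")


# Cut-down version of the response above, just enough to exercise the rewrite.
SAMPLE = (
    '<response><category code="book">'
    '<records next="https://api.trove.nla.gov.au/v3/result?category=book%2Cnewspaper'
    '&amp;q=water+dragon&amp;n=1&amp;bulkHarvest=true&amp;s=AoEqc3UxMDAwMTE4Ng%3D%3D"/>'
    '</category></response>'
)

fixed_root = ET.fromstring(fix_next_urls(SAMPLE))
print(fixed_root.find("category/records").get("next"))
```

The nextStart attribute is left alone, since only the category parameter of the next URL needs to change. Note that ElementTree's default parser drops XML comments, so a real harvester that needs a byte-faithful response would want a different approach.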

Conal-Tuohy commented 1 year ago

Followed up on this with Trove Support this morning.