Swirrl / datahost-prototypes

Eclipse Public License 1.0

/data and /data/:series routes should return dh:hasRelease summary for releases #369

Open RickMoynihan opened 7 months ago

RickMoynihan commented 7 months ago

We should adjust the granularity of requests a little, so you can follow your nose when walking data in the API.

Currently, if you hit /data or /data/:series-slug, the response tells you nothing about the releases within each series:

curl -X 'GET' \
  'https://dluhc-pmd5-prototype.publishmydata.com/data' \
  -H 'accept: application/ld+json'
{
  "contents": [
    {
      "dcterms:title": "Permanent dwellings completed, England, District By Tenure",
      "@type": "dh:DatasetSeries",
      "dcterms:modified": "2024-02-01T14:48:16.422002278Z",
      "dcterms:issued": "2024-02-01T14:48:16.422002278Z",
      "dcterms:description": "House building data are collected at local authority district level, but it is important to treat figures at this level with care. House building is unevenly distributed both geographically and over time and patterns of housing development can produce clusters of new homes which make the figures at a low geographic level volatile and difficult to interpret. For detailed definitions of all tenures, see definitions of housing terms on Housing Statistics The district level and county figures are as reported by local authorities and the NHBC. Where a local authority has not submitted a quarterly return to DCLG, no figure has been presented for this local authority (and when relevant its county) for any 12-month period that includes the missing quarter. England total figures include estimates for missing data returns from independent Approved Inspectors and Local Authorities, so the sum of district values may be slightly less than the England totals. *House building completion* – In principle, a dwelling is regarded as complete when it becomes ready for occupation or when a completion certificate is issued whether it is in fact occupied or not. In practice, the reporting of some completions may be delayed and some completions may be missed if no completion certificate was requested by the developer or owner, although this is unusual. *Tenure* – For the purposes of these statistics, the term tenure refers to the nature of the organisation responsible for the development of a new housing start or completion. It does not necessarily describe the terms of occupancy for the dwelling on completion.",
      "dh:baseEntity": "https://ldapi-prototype.gss-data.org.uk/data/Permanent-dwellings-completed",
      "@id": "Permanent-dwellings-completed"
    }
...]}

It would be helpful if these documents contained the property dh:hasRelease with a subset of properties relevant to each nested release entity (it should be a minimum of dcterms:title and @id):

"dh:hasRelease": [{"@id": "2019", "dcterms:title": "2019"}, ...]

However, the above isn't quite sufficient, because the URIs for those @ids will come out wrong: they would resolve against the document's base rather than the series' release path. Instead we need to do something like the following to fix https://github.com/Swirrl/datahost-prototypes/issues/349:

{
  "@context": {
    "@base": "https://services-base-url-goes-here.org/data/" ;; 1
  },
  "contents": [
    {
      // ...
      "@id": "English-Indices-of-Deprivation",
      "dh:hasRelease": {
        "@context":{"@base": "./English-Indices-of-Deprivation/release/"}, ;; 2
        "@set": [{"@id":"2019", "dcterms:title": "2019"}]}
    }
  ]
}

The @context entries marked 1 and 2 ensure, within their respective scopes, that the @id of each dataset series in contents expands to the form https://services-base-url-goes-here.org/data/:id, whilst each release expands to the URI https://services-base-url-goes-here.org/data/:series-id/release/:release-id.
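As a rough illustration of the expansion described above, the sketch below mimics only the relative IRI resolution step using Python's `urllib.parse.urljoin` (which follows RFC 3986 reference resolution). It is not a JSON-LD processor, and the variable names are just for illustration; the base URL is the placeholder from the example.

```python
from urllib.parse import urljoin

# Outer @base from the context marked 1 (placeholder URL from the example).
service_base = "https://services-base-url-goes-here.org/data/"

# A series @id is a relative IRI resolved against the outer base.
series_iri = urljoin(service_base, "English-Indices-of-Deprivation")

# The context marked 2 declares a relative @base, which is itself
# resolved against the base that is active at that point.
release_base = urljoin(service_base, "./English-Indices-of-Deprivation/release/")

# A release @id is then resolved against that nested base.
release_iri = urljoin(release_base, "2019")

print(series_iri)
print(release_iri)
```

The point is that clients can keep using the short relative @id values, while a processor that honours the nested @base still arrives at the full release URI.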

This also balances the requirement that @ids are relative to your position in the graph/tree, so you can feed them into the API without having to parse them.

Unfortunately, the trade-off is a deeper path to reach the release data (contents -> dh:hasRelease -> @set), ending in the less descriptive @set key.

(From a UX perspective a nicer alternative would be to dynamically generate a portion of the @context itself. This would mean having a dynamically generated context per entity (series/release/revision) which would include the @id slugs in the paths; the static portion of the context could be held in a separate static context document, and we could import/cascade the contexts appropriately. This would allow the more succinct syntax whilst ensuring the IRI expansion was the same.)

xdrcft8000 commented 6 months ago

I'm not sure if I've misinterpreted what's required but here's where I got to

RickMoynihan commented 6 months ago

The issue we're really talking about here is https://github.com/Swirrl/datahost-prototypes/issues/349.

Thanks for this @xdrcft8000 it's a really helpful step in the right direction.

I think we need to refine it further into something like this though

The important thing I'm trying to do is separate the @context into a static and dynamic part:

{
  "@context": [
    # static part
    "https://cdn.jsdelivr.net/gh/Swirrl/datahost-prototypes@1282114/datahost-ld-openapi/resources/jsonld-context.json",
    {
      # dynamic part
      "@base": "https://dluhc-pmd5-prototype.publishmydata.com/data/",
      "dh": "http://example.org/vocab#",
      "dh:hasRelease": {
        "@context": {
          "@base": "English-Indices-of-Deprivation/release/"
        }
      }
    }
  ],
  "@id": "English-Indices-of-Deprivation",
  "@type": "dh:DatasetSeries",
  "dh:hasRelease": [
    {
      "@id": "2019",
      "dcterms:title": "2019"
    },
    {
      "@id": "2020",
      "dcterms:title": "2020"
    }
  ]
}

The static part is probably the bulk of our vocabulary JSON-LD context; we should try to keep that as a dumb flat file as much as we can.

The dynamic part, however, we'll need to programmatically inject into the documents the application renders, as the dataset series slug forms part of that path.
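A minimal sketch of how the static/dynamic split might be assembled when rendering a /data/:series document. The helper names `series_context` and `render_series` are hypothetical, not part of the codebase, and the sketch assumes the static context file already defines the `dh` prefix (so the placeholder `"dh"` mapping from the example is left out).

```python
import json

# Static vocabulary context, referenced by URL exactly as in the example.
STATIC_CONTEXT_URL = (
    "https://cdn.jsdelivr.net/gh/Swirrl/datahost-prototypes@1282114/"
    "datahost-ld-openapi/resources/jsonld-context.json"
)


def series_context(service_base, series_slug):
    """Hypothetical helper: static context URL plus a per-series dynamic part."""
    dynamic = {
        "@base": service_base,
        # Scoped context so that release @ids resolve under this series.
        "dh:hasRelease": {
            "@context": {"@base": f"{series_slug}/release/"},
        },
    }
    return [STATIC_CONTEXT_URL, dynamic]


def render_series(service_base, series_slug, releases):
    """Hypothetical helper: build the JSON-LD document for one series."""
    return {
        "@context": series_context(service_base, series_slug),
        "@id": series_slug,
        "@type": "dh:DatasetSeries",
        "dh:hasRelease": releases,
    }


doc = render_series(
    "https://dluhc-pmd5-prototype.publishmydata.com/data/",
    "English-Indices-of-Deprivation",
    [{"@id": "2019", "dcterms:title": "2019"}],
)
print(json.dumps(doc, indent=2))
```

Only the small dynamic dictionary changes per request; the flat context file stays untouched.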

andrewmcveigh commented 6 months ago

So, I thought about doing what you're suggesting there @RickMoynihan, but I don't think it works correctly for this issue. It probably can for /data/:series, but it cannot for /data: there can be more than one series in the response, so a single @base in the document-level context will only be correct when there is exactly one series.

Same issue (ish) for #370
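The limitation for /data can be seen with a small sketch that again only mimics IRI resolution via `urllib.parse.urljoin`. The slugs `series-a` and `series-b` are made up for illustration.

```python
from urllib.parse import urljoin

service_base = "https://example.org/data/"

# A single document-level scoped @base can only name one series slug.
scoped_base = urljoin(service_base, "series-a/release/")

# Two hypothetical series, each with one release @id.
series_releases = {"series-a": ["2019"], "series-b": ["2020"]}

# Every release resolves against the same base, whichever series owns it.
resolved = {
    slug: [urljoin(scoped_base, release_id) for release_id in release_ids]
    for slug, release_ids in series_releases.items()
}

# What the release URI for series-b should have been.
expected_b = urljoin(service_base, "series-b/release/2020")

print(resolved["series-b"][0])
print(expected_b)
```

The release for `series-b` ends up under the `series-a` path, which is why the base has to be scoped per series rather than once per document.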

andrewmcveigh commented 6 months ago

so, I think we can produce this

{
  "contents": [
    {
      "dcterms:modified": "2024-02-28T15:13:30.592145398Z",
      "dcterms:description": "A very simple test",
      "dcterms:issued": "2024-02-28T15:13:30.592145398Z",
      "@index": "https://example.org/data/differentdummy1709133210",
      "dh:baseEntity": "https://example.org/data/differentdummy1709133210",
      "@id": "differentdummy1709133210",
      "dh:hasRelease": [
        {
          "dcterms:title": "Test Release",
          "@type": "dh:Release",
          "@id": "release-1",
          "@context": {"@base": "./differentdummy1709133210/release/"}
        }
      ],
      "@type": "dh:DatasetSeries",
      "dcterms:title": "Test Dataset"
    },
    {
      "dcterms:modified": "2024-02-28T15:13:30.214140032Z",
      "dcterms:description": "A very simple test",
      "dcterms:issued": "2024-02-28T15:13:30.214140032Z",
      "@index": "https://example.org/data/dummy1709133210",
      "dh:baseEntity": "https://example.org/data/dummy1709133210",
      "@id": "dummy1709133210",
      "dh:hasRelease": {
        "@context": {"@base": "./dummy1709133210/release/"},
        "@set": [
          {
            "dcterms:title": "Test Release",
            "@type": "dh:Release",
            "@id": "release-1"
          }
        ]
      },
      "@type": "dh:DatasetSeries",
      "dcterms:title": "Test Dataset"
    }
  ],
  "@context": [
    "https://cdn.jsdelivr.net/gh/Swirrl/datahost-prototypes@1282114/datahost-ld-openapi/resources/jsonld-context.json",
    {
      "@base": "https://example.org/data/",
      "dh:hasRelease": {
        "@container": "@set",
        "@id": "dh:hasRelease"
      }
    }
  ]
}

Which appears to work (playground)

The issue is that we need to add the "@context" and "@set" stuff after compaction, which is a bit of a PITA
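For illustration, the post-compaction step could look roughly like the sketch below. It assumes the compacted document is available as a plain dictionary shaped like the example above; `scope_releases` is a hypothetical name, not an existing function in the codebase.

```python
import copy


def scope_releases(compacted):
    """Hypothetical post-compaction step: wrap each series' dh:hasRelease
    in a set object carrying a per-series scoped @base."""
    doc = copy.deepcopy(compacted)  # leave the caller's document untouched
    for series in doc.get("contents", []):
        releases = series.get("dh:hasRelease")
        if releases is None:
            continue  # series without releases is left as-is
        if isinstance(releases, dict) and "@set" in releases:
            continue  # already wrapped
        if not isinstance(releases, list):
            releases = [releases]  # compaction may emit a single object
        series["dh:hasRelease"] = {
            "@context": {"@base": f"./{series['@id']}/release/"},
            "@set": releases,
        }
    return doc
```

Doing this on the compacted output keeps the compaction itself generic, at the cost of an extra pass over `contents`.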

RickMoynihan commented 6 months ago

hmmm ok good point!

I think in that case we should descope doing it for /data and only do it for /data/:series, as it's better if the extra cruft doesn't affect the structure of the data.