Swirrl / ook

Structural search engine
https://search-prototype.gss-data.org.uk/
Eclipse Public License 1.0

Show more dataset info #18

Closed kiramclean closed 3 years ago

kiramclean commented 3 years ago

Right now, selecting some codes fetches only the dataset id and the count of matching observations. We should also fetch the labels, descriptions, and any other relevant information. We should also generate the right URLs so the "View data" buttons actually link to PMD.

Robsteranium commented 3 years ago

I think we need at least:

@benjystanton was wondering about publication date too. Indeed, something like "latest reference period" (i.e. the date of the last observation) might be really useful for comparison (which is most up-to-date?). We can possibly wait for the coverage metadata.

benjystanton commented 3 years ago

Yeah, I think publication date is really useful. Users need to know "is this the latest data?", so we often need a few bits of information to help them work that out. E.g. if something was published 11 months ago but is only published annually, then it's still the latest data. So anything we can do to help them answer this question would be great.
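To make the "11 months ago but annual" case concrete, here is a minimal sketch in Python. It assumes we know each dataset's release interval, which is hypothetical metadata that nothing in this thread says OOK holds; the function name and the dates are invented for illustration.

```python
from datetime import date, timedelta


def is_probably_latest(published: date, release_interval: timedelta, today: date) -> bool:
    """Return True if the next release is not yet due.

    `release_interval` is an assumed piece of metadata (e.g. roughly 365 days
    for an annual publication); it is not something the index is known to hold.
    """
    return today < published + release_interval


# Invented dates: a dataset published 11 months before "today".
published = date(2021, 1, 15)
today = date(2021, 12, 15)

annual = is_probably_latest(published, timedelta(days=365), today)
quarterly = is_probably_latest(published, timedelta(days=91), today)
print(annual, quarterly)
```

With these dates the annual publication still counts as the latest data, while a quarterly one would already be overdue. That is why publication date alone is not enough to answer the question.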

Robsteranium commented 3 years ago

We've since added the description, and today I've added the publisher label and altlabel (which typically holds the acronym).

Robsteranium commented 3 years ago

The question "is this the latest data?" is tricky to answer!

  1. Upstream publication lags behind reality - i.e. it can take more than a year to collect, analyse and publish data - indeed you can't get data about a whole year until it's over!
  2. Some publications lag further behind than others - ONS trade data is derived from HMRC data so it necessarily lags further behind.
  3. The cubes on IDP lag behind publications - because it takes some time to write and check the transformation.
  4. The OOK index lags behind IDP - because we have to run this manually at the moment (until we implement #17).

We should aim to show the publication-dates and observation-ref-dates side-by-side. We can reveal this once the upstream dataset tracking (https://github.com/Swirrl/cogs-issues/issues/35) and coverage metadata (https://github.com/Swirrl/cogs-issues/issues/92) are loaded.
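As a rough illustration of the side-by-side view, the sketch below lists publication date next to the date of the latest observation and orders datasets by the latter. The record shape, field names, dataset labels and dates are all invented for the example; the real values would come from the upstream tracking and coverage metadata once they are loaded.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetDates:
    label: str                  # hypothetical dataset label
    published: date             # upstream publication date
    latest_observation: date    # end of the latest reference period


def side_by_side(rows):
    """Format rows as text lines, most recent observation first."""
    ordered = sorted(rows, key=lambda r: r.latest_observation, reverse=True)
    lines = [f"{'dataset':<12}{'published':<12}{'latest obs':<12}"]
    for r in ordered:
        lines.append(
            f"{r.label:<12}{r.published.isoformat():<12}{r.latest_observation.isoformat():<12}"
        )
    return lines


# Invented example values: dataset-a was published later but covers older data.
EXAMPLES = [
    DatasetDates("dataset-a", date(2021, 6, 1), date(2020, 12, 31)),
    DatasetDates("dataset-b", date(2021, 3, 1), date(2021, 1, 31)),
]

for line in side_by_side(EXAMPLES):
    print(line)
```

Sorting by latest observation rather than publication date puts dataset-b first even though dataset-a was published more recently, which is the distinction the two columns are meant to expose.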

This will be really useful for OOK as we can show how robustness comes at the expense of recency - a key dataset-choosing criterion.

We could load the metadata showing when the cubes were modified on IDP but I think it would be misleading (just because the cube was re-loaded, it doesn't follow that the data itself was any newer).

I suggest we leave this issue open until those upstream ones are resolved.

kiramclean commented 3 years ago

Publisher is now shown in the dataset results table (since https://github.com/Swirrl/ook/commit/962f0875d27ecc75ccca09244ac11dc109f0eb9c), and the links to PMD now work. I'm going to close this in favour of #61: the original problems this issue describes are fixed, and the remaining things we'd like to add depend on upstream issues being resolved first.