digital-land / submit


Dashboard: Fetch data from all active resources #681

Open GeorgeGoodall-GovUk opened 1 day ago

GeorgeGoodall-GovUk commented 1 day ago

Background

The Overview page currently fetches data only from the latest resource on the latest endpoint. Ideally, it should consider all active resources.

Hiccup 1

Datasets with multiple endpoints tend to provide their data in one of two ways:

  1. Each new endpoint is an updated version of an older one, so the most recent endpoint should contain the data for every entity.
  2. Endpoints supply data independently of one another; for example, endpoint 1 has the first 50 entities, while endpoint 2 has entities 51 to 100.

This is a problem because in the first situation we don't care about older endpoints, but in the second we do.
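
A toy illustration of why the two situations need different handling. The endpoint names, entity numbers and dict layout are made up for this sketch and do not reflect real data:

```python
# Hypothetical data: each endpoint maps to the set of entity numbers it
# supplies, listed oldest first.
cumulative = {"endpoint-1": {1, 2, 3}, "endpoint-2": {1, 2, 3, 4}}  # situation 1
partitioned = {"endpoint-1": {1, 2, 3}, "endpoint-2": {4, 5, 6}}    # situation 2


def entities_from_latest(endpoints):
    """Entities seen if only the most recent endpoint is read."""
    latest = list(endpoints)[-1]  # dicts preserve insertion order
    return endpoints[latest]


def entities_from_all(endpoints):
    """Entities seen if every active endpoint is read."""
    return set().union(*endpoints.values())


def missed_entities(endpoints):
    """Entities that disappear when only the latest endpoint is considered."""
    return entities_from_all(endpoints) - entities_from_latest(endpoints)


print(missed_entities(cumulative))   # nothing is lost in situation 1
print(missed_entities(partitioned))  # entities 1 to 3 are lost in situation 2
```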

How do we solve this

We should only show the most recent outstanding issues for each entity.

How do we implement this

Currently the database has no concept of a 'most recent outstanding issue', so we need to work this out ourselves. ** Can we ask infra to add a resource date to the issue field, or even a date to the dataset.resource table?

To do this we should first get a list of the most recent issues for each entity by either...

Querying the database directly using something like this
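
The query itself is not shown here, so the following is only a sketch of the general "latest row per group" technique, run against an in-memory SQLite database. The table layout (an `issue` table with `entity`, `resource`, `issue_type`, and a `resource` table with `resource`, `start_date`) is a simplified assumption, not the platform's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables (not the real schema).
    CREATE TABLE resource (resource TEXT PRIMARY KEY, start_date TEXT);
    CREATE TABLE issue (entity INTEGER, resource TEXT, issue_type TEXT);

    INSERT INTO resource VALUES ('res-old', '2024-01-01'), ('res-new', '2024-06-01');
    INSERT INTO issue VALUES
        (1, 'res-old', 'invalid geometry'),
        (1, 'res-new', 'missing value'),
        (2, 'res-old', 'invalid date');
""")

LATEST_ISSUES_SQL = """
    SELECT i.entity, i.resource, i.issue_type
    FROM issue AS i
    JOIN resource AS r ON r.resource = i.resource
    WHERE r.start_date = (
        -- newest date among the resources that raised an issue for this entity
        SELECT MAX(r2.start_date)
        FROM issue AS i2
        JOIN resource AS r2 ON r2.resource = i2.resource
        WHERE i2.entity = i.entity
    )
    ORDER BY i.entity
"""

latest_issues = conn.execute(LATEST_ISSUES_SQL).fetchall()
print(latest_issues)
```

In this toy data, entity 2's only issue comes from `res-old`; whether it is still outstanding cannot be told from this query alone.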

This gets us the most recent issues, but some issues that have been fixed will remain, so we need to filter them out by...
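
The filtering method is not spelled out above. One possible approach, offered purely as an assumption: treat an issue as outstanding only if the resource it came from is the newest resource in which that entity appears at all. This presumes we can find, for every entity, the most recent resource containing it; the `latest_resource_for_entity` mapping and the `Issue` record below are hypothetical.

```python
from typing import NamedTuple


class Issue(NamedTuple):
    entity: int
    resource: str
    issue_type: str


def outstanding_issues(latest_issues, latest_resource_for_entity):
    """Keep issues raised by the newest resource that contains the entity.

    If an entity appears in a newer resource that raised no issue for it,
    the older issue is assumed to be fixed and is dropped.
    """
    return [
        issue
        for issue in latest_issues
        if latest_resource_for_entity.get(issue.entity) == issue.resource
    ]


# Hypothetical inputs: the most recent issue per entity, and the newest
# resource each entity appears in.
latest_issues = [
    Issue(1, "res-new", "missing value"),  # still present in the newest resource
    Issue(2, "res-old", "invalid date"),   # entity 2 reappears, clean, in res-new
    Issue(3, "res-old", "invalid date"),   # entity 3 only ever appeared in res-old
]
latest_resource_for_entity = {1: "res-new", 2: "res-new", 3: "res-old"}

remaining = outstanding_issues(latest_issues, latest_resource_for_entity)
print([issue.entity for issue in remaining])
```

Under this assumption, entity 3's issue survives even though it sits on an older resource, which matters when endpoints supply independent data.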

Final hurdle

Sometimes we generate issues from entries that don't make it to entities because, for example, no reference was provided. In these cases, there is no way to know whether the issue has been resolved in a later resource.

This needs more thought. We can't simply show all of these issues regardless of which endpoint they came from: even if the provider fixes the problem in a more recent endpoint, the issue will still exist in the old endpoint, and our platform would display it. I suggest that, for now, we only generate tasks from these issues when they are present in the most recent endpoint.
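
The interim rule suggested above could look roughly like this. The record shape (`entity` is `None` when the entry never became an entity) and the endpoint names are assumptions for illustration only:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Issue:
    entity: Optional[int]  # None when the entry never became an entity
    endpoint: str
    issue_type: str


def entityless_task_issues(issues, latest_endpoint):
    """Entity-less issues that become tasks: only those on the latest endpoint."""
    return [
        issue
        for issue in issues
        if issue.entity is None and issue.endpoint == latest_endpoint
    ]


# Hypothetical issues: two entity-less ones on different endpoints, plus one
# issue tied to an entity, which this rule does not handle.
issues = [
    Issue(None, "endpoint-1", "missing reference"),
    Issue(None, "endpoint-2", "missing reference"),
    Issue(7, "endpoint-1", "invalid date"),
]

tasks = entityless_task_issues(issues, latest_endpoint="endpoint-2")
print(tasks)
```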

Future suggestion (to put on the back burner): if such an issue exists in an older endpoint, we could highlight it as a 'potential issue/task'. This wouldn't change the status of the dataset from 'live' to 'needs fixing', but it might still be displayed somewhere on the site. (This would require some design work, however.)

Documenting this

Once work is about to start, we should record this in our system design decision log.