IATI / D-Portal

http://d-portal.org/

Display validation status of an activity #683

Open stevieflow opened 1 month ago

stevieflow commented 1 month ago

I think this can also involve / supersede: https://github.com/IATI/D-Portal/issues/498

Right now, we have a static button that takes us from an activity page through to the validator page, for that specific activity

eg:

http://www.d-portal.org/ctrack.html?search&country=AO#view=act&aid=CH-4-2014002387 > https://validator.iatistandard.org/report/sdc_ch-161212?id=CH-4-2014002387

It would be helpful, as a next step, for d-portal to pull in and display the validation status of the activity (in this case, Critical). We can also craft some words to sit around that too

d-portal would not need to display the specific validation issues, however - as we can point people to that page

(Moving on from that: if d-portal knows/stores the validation status, it might later be feasible to filter activities in or out by status, but not right now.)

@xriss I don't quite know if the Validator can feed the validation status (via an API call, for example) in this way - one for @robredpath @simon-20 and maybe @isabelbirds to clarify

The advantage of making this particular function happen is that we can more closely tie d-portal and IATI validation together --> as we progress with the visual alignment (https://github.com/IATI/D-Portal/issues/660 ) this will be more seamless...

stevieflow commented 1 month ago

Update: I think an activity could have multiple validation statuses (Critical; Warning; Error), which would make the above more of a challenge

@isabelbirds have you any examples? Thanks

xriss commented 1 month ago

Maybe a JSON file with a map of ids to whatever status you wish us to display?

BTW there will still be the possibility of "other" when an activity has no information available from the validator for whatever reason.

Also multiple published activities may use the same id ( often the same data but not always ) so this is another way we can be confused.
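To make the id-to-status map idea concrete, here is a minimal sketch of how a client might read such a file and fall back to "other" when the validator has nothing for an activity. The file layout (a flat JSON object of identifier to status) and the function names are assumptions for illustration only; nothing like this is published by the Validator today. The two example statuses are the ones mentioned in this thread.

```python
import json

# Hypothetical file content: a flat JSON object mapping activity id -> status.
EXAMPLE_DUMP = """
{
  "CH-4-2014002387": "critical",
  "GB-CHC-1016676-DFID-300055-109": "warning"
}
"""

def load_status_map(text):
    """Parse the (assumed) static dump into a dict of id -> status."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of id -> status")
    return data

def status_for(status_map, activity_id):
    """Return the stored status, or "other" when the id is unknown."""
    return status_map.get(activity_id, "other")

status_map = load_status_map(EXAMPLE_DUMP)
print(status_for(status_map, "CH-4-2014002387"))
print(status_for(status_map, "XX-NOT-IN-THE-DUMP"))
```

A flat map like this cannot distinguish two different published activities that share an id, so that ambiguity would still need a decision on the producer side.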

simon-20 commented 1 month ago

The Validator API and website take the route of marking the file/dataset with the worst severity of any validation check that fails: if there are any critical failures, the whole dataset is marked Critical; if there are no critical failures but some error failures, the whole dataset is marked Error.

It could make sense to follow this approach for individual activities.

This could be achieved using the existing APIs as follows (requiring one or two API calls, depending on whether the client has the dataset ID saved):

First, query the Datastore to get the source dataset/file for the given Identifier. This step wouldn't be needed if the client in question (D-Portal, in this case) has kept a record of the dataset/file name or ID.

For example, using command line tools HTTPie and jq with the identifier GB-CHC-1016676-DFID-300055-109 to illustrate:

> http get "https://api.iatistandard.org/datastore/activity/select" q=="iati_identifier_exact:GB-CHC-1016676-DFID-300055-109" fl==iati_activities_document_id "Ocp-Apim-Subscription-Key: **SUBSCRIPTION_KEY_HERE**" | jq -r '.response.docs[0].iati_activities_document_id'

9842515c-51fc-485e-80d4-dd67ab924e85

Then, given the document id, query the Validator, filter the JSON by Identifier, and reduce it to a list of severities:

> http get "https://api.iatistandard.org/validator/report" "Ocp-Apim-Subscription-Key":**SUBSCRIPTION_KEY_HERE** id==9842515c-51fc-485e-80d4-dd67ab924e85 details==false grouped==true | jq '.report.errors[] | select(.identifier=="GB-CHC-1016676-DFID-300055-109") | .errors[].errors[].severity'

"warning"
"advisory"
"advisory"
"advisory"
"advisory"
"advisory"
"advisory"
"advisory"

Then just print the worst severity. Whatever JSON / JSON Query libraries the client is using should make the equivalent of the jq commands above straightforward.
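As a rough illustration of the last step, the sketch below mirrors the jq filter above on an already-downloaded report and picks the worst severity. The sample report is hand-made to match only the path the jq filter walks (`.report.errors[].errors[].errors[].severity`); the real response has more fields. The full severity ordering is an assumption: the thread only states that Critical outranks Error.

```python
# Assumed ordering, most severe first (only critical > error is confirmed above).
SEVERITY_ORDER = ["critical", "error", "warning", "advisory"]

def activity_severities(report, identifier):
    """Collect severities for one activity, following the same path as the jq filter."""
    found = []
    for activity in report.get("report", {}).get("errors", []):
        if activity.get("identifier") != identifier:
            continue
        for category in activity.get("errors", []):
            for check in category.get("errors", []):
                found.append(check["severity"])
    return found

def worst_severity(severities):
    """Return the most severe known value, or None if there are none."""
    known = [s for s in severities if s in SEVERITY_ORDER]
    if not known:
        return None
    return min(known, key=SEVERITY_ORDER.index)

# Hand-made sample shaped like the jq path; not a real Validator response.
SAMPLE_REPORT = {
    "report": {
        "errors": [
            {
                "identifier": "GB-CHC-1016676-DFID-300055-109",
                "errors": [
                    {"errors": [{"severity": "warning"}, {"severity": "advisory"}]},
                    {"errors": [{"severity": "advisory"}]},
                ],
            }
        ]
    }
}

found = activity_severities(SAMPLE_REPORT, "GB-CHC-1016676-DFID-300055-109")
print(worst_severity(found))  # prints: warning
```

A `None` result here only means no failed checks were listed for that identifier in the report; the client would still need to decide how to label that case.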

This approach lets the Datastore handle the problem of the same identifier appearing more than once in the IATI corpus. Since identifiers shouldn't be duplicated, what happens when they are is undefined. The Datastore should return the first copy it finds, so this case is handled nicely.

And as @xriss says, it could just display 'Other' or 'No validation status' if there is no validation report.

Something like this could potentially be added as a new endpoint to the Validator. @robredpath, what are your thoughts on that?

xriss commented 1 month ago

This sounds like a single query per activity id?

Can I do that a million times to get the latest values?

Even if this can be grouped per dataset (which I assume it can?) it will still be thousands of hits.

I much prefer the idea of a cached static file dump of all the data with no api nonsense.

simon-20 commented 1 month ago

I was thinking one could make the call when the info is first displayed. The call wouldn't be made at all if the page for a given activity were never visited; the number of API calls made by the site would roughly equal the number of visits to activity pages (so only millions of calls if you have millions of hits); and the information would always be up to date.
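A minimal sketch of that on-demand pattern follows. It adds a short-lived cache, which is my assumption and not something proposed above: it trades a little freshness for fewer repeat calls. `fetch` stands in for whatever function actually performs the Datastore and Validator lookups; the class name and the TTL value are hypothetical.

```python
import time

class LazyStatusCache:
    """Fetch a validation status only when an activity page is first viewed."""

    def __init__(self, fetch, ttl_seconds=3600.0, clock=time.monotonic):
        self._fetch = fetch        # callable: activity_id -> status string
        self._ttl = ttl_seconds    # how long a fetched status is reused
        self._clock = clock        # injectable for testing
        self._entries = {}         # activity_id -> (fetched_at, status)

    def get(self, activity_id):
        now = self._clock()
        entry = self._entries.get(activity_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # still fresh: no API call
        status = self._fetch(activity_id)
        self._entries[activity_id] = (now, status)
        return status

# Usage with a stand-in fetcher; a real one would call the APIs described above.
def fake_fetch(activity_id):
    return "warning"

cache = LazyStatusCache(fake_fetch, ttl_seconds=600)
print(cache.get("GB-CHC-1016676-DFID-300055-109"))
```

Setting the TTL to zero reduces this to one call per page view, which matches the always-up-to-date behaviour described above.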

The Validator API doesn't currently have anything that would facilitate the static route, but that functionality is something that could be looked at.