codeforIATI / iati-ideas

💡 Ideas for new codeforIATI projects and blogs
https://ideas.codeforiati.org

[PROJECT IDEA] IATI activity-identifier check service (via Datastore classic?) #37

Closed stevieflow closed 3 years ago

stevieflow commented 3 years ago

Rationale

For many organisations trying to implement some form of traceability, a service that checks whether the activity iati-identifier they cite in various attributes actually exists would be valuable.

IATI documentation even includes:

A valid activity identifier (as defined in iati-activity/iati-identifier).

Currently, there is no viable way for someone to check whether the identifier they include in their data is "valid".

Proposal

Twofold:

1 - a "list"/service of current IATI activity identifiers.
Whilst this cannot possibly be a "codelist" (of over a million codes) to navigate, it'd be interesting to know if the lists of current activity identifiers (pulled into Datastore Classic) could be queried by a user (to return a true/false). Is that function already in place?

2 - Simple validation of an IATI XML file.
If I pass an IATI XML file, there are four distinct places where I can cite an external IATI activity identifier:

A validation service would check whether any of these fields are populated, and then use the service in [1] to check whether the cited identifiers currently exist.

The output to the user could then be whether any false matches were found (and even the positive matches).
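A minimal sketch of the extraction step, in Python. The thread does not spell out the four places, so `CITATION_PATHS` below is an assumption covering only three locations from the IATI activity schema (`related-activity/@ref` and the transaction `provider-activity-id` / `receiver-activity-id` attributes). The function names and the `known_ids` input are hypothetical; `known_ids` stands in for whatever the list/service in [1] would provide.

```python
import xml.etree.ElementTree as ET

# Assumed (illustrative, incomplete) locations of cited activity identifiers:
# (path relative to an iati-activity element, attribute name).
CITATION_PATHS = [
    ("related-activity", "ref"),
    ("transaction/provider-org", "provider-activity-id"),
    ("transaction/receiver-org", "receiver-activity-id"),
]


def cited_identifiers(xml_text):
    """Map each activity's own iati-identifier to the set of identifiers it cites."""
    root = ET.fromstring(xml_text)
    cited = {}
    for activity in root.iter("iati-activity"):
        own_id = (activity.findtext("iati-identifier") or "").strip()
        refs = set()
        for path, attr in CITATION_PATHS:
            for element in activity.findall(path):
                value = (element.get(attr) or "").strip()
                if value:  # only populated fields are of interest
                    refs.add(value)
        cited[own_id] = refs
    return cited


def missing_identifiers(cited, known_ids):
    """Return the cited identifiers that are absent from known_ids."""
    all_cited = set().union(*cited.values())
    return all_cited - set(known_ids)
```

Anything returned by `missing_identifiers` would be a "false match" in the sense above; the positive matches are the remaining cited identifiers.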

stevieflow commented 3 years ago

Pinging @HermanvanLoon on this, as I know he is interested in such a service

matmaxgeds commented 3 years ago

Does this: https://datastore.codeforiati.org/api/1/access/activity.xml?iati-identifier=1022405|1022474 pick up the two identifiers in all the various places, or just the activity-ids? Sorry, could probably check this but quicker to ask!

If not, might iatikit be an option? https://iatikit.readthedocs.io/en/stable/examples.html#find-an-activity-by-its-identifier Also not sure if it searches all 4 places?

stevieflow commented 3 years ago

Thanks @matmaxgeds

I think I was thinking along simpler lines!

A - if we pass a string to the Datastore (say 1022474) - can we receive a response to understand if that matches any of the iati-identifier in the store (true/false)? I think that's possible...
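As a rough sketch of (A) in Python: query the `activity.xml` endpoint quoted earlier in this thread and reduce the response to true/false. This assumes the response embeds each matching activity with its `iati-identifier` element and uses no XML namespace on those elements; the function names are hypothetical and this is not an existing Datastore feature.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Endpoint taken from the URLs quoted in this thread.
DATASTORE_URL = "https://datastore.codeforiati.org/api/1/access/activity.xml"


def build_query_url(identifier):
    """Build the lookup URL for a single iati-identifier."""
    query = urllib.parse.urlencode({"iati-identifier": identifier})
    return DATASTORE_URL + "?" + query


def response_contains(xml_text, identifier):
    """True if any iati-identifier element in the response equals identifier."""
    root = ET.fromstring(xml_text)
    return any(
        (element.text or "").strip() == identifier
        for element in root.iter("iati-identifier")
    )


def identifier_exists(identifier, timeout=30):
    """Fetch the response over the network and reduce it to true/false."""
    url = build_query_url(identifier)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response_contains(response.read(), identifier)
```

Comparing against the returned `iati-identifier` text, rather than just counting results, guards against the filter matching more loosely than expected.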

B (which probably comes before A in any workflow!) - any validation service would look for those strings in any of the following attributes in an activity:

HermanvanLoon commented 3 years ago

Hi Steven, Agree with you, provided the datastore has recent data and its performance is good enough.

Regards, Herman

matmaxgeds commented 3 years ago

Hey - so trying to use the datastore with the activity ID 'XI-IATI-WBTF-P147521', as that project has a related activity included to test with. I think this query should look in all four of the places: https://datastore.codeforiati.org/api/1/access/activity.xml?iati-identifier=XI-IATI-WBTF-P147521|transaction_provider-org.provider-activity-id=XI-IATI-WBTF-P147521|transaction_receiver-org.receiver-activity-id=XI-IATI-WBTF-P147521|related-activity=XI-IATI-WBTF-P147521 - but when I run it, it only returns the activity matching the activity_id field, whereas https://datastore.codeforiati.org/api/1/access/activity.xml?related-activity=XI-IATI-WBTF-P147521 matches the other activity that should be picked up. I guessed at the use of '|' for 'or'; maybe that is not valid?

I also don't know how to get the datastore to return a yes/no result, although returning any conflicting activities might be useful in case there is a conflict found.

markbrough commented 3 years ago

@HermanvanLoon Datastore Classic is updated every night now - last update was 5 hours ago. See here: https://datastore.codeforiati.org/

@stevieflow we can think about how we could add a new endpoint to just return a list of matching IATI activity IDs rather than the full XML

notshi commented 3 years ago

You can currently do this through the dQuery interface, for example 1022474 gets you two results. You can click on Browse Activities to view these activities on d-portal.

It takes a while to load because it uses an expensive wildcard search.

Something similar was raised here as well https://github.com/devinit/D-Portal/issues/510. In this case, the wildcard % is only placed after the search term to get results that begin with the search term.

andylolz commented 3 years ago

I’ve had a go at a purpose-built thing here: https://activity-id-checker.codeforiati.org

It currently looks very ugly… It’s pretty fast, though, because it just does this one thing. The big list of IDs is auto-updated daily.

notshi commented 3 years ago

That's superfast! It's also really cool to see the identifiers as you type. Very handy when checking questionable looking identifiers!

If it's helpful, you can also use this url to get d-portal pages (much faster than the dquery option): http://d-portal.org/ctrack.html?aid_like=%251022474%25#view=main

%25id looks for activities that end with this id. id%25 looks for activities that start with this id. %25id%25 looks for activities that have this id within them.
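For anyone building these URLs in a script, here is a small Python sketch. `%25` is simply the URL-encoded form of the `%` wildcard, so the standard library's `urlencode` produces it automatically. The function name and the `mode` labels are hypothetical; the base URL and the `aid_like` parameter are the ones in the link above.

```python
from urllib.parse import urlencode

DPORTAL_URL = "http://d-portal.org/ctrack.html"


def aid_like_url(fragment, mode="contains"):
    """Build a d-portal URL that wildcard-matches activity identifiers."""
    patterns = {
        "starts": fragment + "%",          # identifiers starting with fragment
        "ends": "%" + fragment,            # identifiers ending with fragment
        "contains": "%" + fragment + "%",  # fragment anywhere in the identifier
    }
    # urlencode turns each "%" wildcard into "%25".
    query = urlencode({"aid_like": patterns[mode]})
    return DPORTAL_URL + "?" + query + "#view=main"
```

Calling `aid_like_url("1022474")` reproduces the URL above.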

andylolz commented 3 years ago

If it's helpful, you can also use this url to get d-portal pages (much faster than the dquery option): http://d-portal.org/ctrack.html?aid_like=%251022474%25#view=main

Oh wow – that’s great! Yes, that’s super fast!

andylolz commented 3 years ago

Very handy when checking questionable looking identifiers

Yeah, I was wondering if there’s value in using this to flag good/bad identifiers. E.g.:

This identifier is good because it:

  • uses publisher ID as a prefix
  • doesn’t contain any special characters
  • uses an org-id prefix?
  • doesn’t use lowercase characters?

That sort of thing.
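A quick Python sketch of what such checks might look like. The rules are guesses based on the list above, not an agreed specification: in particular, "special characters" is assumed here to mean anything other than ASCII letters, digits and hyphens, and the caller is assumed to know the publisher's org-id. The function name is hypothetical.

```python
import re

# Assumption: only ASCII letters, digits and hyphens count as "non-special".
ALLOWED_CHARACTERS = re.compile(r"[A-Za-z0-9-]+")


def identifier_checks(identifier, org_id):
    """Return a pass/fail flag for each heuristic."""
    return {
        # The identifier should begin with the org-id followed by a hyphen.
        "uses_org_id_prefix": identifier.startswith(org_id + "-"),
        "no_special_characters": ALLOWED_CHARACTERS.fullmatch(identifier) is not None,
        "no_lowercase": not any(char.islower() for char in identifier),
    }
```

Returning a flag per rule, rather than a single good/bad verdict, would let the checker explain why an identifier looks questionable.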

andylolz commented 3 years ago

I think we can reasonably mark this as “done”!

stevieflow commented 3 years ago

Agreed / huge thanks all (esp @andylolz !)

Two follow-ups:

andylolz commented 3 years ago

This should be listed on https://codeforiati.org/ ?

Kk, sorted now!

I'll create a new ticket about an API on/for this -- which would need thoughts around batch checking etc

So Activity ID checker currently offers no API! (There sort of is one, but it is baffling). The site is just a glitzy frontend.

I can probably add an API… But if someone wants to batch check identifiers, they’re much better off using d-portal or DSClassic.

In d-portal, it would be this sort of thing: https://d-portal.org/q.json?aid=GB-CHC-800672-GTF309

In DSClassic, this sort of thing: https://datastore.codeforiati.org/api/1/access/activity.json?iati-identifier=GB-CHC-800672-GTF309
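For batch checking, a script could generate one lookup URL per identifier for either service. The Python sketch below only builds the URLs, using the two endpoints and parameter names shown above; it makes no assumption about the shape of the JSON each service returns, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Endpoints and parameter names as quoted in this thread.
DPORTAL_QUERY_URL = "https://d-portal.org/q.json"
DSCLASSIC_JSON_URL = "https://datastore.codeforiati.org/api/1/access/activity.json"


def lookup_urls(identifiers):
    """Map each distinct, non-empty identifier to its lookup URL on both services."""
    unique = sorted({item.strip() for item in identifiers if item.strip()})
    return {
        identifier: {
            "d-portal": DPORTAL_QUERY_URL + "?" + urlencode({"aid": identifier}),
            "dsclassic": DSCLASSIC_JSON_URL
            + "?"
            + urlencode({"iati-identifier": identifier}),
        }
        for identifier in unique
    }
```

De-duplicating first keeps the number of requests down when the same identifier is cited many times in a file.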