codeforIATI / org-id-finder

🔍 Search organisation names to find their organisation identifiers (and vice versa)
https://org-id-finder.codeforiati.org
MIT License

Record and display whether IATI identifier has a matching prefix in org-id lists #6

Open BobHarper1 opened 6 years ago

BobHarper1 commented 6 years ago

The namespace code part of organisation identifiers should be in the org-id register of lists.

This helps in answering the question:

which organisation does NL-KVK-41198677 refer to?

where having direct links to the registering agency's entry would assist traceability.

It would also help to be able to identify where a namespace does not match an existing list code, so that either an incorrect IATI identifier could be fixed, or a missing list added to org-id.
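A minimal sketch of the prefix check described above, in Python. The function name `split_org_id` and the hard-coded `KNOWN_LIST_CODES` set are assumptions for illustration only; a real check would load the full set of list codes from org-id.guide rather than this hand-written subset.

```python
# Hypothetical, hand-written subset of org-id.guide list codes (assumption).
KNOWN_LIST_CODES = {"NL-KVK", "GB-GOV", "XM-DAC", "XI-IATI"}


def split_org_id(org_id, list_codes=KNOWN_LIST_CODES):
    """Split an identifier into (list_code, registration_number).

    Returns (None, org_id) when no known list code prefixes the identifier.
    """
    # Try longer codes first so a longer code wins over a shorter one.
    for code in sorted(list_codes, key=len, reverse=True):
        prefix = code + "-"
        if org_id.startswith(prefix) and len(org_id) > len(prefix):
            return code, org_id[len(prefix):]
    return None, org_id


list_code, number = split_org_id("NL-KVK-41198677")
print(list_code, number)
```

An identifier whose prefix comes back as `None` is the case worth flagging: either the identifier needs fixing or the list is missing from org-id.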

andylolz commented 6 years ago

either an incorrect IATI identifier could be fixed

Am I right in thinking it’s only incorrect if the IATI org file is v2.0x?

BobHarper1 commented 6 years ago

I think it is in 1.05, but as iati-identifier (changed to organisation-identifier in 2.01)? http://iatistandard.org/105/organisation-standard/iati-organisations/iati-organisation/iati-identifier/ http://iatistandard.org/105/organisation-identifiers/

andylolz commented 6 years ago

Sorry, what I mean is:

If you compare the description of the @ref attribute at v1.05 with the description at v2.01, you’ll see that at v2.01, there is a “MUST”, with a description of (although unfortunately no direct mention of) the organisation identifier format used by http://org-id.guide:

Machine-readable identification string for the organisation issuing the report. Must be in the format {RegistrationAgency}-{RegistrationNumber} where {RegistrationAgency} is a valid code in the RegistrationAgency code list and {RegistrationNumber} is a valid identifier issued by the {RegistrationAgency}

I think it is in 1.05, but as iati-identifier (changed to organisation-identifier in 2.01)?

Sorry, yes – the same applies for those two. organisation-identifier includes the “MUST” text, iati-identifier doesn’t.

So for instance, this is not an incorrect organisation identifier: https://andylolz.github.io/org-id-finder/#46004

…because the file it’s declared in is v1.0x.

I think that’s right? Hmm – It seems like it must be wrong…!

BobHarper1 commented 6 years ago

Seems like, but yes I think you are right!

Ok, so my proposal

where having direct links to the registering agency's entry would assist traceability.

It would also help to be able to identify where a namespace does not match an existing list code, so that either an incorrect IATI identifier could be fixed, or a missing list added to org-id.

Could that work by first determining the version that the identifier was published (adding a field for /iati-organisations/@version to the scraping process)?
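As a rough sketch of that idea, the version could be read from the root element's `version` attribute and used to decide whether the format rule applies. The function names here are hypothetical, and treating a missing `version` attribute as "rule does not apply" is an assumption, not something the standard or this thread confirms.

```python
import xml.etree.ElementTree as ET


def org_file_version(xml_text):
    """Return /iati-organisations/@version, or None if the attribute is absent."""
    root = ET.fromstring(xml_text)
    return root.get("version")


def format_rule_applies(version):
    """True for 2.0x files, where the "MUST" wording on the format appears."""
    # Assumption: a missing version is treated as pre-2.01.
    return bool(version) and version.startswith("2.")


sample = '<iati-organisations version="1.05"><iati-organisation/></iati-organisations>'
print(format_rule_applies(org_file_version(sample)))
```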

andylolz commented 6 years ago

So my initial reaction was “ooh, this is cool!” but I thought about it for a bit and had a few issues.

  1. This is a good idea, but it should be something the IATI registry refresher does. That’s because it seems to me like it’s between the registry and the publisher to resolve. People using these org IDs (who this tool is for) mostly shouldn’t need to care.
  2. I don’t see how this assists traceability, but perhaps that’s just me being dense. Not in the sense that I mean traceability, anyway, as described in the footnote in the README.
  3. “direct links to the registering agency's entry” does sound cool… But I don’t think org-id.guide gives me that (unless I’ve misunderstood, and we mean different things?) In the case of the example above, for instance, this is probably the right link. But org-id.guide only gets me to here.

I don’t mean to shut this issue down, and I’m still interested… I just have some reservations, so I thought I’d note them down.

BobHarper1 commented 6 years ago

A key element of traceability (I would argue) is consistent approaches to identifier creation, so I think understanding the provenance of an identifier is important here. I can search a name (e.g. 'Hivos'), find its IATI identifier through org-id-finder, and know that 41198677 should match an organisation on the NL-KVK list; ergo, it is that organisation, and no other.

Plus, since the same process of identifier creation is extensible to other standards, then that helps too?

Re 3. Ah, I meant link to the relevant list's entry on the org-id site, rather than the agency's website (the purpose being that you have the information to find the right page ultimately, even if the follow-through link turns out to be dead... but I can fix that one now!).

andylolz commented 6 years ago

consistent approaches to identifier creation

If IATI publishers followed a consistent (i.e. reproducible) approach to identifier creation, I’m not sure this project would need to exist! Users would be able to figure out org IDs directly, by following the org-id.guide guidelines to reproduce.

This project instead shows the self-declared organisation identifiers – the provenance for which (the publisher’s IATI organisation file) is linked in the source dropdown. In IATI-land, it seems these identifiers sometimes fail to follow the consistent approach outlined by org-id.guide.


But can I ask… Which list would we validate against? The one on org-id.guide, or the codelist in the standard? The former, right? (The latter is legacy I think?)

andylolz commented 6 years ago

Aha – I think I finally understand!

@timgdavies suggests that even if a publisher self-declares an org ID, if it doesn’t conform to the org-id.guide format, an org-id.guide identifier should be (generated and) preferred.

If that’s the case, I’m happy to do these checks, and provide the recommended, consistent identifier (probably with some accompanying explanation).

timgdavies commented 6 years ago

But can I ask… Which list would we validate against? The one on org-id.guide, or the codelist in the standard? The former, right? (The latter is legacy I think?)

The org-id.guide one. I thought that the list on that page was supposed to be kept in sync with org-id.guide's XML output (which mirrors its structure), but it seems that is not happening.

andylolz commented 6 years ago

@BobHarper1 @timgdavies I’ve added the alternative, recommended org ID to this gist: https://gist.github.com/andylolz/d16c35f190e2f3e8f4112cfa6728a8f3

Let me know if those look right / if I’ve missed any. It doesn’t find one for e.g. US-18 because it has to do a country name lookup to figure out the correct DAC donor code, and that bit goes wrong for countries where the DAC uses an abbreviated name (e.g. “United States” instead of “United States of America”).
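One possible way around the abbreviated-name problem is a small alias table applied before the lookup. This is only a sketch: `DAC_DONORS`, `NAME_ALIASES` and `dac_donor_code` are hypothetical names, and the handful of codes shown are illustrative values that should be checked against the DAC donor list itself.

```python
# Illustrative subset of DAC donor names and codes (assumption: verify against the DAC list).
DAC_DONORS = {
    "Australia": "801",
    "Netherlands": "7",
    "United Kingdom": "12",
    "United States": "302",
}

# Hypothetical mapping from full country names to the names the DAC uses.
NAME_ALIASES = {
    "United States of America": "United States",
}


def dac_donor_code(country_name):
    """Look up a DAC donor code, normalising known name variants first."""
    name = country_name.strip()
    name = NAME_ALIASES.get(name, name)
    return DAC_DONORS.get(name)


print(dac_donor_code("United States of America"))
```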

If we’re happy with the recommendation algorithm, I’ll add it on the frontend. Btw the algorithm it uses is here.

To summarise the algorithm: Where value is a given org ID in an IATI organisation file…

andylolz commented 6 years ago

in the case of AU-5, this would be XM-DAC-801-5

^^ This step appears to be wrong… I’m not sure how it should work. Any clues?

stevieflow commented 6 years ago

My understanding of AU-5 - Australian Agency for International Development is that the answer is on the OECD DAC Agency sheet

AU = 801 (the Australian Agency for International Development, from the IATI list, is not actually named on the current OECD DAC agency list)

Perhaps another way around this is just to convert the current OECD-DAC agency list with the XM-DAC prefix, and use that as the source (in other words, forget the original, yet outdated, IATI list)?

stevieflow commented 6 years ago

This step appears to be wrong… I’m not sure how it should work. Any clues?

I don't see the step as being wrong. Just that this agency is now not on the list

One other factor - DFID would be XM-DAC-12-1 from this list, but we know DFID are GB-GOV-1 (from their own reporting-org and org file)

andylolz commented 6 years ago

I don't see the step as being wrong. Just that this agency is now not on the list

Sorry – I should have given more details. I concluded it must be wrong because Netherlands use XM-DAC-7 rather than XM-DAC-7-1. I’m not sure if that’s a mistake on my part, or theirs.

One other factor - DFID would be XM-DAC-12-1 from this list, but we know DFID are GB-GOV-1 (from their own reporting-org and org file)

Yeah – If they’re using something that looks like a valid org ID (e.g. GB-GOV-1) then the algorithm returns that. I.e. it won’t keep trying to find a better solution. See line 91 here – valid_org_id is True, and there’s no suggested_org_id.

(in other words, forget the original, yet outdated, IATI list)?

Yeah exactly – the algorithm proposed above doesn’t touch the OrganisationIdentifier list at all. It uses the DAC codelists directly (although it uses the donor list rather than the agency list… Perhaps that’s wrong.) It should probably also use the XML that the DAC now provide, but at the moment it uses this datahub dataset.
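To make the short-circuit behaviour concrete, here is a hedged sketch (not the actual code linked above). The names `Check`, `check_org_id`, `KNOWN_LIST_CODES` and `DAC_DONOR_CODES` are assumptions; the donor codes are the ones mentioned in this thread, and whether the agency suffix belongs in the suggestion is exactly the open question discussed here.

```python
from typing import NamedTuple, Optional

# Hypothetical subset of org-id.guide list codes (assumption).
KNOWN_LIST_CODES = {"GB-GOV", "NL-KVK", "XM-DAC", "XI-IATI"}
# Country prefix to DAC donor code, using only values mentioned in this thread.
DAC_DONOR_CODES = {"AU": "801", "GB": "12", "NL": "7"}


class Check(NamedTuple):
    valid_org_id: bool
    suggested_org_id: Optional[str]


def check_org_id(value):
    # If the value already looks like a valid org ID, stop: no suggestion is made.
    if any(value.startswith(code + "-") for code in KNOWN_LIST_CODES):
        return Check(True, None)
    # Otherwise try the legacy "{country}-{agency}" shape, e.g. AU-5.
    country, sep, agency = value.partition("-")
    donor = DAC_DONOR_CODES.get(country)
    if sep and donor and agency.isdigit():
        return Check(False, f"XM-DAC-{donor}-{agency}")
    return Check(False, None)


print(check_org_id("GB-GOV-1"))
print(check_org_id("AU-5"))
```

Under this sketch GB-GOV-1 is returned as valid with no suggestion, so DFID is never rewritten to an XM-DAC identifier.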

stevieflow commented 6 years ago

Sorry – I should have given more details. I concluded it must be wrong because Netherlands use XM-DAC-7 rather than XM-DAC-7-1. I’m not sure if that’s a mistake on my part, or theirs.

I can ask - I think it should be XM-DAC-7-1

andylolz commented 6 years ago

I can ask - I think it should be XM-DAC-7-1

Thanks!

There’s also Switzerland SDC (XM-DAC-CH-4) and Gates (XM-DAC-DAC-1601). There are probably more examples to boot!

andylolz commented 6 years ago

@BobHarper1 (but cc @stevieflow @timgdavies) it would be great if you could have a scan through https://gist.github.com/andylolz/d16c35f190e2f3e8f4112cfa6728a8f3 and check:

  1. the stuff that’s in the suggested_org_id column looks correct, and
  2. whether there’s anything missing from the suggested_org_id column

Once that’s approved, I can figure out how to report this on the front end, and then close up this issue. Thanks!

stevieflow commented 6 years ago

Thanks @andylolz I can see

These seem all good for suggestions. Pinging @markbrough (re: https://discuss.iatistandard.org/t/why-does-2-02-include-a-code-list-that-was-not-supported-since-1-04/1101/9)

One example that might not be so useful, but fits the logic, is: IADB --> XI-IATI-IADB

andylolz commented 6 years ago

Super useful – thanks @stevieflow!

These seem all good for suggestions.

Yeah, totally agree with the emphasis here. So I’m thinking I will still present the self-declared identifier in the same way, but just add something like a tooltip or an extra note somehow that says:

Psst… while this is the self-declared identifier… it doesn’t actually conform to the new methodology. Here’s the one that does. Maybe you could give them a nudge and let them know? Kk thx.

…or thereabouts. What do you think?


One example that might not be so useful, but fits the logic, is: IADB --> XI-IATI-IADB

Ooh, interesting! Pretend I know nothing (because I actually know nothing). Why wouldn’t that one be useful? (Perhaps I shouldn’t be generating XI-IATI identifiers? I could take that bit out if it’s not a good idea.)

stevieflow commented 6 years ago

Re: XI-IATI - I'd agree to take these suggestions out, as this is the last resort.

But, I now realise that you might not be doing what I thought!

I thought you were suggesting XI-IATI-IADB because you'd found IADB. When we look at the OECD DAC list, there's a listing, but they seem to share that with others:

46012 | 2016 | IDB | Inter-American Development Bank, Inter-American Investment Corporation and Multilateral Investment Fund

Therefore, instead of XM-DAC-46012 I think there is a reason why it's XI-IATI-IADB (but the specific reason is actually missing from the changelog)

Still there? I think you were suggesting XI-IATI-IADB because it's in the registry but not in the org file, rather than suggesting it as a last resort

Yeah, any form of psst text is welcome :)

andylolz commented 6 years ago

When we look at the OECD DAC list, there's a listing, but they seem to share that with others

Aha! The “they seem to share that with others” bit sounds like a plausible explanation for why IATI might have decided to invent a new identifier.

I think you were suggesting XI-IATI-IADB because it's in the registry but not in the org file

That’s a good answer, but it’s not the right answer! I don’t look at the org IDs in the registry metadata at all (mostly because it’s more often than not wrong). Instead, this comes from this line of the algorithm:

IADB is found on this list, so the XI-IATI- prefix is added. This is maybe a biiiit of a dodgy approach (since it could easily result in false positives) – happy to remove it.
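Roughly, that step amounts to something like the sketch below. `XI_IATI_CODES` is a hypothetical stand-in for the list being checked (only IADB is known from this thread), and `suggest_xi_iati` is an illustrative name, not the function in the linked code.

```python
# Hypothetical stand-in for the list of codes being checked (assumption).
XI_IATI_CODES = {"IADB"}


def suggest_xi_iati(value, known_codes=XI_IATI_CODES):
    """Suggest an XI-IATI identifier only when the value is found on the list."""
    code = value.strip().upper()
    if code in known_codes:
        return f"XI-IATI-{code}"
    # Values that are not on the list get no suggestion at all.
    return None


print(suggest_xi_iati("IADB"))
```

The false-positive risk comes from the membership test: any self-declared value that happens to coincide with an entry on the list would get the prefix.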

stevieflow commented 6 years ago

Aha! The “they seem to share that with others” bit sounds like a plausible explanation for why IATI might have decided to invent a new identifier.

Perhaps, although we don't really know (via https://github.com/IATI/IATI-Guidance/issues/308#issuecomment-369368195)

IADB is found on this list, so the XI-IATI- prefix is added.

OK - I think that seems OK. There's no instance of "if not on any list, then suggest XI-IATI", is there?

andylolz commented 6 years ago

There's no instance of "if not on any list, then suggest XI-IATI", is there?

No way!