Open BobHarper1 opened 6 years ago
Am I right in thinking it’s only incorrect if the IATI org file is v2.0x?
I think it is in 1.05 but as iati-identifier (changed to organisation-identifier in 2.01)? http://iatistandard.org/105/organisation-standard/iati-organisations/iati-organisation/iati-identifier/
http://iatistandard.org/105/organisation-identifiers/
Sorry, what I mean is:
If you compare the description of the @ref attribute at v1.05 with the description at v2.01, you’ll see that at v2.01, there is a “MUST”, with a description of (although unfortunately no direct mention of) the organisation identifier format used by http://org-id.guide:
Machine-readable identification string for the organisation issuing the report. Must be in the format {RegistrationAgency}-{RegistrationNumber} where {RegistrationAgency} is a valid code in the RegistrationAgency code list and {RegistrationNumber } is a valid identifier issued by the {RegistrationAgency}
I think it is in 1.05 but as iati-identifier (changed to organisation-identifier in 2.01)?
Sorry, yes – the same applies for those two. organisation-identifier includes the “MUST” text, iati-identifier doesn’t.
So for instance, this is not an incorrect organisation identifier: https://andylolz.github.io/org-id-finder/#46004 …because the file it’s declared in is v1.0x.
I think that’s right? Hmm – It seems like it must be wrong…!
Seems like, but yes I think you are right!
Ok, so my proposal
where having direct links to the registering agency's entry would assist traceability.
It would also help to be able to identify where a namespace does not match an existing list code, so that either an incorrect IATI identifier could be fixed, or a missing list added to org-id
Could that work by first determining the version under which the identifier was published (adding a field for /iati-organisations/@version to the scraping process)?
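A rough sketch of what that version check might look like in the scraper. This is an illustration rather than the project's actual code: the function name read_org_id is hypothetical, and it assumes the identifier lives in iati-identifier for 1.x files and organisation-identifier for 2.x files, as discussed above.

```python
import xml.etree.ElementTree as ET


def read_org_id(xml_text):
    """Return (version, org_id, is_v2) for an IATI organisation file.

    Hypothetical helper: only 2.x files are treated as subject to the
    "MUST" wording on the identifier format.
    """
    root = ET.fromstring(xml_text)
    # /iati-organisations/@version; may be missing, in which case it is None
    version = root.get("version")
    is_v2 = version is not None and version.startswith("2.")
    # The element holding the identifier was renamed in 2.01
    tag = "organisation-identifier" if is_v2 else "iati-identifier"
    element = root.find(f"iati-organisation/{tag}")
    org_id = element.text.strip() if element is not None and element.text else None
    return version, org_id, is_v2
```

With something like this, a value such as 46004 from a v1.05 file would be recorded alongside its version, so it could be shown without being flagged as incorrect.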
So my initial reaction was “ooh, this is cool!” but I thought about it for a bit and had a few issues.
I don’t mean to shut this issue down, and I’m still interested… I just have some reservations, so I thought I’d note them down.
A key element of traceability (I would argue) is consistent approaches to identifier creation, so I think understanding the provenance of an identifier is important here. I can search a name (e.g. 'Hivos'), find its IATI identifier through org-id-finder, and know that 41198677 should match an organisation on the NL-KVK, ergo, it is that organisation, and no other.
Plus, since the same process of identifier creation is extensible to other standards, then that helps too?
Re 3. Ah, I meant link to the relevant list's entry on the org-id site, rather than the agency's website (the purpose being that you have the information to find the right page ultimately, even if the follow-through link turns out to be dead... but I can fix that one now!).
consistent approaches to identifier creation
If IATI publishers followed a consistent (i.e. reproducible) approach to identifier creation, I’m not sure this project would need to exist! Users would be able to figure out org IDs directly, by following the org-id.guide guidelines to reproduce.
This project instead shows the self-declared organisation identifiers – the provenance for which (the publisher’s IATI organisation file) is linked in the source dropdown. In IATI-land, it seems these identifiers sometimes fail to follow the consistent approach outlined by org-id.guide.
But can I ask… Which list would we validate against? The one on org-id.guide, or the codelist in the standard? The former, right? (The latter is legacy I think?)
Aha – I think I finally understand!
@timgdavies suggests that even if a publisher self-declares an org ID, if it doesn’t conform to the org-id.guide format, an org-id.guide identifier should be (generated and) preferred.
If that’s the case, I’m happy to do these checks, and provide the recommended, consistent identifier (probably with some accompanying explanation).
But can I ask… Which list would we validate against? The one on org-id.guide, or the codelist in the standard? The former, right? (The latter is legacy I think?)
The org-id.guide one. I thought that the list on that page was supposed to be kept in sync with org-id.guide's XML output (which mirrors its structure), but it seems that is not happening.
@BobHarper1 @timgdavies I’ve added the alternative, recommended org ID to this gist: https://gist.github.com/andylolz/d16c35f190e2f3e8f4112cfa6728a8f3
Let me know if those look right / if I’ve missed any. It doesn’t find one for e.g. US-18 because it has to do a country name lookup to figure out the correct DAC donor code, and that bit goes wrong for countries where the DAC uses an abbreviated name (e.g. “United States” instead of “United States of America”).
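One possible way around the name mismatch is a small alias table in front of the lookup. This is only a sketch: the tables below are an illustrative subset (a real lookup would be built from the DAC donor codelist), and the function and variable names are hypothetical.

```python
# Illustrative subset only; a real table would be loaded from the DAC donor codelist.
DAC_DONOR_CODES = {
    "Australia": "801",
    "Netherlands": "7",
    "United Kingdom": "12",
    "United States": "302",
}

# Maps a full country name to the abbreviated name the DAC uses (assumed aliases).
NAME_ALIASES = {
    "United States of America": "United States",
}


def donor_code_for_country(country_name):
    """Return the DAC donor code for a country name, or None if not found."""
    name = NAME_ALIASES.get(country_name, country_name)
    return DAC_DONOR_CODES.get(name)
```

Keeping the aliases in their own table means a missed lookup still returns None, so nothing is guessed for countries that are not covered.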
If we’re happy with the recommendation algorithm, I’ll add it on the frontend. Btw the algorithm it uses is here.
To summarise the algorithm: Where value is a given org ID in an IATI organisation file…
- If value uses the org-id.guide format ([^-]+-[^-]+-.+) and the prefix is on the list of lists, return it
- If value uses the org-id.guide format and the prefix (when uppercased) is on the list of lists, return it with the prefix uppercased
- If value looks like a DAC channel code (\d{5}), and is on the DAC channel code list, add the prefix XM-DAC- and return it
- If value uses a format like AU-5 ([A-Z]{2}-\d+):
  - return XM-DAC-{DAC donor code}-{agency code} (in the case of AU-5, this would be XM-DAC-801-5)
- If value is on IATIOrganisationIdentifier but is missing its XI-IATI- prefix, add the prefix
- Otherwise, return None
in the case of AU-5, this would be XM-DAC-801-5
^^ This step appears to be wrong… I’m not sure how it should work. Any clues?
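For reference, the steps summarised above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code linked earlier: the function name, its parameters, and the idea of keying donor codes by two-letter country code are all hypothetical, and the AU-5 branch is the step being questioned here.

```python
import re

ORG_ID_FORMAT = re.compile(r"[^-]+-[^-]+-.+")    # org-id.guide style: {prefix}-{number}
DAC_CHANNEL = re.compile(r"\d{5}")               # e.g. a five-digit DAC channel code
COUNTRY_AGENCY = re.compile(r"([A-Z]{2})-(\d+)")  # e.g. AU-5


def suggest_org_id(value, list_prefixes, dac_channel_codes,
                   donor_code_by_country, iati_org_ids):
    """Return a recommended org ID for value, or None if no rule applies.

    All four lookup arguments are assumed to be preloaded sets/dicts.
    """
    if ORG_ID_FORMAT.fullmatch(value):
        agency, register, number = value.split("-", 2)
        prefix = f"{agency}-{register}"
        if prefix in list_prefixes:
            return value
        if prefix.upper() in list_prefixes:
            return f"{prefix.upper()}-{number}"
    if DAC_CHANNEL.fullmatch(value) and value in dac_channel_codes:
        return f"XM-DAC-{value}"
    match = COUNTRY_AGENCY.fullmatch(value)
    if match:
        # Assumption: donor codes keyed by country code, e.g. {"AU": "801"}
        donor_code = donor_code_by_country.get(match.group(1))
        if donor_code is not None:
            return f"XM-DAC-{donor_code}-{match.group(2)}"
    if value in iati_org_ids:
        return f"XI-IATI-{value}"
    return None
```

Because the rules are tried in order, a value that already looks valid (such as GB-GOV-1) is returned unchanged and the later rules never run.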
My understanding of AU-5 - Australian Agency for International Development is that the answer is on the OECD DAC Agency sheet.
AU = 801 (Australian Agency for International Development (from IATI list) is not actually named on the current OECD DAC agency list)
Perhaps another way around this is just to convert the current OECD-DAC agency list with the XM-DAC prefix, and use that as the source (in other words, forget the original, yet outdated, IATI list)?
This step appears to be wrong… I’m not sure how it should work. Any clues?
I don't see the step as being wrong. Just that this agency is now not on the list.
One other factor - DFID would be XM-DAC-12-1 from this list, but we know DFID are GB-GOV-1 (from their own reporting-org and org file)
I don't see the step as being wrong. Just that this agency is now not on the list.
Sorry – I should have given more details. I concluded it must be wrong because Netherlands use XM-DAC-7 rather than XM-DAC-7-1. I’m not sure if that’s a mistake on my part, or theirs.
One other factor - DFID would be XM-DAC-12-1 from this list, but we know DFID are GB-GOV-1 (from their own reporting-org and org file )
Yeah – If they’re using something that looks like a valid org ID (e.g. GB-GOV-1) then the algorithm returns that. I.e. it won’t keep trying to find a better solution. See line 91 here – valid_org_id is True, and there’s no suggested_org_id.
(in other words, forget the original, yet outdated, IATI list)?
Yeah exactly – the algorithm proposed above doesn’t touch the OrganisationIdentifier list at all. It uses the DAC codelists directly (although it uses the donor list rather than the agency list… Perhaps that’s wrong.) It should probably also use the XML that the DAC now provide, but at the moment it uses this datahub dataset.
Sorry – I should have given more details. I concluded it must be wrong because Netherlands use XM-DAC-7 rather than XM-DAC-7-1. I’m not sure if that’s a mistake on my part, or theirs.
I can ask - I think it should be XM-DAC-7-1
I can ask - I think it should be XM-DAC-7-1
Thanks!
There’s also Switzerland SDC (XM-DAC-CH-4) and Gates (XM-DAC-DAC-1601). There are probably more examples to boot!
@BobHarper1 (but cc @stevieflow @timgdavies) it would be great if you could have a scan through https://gist.github.com/andylolz/d16c35f190e2f3e8f4112cfa6728a8f3 and check:
- suggested_org_id column looks correct, and
- suggested_org_id column
Once that’s approved, I can figure out how to report this on the front end, and then close up this issue. Thanks!
Thanks @andylolz I can see
These seem all good for suggestions. Pinging @markbrough (re: https://discuss.iatistandard.org/t/why-does-2-02-include-a-code-list-that-was-not-supported-since-1-04/1101/9)
One example that might not be so useful, but fits the logic is : IADB --> XI-IATI-IADB
Super useful – thanks @stevieflow!
These seem all good for suggestions.
Yeah, totally agree with the emphasis here. So I’m thinking I will still present the self-declared identifier in the same way, but just add something like a tooltip or an extra note somehow that says:
Psst… while this is the self-declared identifier… it doesn’t actually conform to the new methodology. Here’s the one that does. Maybe you could give them a nudge and let them know? Kk thx.
…or thereabouts. What do you think?
One example that might not be so useful, but fits the logic is : IADB --> XI-IATI-IADB
Ooh, interesting! Pretend I know nothing (because I actually know nothing). Why wouldn’t that one be useful? (Perhaps I shouldn’t be generating XI-IATI identifiers? I could take that bit out if it’s not a good idea.)
Re: XI-IATI - I'd agree to take these suggestions out, as this is the last resort.
But, I now realise that you might not be doing what I thought!
I thought you were suggesting XI-IATI-IADB because you'd found IADB. When we look at the OECD DAC list, there's a listing, but they seem to share that with others:
46012 | 2016 | IDB | Inter-American Development Bank, Inter-American Investment Corporation and Multilateral Investment Fund
Therefore, instead of XM-DAC-46012 I think there is a reason why it's XI-IATI-IADB (but the specific reason is actually missing from the changelog)
Still there? I think you were suggesting XI-IATI-IADB because it's in the registry but not in the org file, rather than suggesting it as a last resort
Yeah, any form of psst text is welcome :)
When we look at the OECD DAC list, there's a listing, but they seem to share that with others
Aha! The “they seem to share that with others” bit sounds like a plausible explanation for why IATI might have decided to invent a new identifier.
I think you were suggesting XI-IATI-IADB because it's in the registry but not in the org file
That’s a good answer, but it’s not the right answer! I don’t look at the org IDs in the registry metadata at all (mostly because it’s more often than not wrong). Instead, this comes from this line of the algorithm:
- If value is on IATIOrganisationIdentifier but is missing its XI-IATI- prefix, add the prefix
IADB is found on this list, so the XI-IATI- prefix is added. This is maybe a biiiit of a dodgy approach (since it could easily result in false positives) – happy to remove it.
Aha! The “they seem to share that with others” bit sounds like a plausible explanation for why IATI might have decided to invent a new identifier.
Perhaps, although we don't really know (via https://github.com/IATI/IATI-Guidance/issues/308#issuecomment-369368195)
IADB is found on this list, so the XI-IATI- prefix is added.
OK - I think that seems OK. There's no instance of "if not on any list, then suggest XI-IATI", is there?
There's no instance of "if not on any list, then suggest XI-IATI", is there?
No way!
The namespace code part of organisation identifiers should be in the org-id register of lists.
This helps in answering the question:
where having direct links to the registering agency's entry would assist traceability.
It would also help to be able to identify where a namespace does not match an existing list code, so that either an incorrect IATI identifier could be fixed, or a missing list added to org-id
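As a minimal sketch of that last check, identifiers could be grouped by any namespace prefix that is not on the register. The function name and the shape of known_prefixes (a set of codes such as GB-GOV) are assumptions for illustration, not part of org-id-finder.

```python
from collections import defaultdict


def unmatched_prefixes(org_ids, known_prefixes):
    """Group identifiers by namespace prefix where that prefix is not a known list code."""
    unmatched = defaultdict(list)
    for org_id in org_ids:
        parts = org_id.split("-", 2)
        if len(parts) < 3:
            # Not in {prefix}-{number} form (e.g. AU-5); out of scope for this check
            continue
        prefix = "-".join(parts[:2])
        if prefix not in known_prefixes:
            unmatched[prefix].append(org_id)
    return dict(unmatched)
```

Each key in the result is then either a typo in a publisher's identifier or a candidate for a new list on org-id, which would still need a human to decide.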