IATI / IATI-Codelists-NonEmbedded

IATI Codelists that are 'non-functional' and usually provide lookup information.
http://iatistandard.org/codelists/codelist-management/

January 2019 DAC codelist updates #283

Closed andylolz closed 5 years ago

andylolz commented 5 years ago

These updates are all from the DAC XML source, available here:

https://webfs.oecd.org/crs-iati-xml/Lookup/

This replaces #249.


To summarise the changes here:

Aid Type

Codes added:

Sector Category

Codes added:

Sector

Codes added:

Codes withdrawn:

Finance Type Category

Codes added:

Finance Type

Codes added:

andylolz commented 5 years ago

Related IATI discuss post: https://discuss.iatistandard.org/t/january-2019-dac-codelist-updates/1622

samuele-mattiuzzo commented 5 years ago

@andylolz for the BAs' benefit, would you mind providing a diff between your import and the original XML file (if you have something like that available that is)? Do you use the script you suggested we'd use in another PR?

Thank you!

andylolz commented 5 years ago

Do you use the script you suggested we'd use in another PR?

Nope – I did this manually :) The script in #172 processes the DAC Excel file (well, it processes some CSV on datahub.io, but that comes from the DAC Excel file). @bill-anderson appears to suggest the Excel file should not be used ("more sustainable solution" etc) so this PR uses the XML instead.

would you mind providing a diff between your import and the original XML file (if you have something like that available that is)?

A diff is maybe tricky because the DAC XML file (available here) is just one big file. But I can explain the steps I went through. I did the following bits of cleanup:

I think that’s everything. Here’s what I haven’t done:

The diff in this PR shows that quite a lot of stuff has changed. I guess that’s mostly because the source has changed from Excel to XML, and there are some mismatches between the two. I think it will be difficult to verify and merge this PR for that reason. If the goal is to eventually use XML from the DAC as the source for these replicated codelists, then I’d be tempted to go back to the DAC technical team with a list of stuff to fix at their end, and use the Excel file as the source in the interim.
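Since a line-by-line diff of one big XML file is awkward, a code-level comparison may be easier to review. The following is a minimal sketch, not the process used for this PR. It assumes both documents use `codelist-item` elements with a `code` child and an optional `status` attribute; the function names and the placeholder codes are hypothetical.

```python
import xml.etree.ElementTree as ET


def codes_by_status(xml_text):
    """Map each code found in a codelist document to its status.

    Assumes <codelist-item status="..."><code>...</code></codelist-item>;
    items without a status attribute are treated as "active".
    """
    root = ET.fromstring(xml_text)
    result = {}
    for item in root.iter("codelist-item"):
        code = item.findtext("code")
        if code is not None:
            result[code.strip()] = item.get("status", "active")
    return result


def compare_codelists(old_xml, new_xml):
    """Summarise which codes were added, went missing, or changed status."""
    old = codes_by_status(old_xml)
    new = codes_by_status(new_xml)
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "missing": sorted(set(old) - set(new)),
        "status_changed": sorted(c for c in shared if old[c] != new[c]),
    }


# Placeholder documents with made-up codes, purely for illustration.
OLD = """<codelist><codelist-items>
  <codelist-item><code>A01</code></codelist-item>
  <codelist-item><code>A02</code></codelist-item>
</codelist-items></codelist>"""

NEW = """<codelist><codelist-items>
  <codelist-item status="withdrawn"><code>A01</code></codelist-item>
  <codelist-item><code>A03</code></codelist-item>
</codelist-items></codelist>"""

if __name__ == "__main__":
    print(compare_codelists(OLD, NEW))
```

A summary like this only compares codes and statuses; name and description changes would need a separate check.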


I’m very pleased you’re looking at this, because it’s really important that these replicated codelists are kept in sync with source. For instance, a validator might say that a dataset is invalid because a bad sector code is used, when in fact the problem might be that the IATI replicated Sector codelist is out of sync, and doesn’t include a complete list of sector codes. A publisher could also be scored down on the Aid Transparency Index for the same reason. Or an aid management system might rely on these codelists for interpreting published IATI data.
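To illustrate the validator problem, here is a minimal sketch of a lookup-based check. It is not how any particular validator works; the function name and the codes are made-up placeholders, not real DAC sector codes.

```python
def unknown_codes(used_codes, codelist_codes):
    """Return the used codes that do not appear in the given codelist copy."""
    known = set(codelist_codes)
    return [code for code in used_codes if code not in known]


# Made-up placeholder codes, for illustration only.
source_codes = {"S1", "S2", "S3"}   # what the source currently lists
stale_replica = {"S1", "S2"}        # a replica that missed the latest update
published = ["S1", "S3"]            # codes used in a publisher's dataset

if __name__ == "__main__":
    # Against the stale replica, "S3" is reported even though the source lists it.
    print(unknown_codes(published, stale_replica))
    # Against an up-to-date list, nothing is reported.
    print(unknown_codes(published, source_codes))
```

The data is identical in both calls; only the copy of the codelist differs, which is why keeping the replica in sync matters.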

Anyway – I’d be happy to discuss next steps.

samuele-mattiuzzo commented 5 years ago

@andylolz fab, thanks! Petya and the BAs have this on their to-do list, and it'll be checked this week!

PetyaKangalova commented 5 years ago

@andylolz thanks so much for your work on this and for clarifying the steps you have undertaken. The crucial bit here is to again get confirmation from the OECD DAC that the Excel and XML files include exactly the same content, which at the moment is not the case! We were promised that the XML would be in sync with the source file. I have copied you in on the email I sent to Valerie from the DAC so that we get an answer from them and are able to proceed with the changes as soon as possible. Thanks again!

PetyaKangalova commented 5 years ago

We have now received a response from the OECD that the XML file has been updated and that both the Excel and XML files have been pulled from the same source.

Both xml and xl file have been regenerated (from SQL as unique source, except for Channel codes) and are available on our website http://www.oecd.org/dac/financing-sustainable-development/development-finance-standards/dacandcrscodelists.htm.

From a quick look at the differences I identified before, the codelists are now identical in the Excel and XML files, so I think we can use the updated XML to update the codelists on the IATI website.

@andylolz Is there a way of easily re-doing what you have done so far with the updated XML file? Then I can review the pull request. If it requires a lot of manual work for you, then I can look into making the comparison and adding the pull requests.

andylolz commented 5 years ago

@PetyaKangalova no problem – I’ll try and get this sorted today.

samuele-mattiuzzo commented 5 years ago

Thank you Andy!

PetyaKangalova commented 5 years ago

Thanks @andylolz! I am off on Monday and in meetings all of Tuesday, but should be able to review in the middle of next week! Thanks again!

andylolz commented 5 years ago

Okay – PR updated using the latest (updated) version of DAC XML. I followed the same steps described above.

PetyaKangalova commented 5 years ago

@andylolz thanks again for redoing the commit. Really appreciate it! It took me a while to review all the changes as there are quite a lot of them! See summary below:

Next steps:

  1. @andylolz , would you be able to remove the CRS Channel Code section from the pull request for now? I will contact Valerie to get confirmation, but I feel this one will take some time, as the XML has not been created from their source database, and I don’t want to hold up the other changes.
  2. I will contact Valerie to get confirmation on whether sector code 74010 has been withdrawn and why the descriptions for sector categories have been removed.
  3. Once I get confirmation on 2, I will make the necessary changes and approve the pull request. As I was reviewing the changes I kept track of all of them (whether it was a code addition or a change of name or description). I will then work on adding all changes to the non-embedded codelist changelog (just for the sector codelist there are more than 40 changes so might take us some time).
  4. Once codelist changes and changelog have been approved and deployed, we will add a post on IATI Discuss.
  5. We will also contact publishing tool providers to make them aware of the changes.

andylolz commented 5 years ago

It took me a while to review all the changes as there are quite a lot of them!

There are indeed! Great work reviewing!

  1. @andylolz , would you be able to remove the CRS Channel Code section from the pull request for now?

Kk, done.

  • Sector- ready to approve changes except for code 74010 and 74020

Oh, good spot! The same applies to 41050 (Flood prevention/control), which has also disappeared.

Also, the following withdrawn sector codes have disappeared:

They may have been replaced by other codes, but the idea is they’re supposed to remain in perpetuity as status="withdrawn".
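As a minimal sketch of that idea (not the process used in this PR), codes that vanish from a new source could be carried forward as withdrawn instead of being dropped. The function name, the dict-of-statuses representation, and the codes are hypothetical.

```python
def carry_forward_withdrawn(previous, incoming):
    """Merge a new source list with the previous replica.

    Both arguments map code -> status ("active" or "withdrawn").
    Any code present before but absent from the new source is kept
    and marked "withdrawn" rather than deleted.
    """
    merged = dict(incoming)
    for code in previous:
        if code not in merged:
            merged[code] = "withdrawn"
    return merged


# Made-up placeholder codes, for illustration only.
previous = {"X1": "active", "X2": "withdrawn", "X3": "active"}
incoming = {"X1": "active", "X4": "active"}

if __name__ == "__main__":
    print(carry_forward_withdrawn(previous, incoming))
```

Whether a code that disappears should really be treated as withdrawn still needs confirming with the source, as with 74010, 74020 and 41050 above; the sketch only guarantees that nothing disappears silently.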

andylolz commented 5 years ago

3. just for the sector codelist there are more than 40 changes so might take us some time

I’ve mentioned elsewhere that I’m in favour of scrapping this changelog. I’m unconvinced it’s worth your time. It wasn’t updated for the last DAC codelist update (see: IATI/IATI-Guidance#312) so it’s only a partial list of changes anyway.

5. We will also contact publishing tool providers to make them aware of the changes.

Okay – this is very generous of you, but again I don’t think this should be standard practice. Tool providers should be keeping an eye on discuss, or routinely pulling from source. That’s the system as documented. If they start relying on updates from you then that just becomes an extra overhead for you.

PetyaKangalova commented 5 years ago

@andylolz

Kk, done.

Thank you!

Oh, good spot! The same applies to 41050 (Flood prevention/control), which has also disappeared.

Thank you for flagging. I missed this one!

Also, the following withdrawn sector codes have disappeared:

Yes, I agree! I also noticed that there were a few new 'withdrawn' as of 2015 that were not on the IATI list. As they are already withdrawn I was not so concerned but it means the XML is not consistent.

On your point about the changelog, I agree that it is a lot of effort. However, this time round there are quite a lot of new codes, and it will be important to alert people to which ones those are and also to make sure organisations can start using them via the various publishing tools. Hence dropping them a quick email to speed up the process, but it is indeed the responsibility of the tool providers to keep them up to date.

Waiting to hear from Valerie and will then action the changes!

andylolz commented 5 years ago

Excellent – all good!

Yes, I agree! I also noticed that there were a few new 'withdrawn' as of 2015 that were not on the IATI list. As they are already withdrawn I was not so concerned but it means the XML is not consistent.

Yes that’s true, but I’d expect DAC to have a better record of withdrawn codes than IATI (since IATI only started recording these relatively recently). So withdrawn codes in the XML that were not previously known to IATI are probably a good thing :)

andylolz commented 5 years ago

I’ve added a summary of changes in the PR description.

andylolz commented 5 years ago

Seems like the conclusion is: Merge this, and then add a note to the FinanceType codelist (about non-flow items) in a new pull request.

@PetyaKangalova is that right?

andylolz commented 5 years ago

:star: :star2: :star: