Closed andylolz closed 5 years ago
Related IATI discuss post: https://discuss.iatistandard.org/t/january-2019-dac-codelist-updates/1622
@andylolz for the BAs' benefit, would you mind providing a diff between your import and the original XML file (if you have something like that available that is)? Do you use the script you suggested we'd use in another PR?
Thank you!
> Do you use the script you suggested we'd use in another PR?
Nope – I did this manually :) The script in #172 processes the DAC Excel file (well, it processes some CSV on datahub.io, but that comes from the DAC Excel file). @bill-anderson appears to suggest the Excel file should not be used ("more sustainable solution" etc) so this PR uses the XML instead.
> would you mind providing a diff between your import and the original XML file (if you have something like that available that is)?
A diff is maybe tricky because the DAC XML file (available here) is just one big file. But I can explain the steps I went through. I did the following bits of cleanup:
- Replaced `@status`es with either "active" or "withdrawn". (The `@status` attribute in the DAC XML sometimes contains "active-MCD", "active- Pilot" or "Vonlontary basis" [sic], which are not valid statuses. I’ve flagged this issue with your colleagues and with Valerie by email, but for the purposes of this PR I’ve fixed these manually.)
I think that’s everything. Here’s what I haven’t done:
- … `status="withdrawn"`)
The diff in this PR shows that quite a lot of stuff has changed. I guess that’s mostly because the source has changed from Excel to XML, and there are some mismatches between the two. I think it will be difficult to verify and merge this PR for that reason. If the goal is to eventually use XML from the DAC as the source for these replicated codelists, then I’d be tempted to go back to the DAC technical team with a list of stuff to fix at their end, and use the Excel file as the source in the interim.
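For anyone repeating the `@status` cleanup described above, here is a minimal sketch of how it could be scripted rather than done by hand. It is not what was used for this PR (that was manual). The `codelist-item` element name and the function names are assumptions for illustration, and any value that does not start with a valid status is left untouched and reported for a manual decision.

```python
import xml.etree.ElementTree as ET


def normalise_status(raw):
    """Return "active" or "withdrawn", or None if the value needs a manual decision."""
    value = (raw or "").strip().lower()
    for valid in ("active", "withdrawn"):
        # "active-MCD" and "active- Pilot" both start with a valid status.
        if value.startswith(valid):
            return valid
    return None


def clean_statuses(root):
    """Rewrite @status in place; return the raw values that could not be mapped."""
    unresolved = []
    # Assumption: each code sits in a <codelist-item status="..."> element.
    for item in root.iter("codelist-item"):
        cleaned = normalise_status(item.get("status"))
        if cleaned is None:
            unresolved.append(item.get("status"))
        else:
            item.set("status", cleaned)
    return unresolved
```

Returning the unmapped values, rather than guessing, keeps cases like "Vonlontary basis" visible so they can be raised with the DAC.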
I’m very pleased you’re looking at this, because it’s really important that these replicated codelists are kept in sync with source. For instance, a validator might say that a dataset is invalid because a bad sector code is used, when in fact the problem might be that the IATI replicated Sector codelist is out of sync, and doesn’t include a complete list of sector codes. A publisher could also be scored down on the Aid Transparency Index for the same reason. Or an aid management system might rely on these codelists for interpreting published IATI data.
Anyway – I’d be happy to discuss next steps.
@andylolz fab, thanks! Petya and the BAs have this to check on their todo list, it'll be checked during this week!
@andylolz thanks so much for your work on this and for clarifying the steps you have undertaken. The crucial bit here is to again get confirmation from the OECD DAC that the Excel and XML include exactly the same content, which at the moment is not the case! We were promised that the XML would be in sync with the source file. I have copied you in on the email I sent to Valerie from the DAC so that we get an answer from them and are able to proceed with the changes as soon as possible. Thanks again!
We have now received a response from the OECD that the XML files have been updated and both Excel and XML files have been pulled from the same source.
Both xml and xl file have been regenerated (from SQL as unique source, except for Channel codes) and are available on our website http://www.oecd.org/dac/financing-sustainable-development/development-finance-standards/dacandcrscodelists.htm.
From a quick look at the differences I identified before, the codelists are now identical in the Excel and XML files, so I think we can use the updated XML to update the codelists on the IATI website.
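A rough sketch of how the Excel-versus-XML check could be automated, assuming the Excel sheet has been exported to CSV with a `code` column and that the XML holds each code in a `<code>` element. Both of those, and the function names, are assumptions for illustration rather than the actual DAC file layout.

```python
import csv
import io
import xml.etree.ElementTree as ET


def codes_from_csv(text, column="code"):
    """Collect the set of codes from CSV text (the column name is an assumption)."""
    return {row[column].strip() for row in csv.DictReader(io.StringIO(text))}


def codes_from_xml(text):
    """Collect the set of codes from XML text (the <code> element is an assumption)."""
    root = ET.fromstring(text)
    return {el.text.strip() for el in root.iter("code") if el.text}


def compare_sources(csv_text, xml_text):
    """Return (codes only in the CSV, codes only in the XML), each sorted."""
    csv_codes = codes_from_csv(csv_text)
    xml_codes = codes_from_xml(xml_text)
    return sorted(csv_codes - xml_codes), sorted(xml_codes - csv_codes)
```

Two empty lists would mean the two files contain the same set of codes. This only compares codes, not names or statuses.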
@andylolz Is there a way of easily re-doing what you have done so far with the updated XML file? Then I can review the pull request. If it requires a lot of manual work for you, then I can look into making the comparison and adding the pull requests.
@PetyaKangalova no problem – I’ll try and get this sorted today.
Thank you Andy!
Thanks @andylolz ! I am off on Monday and in meetings all of Tuesday but should be able to review mid-next week! Thanks again!
Okay – PR updated using the latest (updated) version of DAC XML. I followed the same steps described above.
@andylolz thanks again for redoing the commit. Really appreciate it! It took me a while to review all the changes as there are quite a lot of them! See summary below:
Next steps:
> It took me a while to review all the changes as there are quite a lot of them!
There are indeed! Great work reviewing!
> - @andylolz, would you be able to remove the CRS Channel Code section from the pull request for now?
Kk, done.
> - Sector - ready to approve changes except for codes 74010 and 74020
Oh, good spot! The same applies to 41050 (Flood prevention/control), which has also disappeared.
Also, the following withdrawn sector codes have disappeared:
They may have been replaced by other codes, but the idea is they’re supposed to remain in perpetuity as `status="withdrawn"`.
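To illustrate the "remain in perpetuity" rule, here is a minimal sketch that carries forward any code present in the previous codelist but missing from the latest one, marking it as withdrawn. The dict layout (`code -> {"name", "status"}`) and the function name are assumptions for illustration, not how the codelists are actually stored.

```python
def carry_forward_withdrawn(previous, latest):
    """Merge two codelists so that codes dropped from `latest` are kept as withdrawn.

    Both arguments map a code to a dict such as {"name": ..., "status": ...}.
    Neither input is modified.
    """
    merged = {code: dict(item) for code, item in latest.items()}
    for code, item in previous.items():
        if code not in merged:
            # The code has disappeared from the source: keep it, but withdrawn.
            merged[code] = {**item, "status": "withdrawn"}
    return merged
```

For example, if `previous` contains 41050 (Flood prevention/control) and `latest` does not, the merged codelist still lists 41050 with status "withdrawn".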
> 3. just for the sector codelist there are more than 40 changes so might take us some time
I’ve mentioned elsewhere that I’m in favour of scrapping this changelog. I’m unconvinced it’s worth your time. It wasn’t updated for the last DAC codelist update (see: IATI/IATI-Guidance#312) so it’s only a partial list of changes anyway.
> 5. We will also contact publishing tool providers to make them aware of the changes.
Okay – this is very generous of you, but again I don’t think this should be standard practice. Tool providers should be keeping an eye on discuss, or routinely pulling from source. That’s the system as documented. If they start relying on updates from you then that just becomes an extra overhead for you.
@andylolz
> Kk, done.
Thank you!
> Oh, good spot! The same applies to 41050 (Flood prevention/control), which has also disappeared.
Thank you for flagging. I missed this one!
> Also, the following withdrawn sector codes have disappeared:
Yes, I agree! I also noticed that there were a few new 'withdrawn' as of 2015 that were not on the IATI list. As they are already withdrawn I was not so concerned but it means the XML is not consistent.
On your point about the changelog, I agree that it is a lot of effort. However, this time round there are quite a lot of new codes, and it will be important to alert people to which ones those are and to make sure organisations can start using them via the various publishing tools. Hence dropping them a quick email to speed up the process, but it is indeed the responsibility of the tool providers to keep their tools up to date.
Waiting to hear from Valerie and will then action the changes!
Excellent – all good!
> Yes, I agree! I also noticed that there were a few new 'withdrawn' as of 2015 that were not on the IATI list. As they are already withdrawn I was not so concerned but it means the XML is not consistent.
Yes that’s true, but I’d expect DAC to have a better record of withdrawn codes than IATI (since IATI only started recording these relatively recently). So withdrawn codes in the XML that were not previously known to IATI are probably a good thing :)
I’ve added a summary of changes in the PR description.
Seems like the conclusion is: Merge this, and then add a note to the FinanceType codelist (about non-flow items) in a new pull request.
@PetyaKangalova is that right?
:star: :star2: :star:
These updates are all from the DAC XML source, available here:
https://webfs.oecd.org/crs-iati-xml/Lookup/
This replaces #249.
To summarise the changes here:
Aid Type
Codes added:
- H03 - Asylum-seekers ultimately accepted
- H04 - Asylum-seekers ultimately rejected
- H05 - Recognised refugees

Sector Category
Codes added:
- 123 - Non-communicable diseases (NCDs)

Sector
Codes added:
- 11250 - School feeding
- 12310 - NCDs control, general
- 12320 - Tobacco use control
- 12330 - Control of harmful use of alcohol and drugs
- 12340 - Promotion of mental health and well-being
- 12350 - Other prevention and treatment of NCDs
- 12382 - Research for prevention and control of NCDs
- 15190 - Facilitation of orderly, safe, regular and responsible migration and mobility
- 16070 - Labour Rights
- 16080 - Social Dialogue
- 24050 - Remittance facilitation, promotion and optimisation
- 25030 - Business development services
- 25040 - Responsible Business Conduct
- 43060 - Disaster Risk Reduction
- 43071 - Food security policy and administrative management
- 43072 - Household food security programmes
- 43073 - Food safety and quality
- 74020 - Multi-hazard response preparedness
- 93011 - Refugees/asylum seekers in donor countries - food and shelter
- 93012 - Refugees/asylum seekers in donor countries - training
- 93013 - Refugees/asylum seekers in donor countries - health
- 93014 - Refugees/asylum seekers in donor countries - other temporary sustenance
- 93015 - Refugees/asylum seekers in donor countries - voluntary repatriation
- 93016 - Refugees/asylum seekers in donor countries - transport
- 93017 - Refugees/asylum seekers in donor countries - rescue at sea
- 93018 - Refugees/asylum seekers in donor countries - administrative costs

Codes withdrawn:
- 41050 - Flood prevention/control
- 74010 - Disaster prevention and preparedness

Finance Type Category
Codes added:
- 0 - NON FLOW ITEMS

Finance Type
Codes added:
- 1 - GNI: Gross National Income
- 2 - ODA % GNI
- 3 - Total Flows % GNI
- 4 - Population