Open andylolz opened 3 years ago
I contacted them today.
Also raised here https://github.com/zimmerman-team/iati.cloud/issues/2470
Thanks, @notshi! I’ve added a comment there.
This issue has been automatically marked as "awaiting update". If you’ve checked and the issue is still applicable, please add a message to that effect.
It’s definitely still applicable.
Hello! There has been no activity on this issue in the last 30 days. I wonder if it has now been resolved?!
If you’re reading this, would you mind checking to see if the issue is still applicable?
Thank you!
would you mind checking to see if the issue is still applicable?
It is, yes. I suspect some services have been whitelisted on Australia DFAT servers, but others have not.
The issue was raised on the registry (https://github.com/IATI/ckanext-iati/issues/323), but after changing the user-agent header (https://github.com/IATI/ckanext-iati/commit/d195e2deed4d6ef77487b9ceb7fa0b2b75bbce49), it looks like the registry archiver is successfully fetching these datasets.
Also, the new datastore now appears to successfully fetch these datasets: https://iatidatastore.iatistandard.org/search/activity?q=reporting_org_ref:(AU-5)&wt=json&rows=50
{
  …
  "response": {
    "numFound": 13632,
    "start": 0,
    "docs": […]
  }
}
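For anyone who wants to repeat this check, here is a minimal sketch of reading the activity count out of a response shaped like the one above. The sample string is a trimmed stand-in (the elided parts are left out and "docs" is empty), and the helper name `count_activities` is hypothetical rather than part of any datastore client.

```python
import json

# Trimmed stand-in for the response above; "docs" is left empty for illustration.
SAMPLE = '{"response": {"numFound": 13632, "start": 0, "docs": []}}'


def count_activities(raw_json):
    """Return numFound from a Solr-style search response, or 0 if it is absent."""
    payload = json.loads(raw_json)
    return payload.get("response", {}).get("numFound", 0)


print(count_activities(SAMPLE))
```

A non-zero count only shows that the datastore holds AU-5 activities; it does not say when they were last fetched.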
However, d-portal is still having trouble downloading Australia DFAT datasets: http://d-portal.org/ctrack.html#view=dash_sluglog&slug=ausgov-af
http://d-portal.org/ctrack.html#view=search&reporting_ref=AU-5
The old datastore no longer exists, but datastore classic has never managed to fetch Australia DFAT data. E.g.: https://datastore.codeforiati.org/api/1/about/dataset/ausgov-af
{
  "dataset": "ausgov-af",
  "last_modified": "2021-03-31T23:15:12.448867",
  "num_resources": 1,
  "resources": [
    {
      "last_fetch": "2021-03-02T00:40:31",
      "last_parsed": null,
      "last_status_code": 404,
      "last_successful_fetch": null,
      "num_of_activities": 0,
      "url": "https://www.dfat.gov.au/sites/default/files/Australian_Aid_Country_File_Afghanistan.xml"
    }
  ]
}
(NB the last_status_code is wrong here – that’s a known bug.)
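Since last_status_code can't be trusted, a safer signal is probably last_successful_fetch. Below is a rough sketch, assuming the response keeps the shape shown above, that lists resources which have never been fetched successfully. The function name `never_fetched` is hypothetical, and the sample is copied from the response above.

```python
import json

# Copied from the datastore classic response above.
ABOUT_DATASET = """
{
  "dataset": "ausgov-af",
  "last_modified": "2021-03-31T23:15:12.448867",
  "num_resources": 1,
  "resources": [
    {
      "last_fetch": "2021-03-02T00:40:31",
      "last_parsed": null,
      "last_status_code": 404,
      "last_successful_fetch": null,
      "num_of_activities": 0,
      "url": "https://www.dfat.gov.au/sites/default/files/Australian_Aid_Country_File_Afghanistan.xml"
    }
  ]
}
"""


def never_fetched(about):
    """Return URLs of resources with no recorded successful fetch.

    last_status_code is deliberately ignored, because it is known to be unreliable.
    """
    return [
        resource["url"]
        for resource in about.get("resources", [])
        if resource.get("last_successful_fetch") is None
    ]


print(never_fetched(json.loads(ABOUT_DATASET)))
```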
Thanks for the update, @andylolz
d-portal downloads the files in random order, so assuming we are allowed to get at least one file a day, we might eventually get all of them. However, as you've mentioned, we are probably not whitelisted.
Via our database logs, the last successful fetch was 15 Dec 2020 for 13,650 activities.
This was the same day the d-portal server died due to a disk error. The disk took a week to be replaced and installed, and our IP address would have also changed as a result. When the hard disk on the server failed, we lost any data that was currently in a cached state. This would explain why there is no 'last successfully downloaded' date for AU-5 datasets on Dash.
The Dashboard is also having trouble downloading the files. http://dashboard.iatistandard.org/publisher/ausgov.html
Looks like it's been happening since 2020-07-13 19:36:57 +0100.
Oddly, there was a blip on 2021-04-21 14:52:12 +0100 where the Dashboard was able to retrieve 13,927 activities!
I am not sure it feels 'right', but where the DSv2 has access, could we set it as a backup source - I think they have an internal url listed for each file: https://iatidatastore.iatistandard.org/api/datasets/?publisher_identifier=AU-5&format=json but I'm not sure if it is actually accessible
where the DSv2 has access, could we set it as a backup source - I think they have an internal url listed for each file
Oooh – this is useful to know about, thanks! Interesting – those internal_urls appear to have saved the HTML of a 404 page, that looks like this. That possibly means DSv2 is also struggling to import Australia DFAT data.
Thanks for this, @notshi!
The Dashboard is also having trouble downloading the files. http://dashboard.iatistandard.org/publisher/ausgov.html
I did not think to check the dashboard! The codeforIATI version doesn’t list ausgov as a publisher at all: https://dashboard.codeforiati.org/publisher/ausgov.html
Presumably that’s because it has never seen any ausgov data.
I moved iati-data-dump to github actions today, and it successfully managed to download ausgov data.
This data should start to bubble up through codeforIATI services, e.g. to datastore classic and the dashboard.
We are using User-Agent Mozilla/5.0 and that looks to be blocked by AU-5. This is a problem as we started using this because other servers block curl.
So now we use the default curl ua as the backup. And this seems to have solved issues with many servers that were previously giving us errors.
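For other tools hitting the same blocks, the fallback idea might look roughly like the sketch below. This is not d-portal's actual code: the function name `fetch_with_fallback`, the exact user-agent strings, and the choice to retry only on a 403 are all assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Hypothetical user-agent strings; the exact values d-portal sends are not given here.
USER_AGENTS = ["Mozilla/5.0", "curl/7.68.0"]


def fetch_with_fallback(url, user_agents=USER_AGENTS, urlopen=urllib.request.urlopen):
    """Try each user-agent in turn and return (body, user_agent) for the first success.

    Only a 403 triggers a retry with the next user-agent; other HTTP errors
    (e.g. a 404 for a dataset that no longer exists) are raised straight away.
    """
    if not user_agents:
        raise ValueError("at least one user-agent is required")
    last_error = None
    for user_agent in user_agents:
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        try:
            with urlopen(request, timeout=30) as response:
                return response.read(), user_agent
        except urllib.error.HTTPError as err:
            if err.code != 403:
                raise
            last_error = err
    # Every user-agent was refused with a 403.
    raise last_error
```

The `urlopen` parameter is only there so the function can be exercised without touching the network. A user-agent fallback would not help if a server is also blocking by IP address.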
We now have over a million activities so it looks like this might have found us 40,000 activities.
http://d-portal.org/ctrack.html#view=search&reporting_ref=AU-5
http://d-portal.org/ctrack.html#view=dash_sluglog&slug=ausgov-af
By the way, there are still issues with some datasets by AU-5.
So now we use the default curl ua as the backup. And this seems to have solved issues with many servers that were previously giving us errors.
I’m still unsure how this blocking works… Perhaps a combination of user-agent and IP address? (I didn’t change user-agent but did change IP address, and that also did the trick).
By the way, there are still issues with some datasets by AU-5.
That’s true… But those are 404 errors, so I think we should consider them separately.
I’m not really sure what to do with this ticket? I’m not convinced it’s fixed, but it sounds like we’re not experiencing the problem right now…
That’s true… But those are 404 errors, so I think we should consider them separately.
Agreed!
I think the ticket should still be opened due to ongoing issues and also because the Dashboard is still getting errors accessing the data. Should we raise it with the Dashboard maintainers?
I think the ticket should still be opened due to ongoing issues
Cool okay, agreed.
also because the Dashboard is still getting errors accessing the data
Oh, good point – you’re right.
Ok so this might trickle down to who/what tools we are adding data issues for.
Might be worth updating the readme?
So for example, this repo tracks issues that affect externally maintained tools using IATI, etc. This should include the Registry.
I mean, I'd like to consider this issue closed but AU-5 might hiccup tomorrow or in a week's time. Though we could re-open or create a new issue if that happens.
By the way, feel free to ignore suggestions if they seem pedantic! It's mostly for my train of thought and process.
No problem at all!
Let’s make a new meta ticket to decide what should/shouldn’t be recorded in this repo.
Looks available? IATI.cloud had some issues on this as well, but https://iatidatastore.iatistandard.org/search/activity?q=reporting_org_ref:(AU-5)&wt=json&rows=13632 seems to show all AU-5 data?
The problem is, the DFAT server is a bit over-zealous in its blocking. So while lots of requests succeed (that link works for me, too) a lot of requests are getting blocked. Limiting availability is probably bad practice when serving open data.
For instance, the IATI dashboard still appears unable to access DFAT data:
While the problem might not be affecting our services, we have evidence suggesting the problem does still exist. So I’d rather not close this yet.
Yeah, we're aware of this issue. We also decided to not spend any more time on this as the data owner is seemingly very reluctant to make any changes on their end.
Hello! There has been no activity on this issue in the last 30 days. I wonder if it has now been resolved?
If you’re reading this, would you mind checking to see if the issue is still applicable?
Thank you!
The Dashboard is still unable to access DFAT (AU-5) data / servers.
Hello! There has been no activity on this issue in the last 30 days. I wonder if it has now been resolved?
If you’re reading this, would you mind checking to see if the issue is still applicable?
Thank you!
The previous issue still applies, and I doubt this will change as the publisher seems unresponsive to correspondence.
In such cases, should there be a new label where a bug is defined as "won't fix" and no reminder set? @andylolz
In such cases, should there be a new label where a bug is defined as "won't fix" and no reminder set? @andylolz
^^ Yeah, there’s an evergreen label for this purpose.
Explanation of the bug
Since around July 2020, Australia DFAT (AU-5) servers hosting IATI data appear to have been responding with a 403 status code to various services that consume IATI data. I suspect they are blacklisting these services by IP address, but that’s unclear.
When I request the data from my machine, it works fine, e.g. this dataset: https://iatiregistry.org/dataset/ausgov-af
However, we can see d-portal has had trouble downloading the same dataset: http://d-portal.org/ctrack.html#view=dash_sluglog&slug=ausgov-af
In fact, d-portal hasn’t successfully downloaded any Australia DFAT data since their recent server reboot: http://d-portal.org/ctrack.html#view=search&reporting_ref=AU-5
The new datastore also seems to have had problems with it: https://iatidatastore.iatistandard.org/search/activity?q=reporting_org_ref:(AU-5)&wt=json&rows=50
…And IATI data dump is showing errors for all Australia DFAT data: https://gist.githubusercontent.com/codeforIATIbot/f117c9be138aa94c9762d57affc51a64/raw/e9e26621d812b89789c6bbc8697fb4461bbb974e/errors
According to the old datastore, Australia DFAT data was last successfully fetched in July 2020: http://datastore.iatistandard.org/api/1/about/dataset/ausgov-af