Closed cormachallinanderilinx closed 2 months ago
@siwhitehouse @robredpath (I dont see an account for Dan) This is the issue which seems to be causing the intermittent Archiver failures if you want to link this to zendesk issues as discussed in the call
@dan-odsc ^^
Thanks @cormachallinanderilinx . For debugging purposes I think it would be helpful to understand at what stage the connection reset happened: is it possible to capture that?
For my information: does the archiver retry failed connections at all?
There isnt much logging here to tell us where it happens. I can patch in some logging for the next few runs, at least print exceptions and that will give us a better idea whats happening.
It seems to retry one if it goes into an Exception: https://github.com/IATI/ckanext-iati/blob/master/ckanext/iati/archiver.py#L450 Also this isnt very helpful: https://github.com/IATI/ckanext-iati/blob/master/ckanext/iati/archiver.py#L473
Then later on in the code it tries another request: https://github.com/IATI/ckanext-iati/blob/master/ckanext/iati/archiver.py#L509
I can patch in some logging for the next few runs, at least print exceptions and that will give us a better idea whats happening.
Thanks - if we can understand this for a few days it will be useful to understand what a possible solution might be.
Has the logging shown us anything interesting, @cormachallinanderilinx ?
@robredpath it seems to be running fully each night.
However I have seen some 403s
Example:
2024-02-28 07:39:57,920 WARNI [ckanext.iati.archiver] LinkCheckerError for 67a41006-d4fb-4269-a8eb-9a3857b49f1a 2024-02-28 07:39:57,920 ERROR [ckanext.iati.archiver] An LinkCheckerError for 67a41006-d4fb-4269-a8eb-9a3857b49f1a occurred: Server returned HTTP error status: 403 Forbidden
The ID here is the resource id, so you can get more resource details through the API:
https://iatiregistry.org//api/3/action/resource_show?id=67a41006-d4fb-4269-a8eb-9a3857b49f1a
You get the package_id
and add it to the end of https://iatiregistry.org/dataset/
If you want me to send you on the full list I can?
Interesting, thanks @cormachallinanderilinx . If I check https://iatiregistry.org/report/issues I do see one instance of "('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))" from this morning's run. The file downloads fine in my browser.
Interestingly, that one case that I can see does actually show the ConnectionResetError in the yellow warning box. Have you updated the code to handle this better already, or is this unexpected behaviour?
If I look at https://iatiregistry.org/dataset/38e40d0f-ef03-45d0-9939-bf5fc051a4e3 on https://iatiregistry.org/report/issues I actually see "Check the filetype metadata field" rather than a 403, which is interesting; 403 is the HTTP error code I get when trying to access the relevant file in a browser.
I suspect that 403s are going to be quite common, alongside 404s, as websites get updated and things move around. At some point we'll figure out a way of systematically engaging with publishers around this, but not just yet.
I guess...let's keep on keeping an eye on it?
yes, lets keep and eye on it
@cormachallinanderilinx Have we seen any instances of the archiver encountering "connection reset by peer" exceptions recently? Can we understand from your logging what its behaviour is?
Hi @robredpath I can see one if the last few days which was for wbtf-tf072887 The logs is coming from here: https://github.com/IATI/ckanext-iati/blob/master/ckanext/iati/archiver.py#L335
So it seems to continue and update the package with these details, which I think is the correct procedure?
Thanks @cormachallinanderilinx - that certainly seems to be the right behaviour, but if that code hasn't changed, then I'm confused as to what we saw that led to this issue in the first place.
I've given the code a quick eyeball and it looks like archive_package
catches a number of common (or possible) exceptions but I wonder if there's some that might get missed. I note that socket.timeout is now deprecated in 3.10 so I wonder if this is worth a bit of looking into.
I also find the ever-reassuring comment #TODO: What does this do?
in this code, which also might suggest that a bit of investigation would be worthwhile.
for socket.timeout I added this after (for extra logging) which might actually be handing what our previous issue was:
except Exception as e: log.error(f"New Exceptetion for {resource['id']} occurred: {str(e)}", exc_info=True) message = str(e) return save_package_issue(context, package, extras_dict, 'download-error', message)
I am adding a save_package_issue -> think we should make this a permanent change?
The TODO is an interesting one because it has been there since 2013: https://github.com/IATI/ckanext-iati/commit/94b3a2e6a2f1e4d3c454fe4fc4a65b995ab9fe0b This is the original commit, it seems to be just formatting the message: https://github.com/IATI/ckanext-iati/commit/112a80e7f53450b84eed103bc18987d6789f5bd0 If you like I can investigate it a little further?
Ah ha! I had a chat with @Bjwebb earlier and I think that's fine to make permanent - although it's a messy thing to do, it's in the context of a lot of messy code and I think the benefits outweigh the risks.
It'll be good to have more detailed logs of what happens when things go wrong, especially for those cases where something happens that we haven't seen before or happens very rarely.
@cormachallinanderilinx is this now permanent/? Can we close this?
This done, we can close
Brief Description Intermittent issues with IATI archiver not running
Severity Medium
Issue Location IATI Archiver
Steps to Reproduce Sporadically the archiver exists when a dataset fails with the following exception:
('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')
The archiver then exits without persisting the error and moving onto the next package.Expected Results/Behaviour Whenever a package resource gets
('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')
Archiver should catch the exception and archiver error should display in the UIActual Results/Behaviour Archiver error should be displayed in the UI and archiver moves onto the next package