IATI / ckanext-iati

CKAN extension for the IATI Registry
http://iatiregistry.org
9 stars 6 forks source link

Intermittently IATI Archiver incorrectly exits #436

Closed cormachallinanderilinx closed 2 months ago

cormachallinanderilinx commented 5 months ago

Brief Description Intermittent issues with IATI archiver not running

Severity Medium

Issue Location IATI Archiver

Steps to Reproduce Sporadically the archiver exists when a dataset fails with the following exception: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer') The archiver then exits without persisting the error and moving onto the next package.

Expected Results/Behaviour Whenever a package resource gets ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer') Archiver should catch the exception and archiver error should display in the UI

Actual Results/Behaviour Archiver error should be displayed in the UI and archiver moves onto the next package

cormachallinanderilinx commented 5 months ago

@siwhitehouse @robredpath (I dont see an account for Dan) This is the issue which seems to be causing the intermittent Archiver failures if you want to link this to zendesk issues as discussed in the call

robredpath commented 5 months ago

@dan-odsc ^^

robredpath commented 5 months ago

Thanks @cormachallinanderilinx . For debugging purposes I think it would be helpful to understand at what stage the connection reset happened: is it possible to capture that?

For my information: does the archiver retry failed connections at all?

cormachallinanderilinx commented 4 months ago

There isnt much logging here to tell us where it happens. I can patch in some logging for the next few runs, at least print exceptions and that will give us a better idea whats happening.

It seems to retry one if it goes into an Exception: https://github.com/IATI/ckanext-iati/blob/master/ckanext/iati/archiver.py#L450 Also this isnt very helpful: https://github.com/IATI/ckanext-iati/blob/master/ckanext/iati/archiver.py#L473

Then later on in the code it tries another request: https://github.com/IATI/ckanext-iati/blob/master/ckanext/iati/archiver.py#L509

robredpath commented 4 months ago

I can patch in some logging for the next few runs, at least print exceptions and that will give us a better idea whats happening.

Thanks - if we can understand this for a few days it will be useful to understand what a possible solution might be.

robredpath commented 4 months ago

Has the logging shown us anything interesting, @cormachallinanderilinx ?

cormachallinanderilinx commented 4 months ago

@robredpath it seems to be running fully each night. However I have seen some 403s Example: 2024-02-28 07:39:57,920 WARNI [ckanext.iati.archiver] LinkCheckerError for 67a41006-d4fb-4269-a8eb-9a3857b49f1a 2024-02-28 07:39:57,920 ERROR [ckanext.iati.archiver] An LinkCheckerError for 67a41006-d4fb-4269-a8eb-9a3857b49f1a occurred: Server returned HTTP error status: 403 Forbidden

The ID here is the resource id, so you can get more resource details through the API: https://iatiregistry.org//api/3/action/resource_show?id=67a41006-d4fb-4269-a8eb-9a3857b49f1a You get the package_id and add it to the end of https://iatiregistry.org/dataset/ to view the package Example: https://iatiregistry.org/dataset/38e40d0f-ef03-45d0-9939-bf5fc051a4e3

If you want me to send you on the full list I can?

robredpath commented 4 months ago

Interesting, thanks @cormachallinanderilinx . If I check https://iatiregistry.org/report/issues I do see one instance of "('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))" from this morning's run. The file downloads fine in my browser.

Interestingly, that one case that I can see does actually show the ConnectionResetError in the yellow warning box. Have you updated the code to handle this better already, or is this unexpected behaviour?

If I look at https://iatiregistry.org/dataset/38e40d0f-ef03-45d0-9939-bf5fc051a4e3 on https://iatiregistry.org/report/issues I actually see "Check the filetype metadata field" rather than a 403, which is interesting; 403 is the HTTP error code I get when trying to access the relevant file in a browser.

I suspect that 403s are going to be quite common, alongside 404s, as websites get updated and things move around. At some point we'll figure out a way of systematically engaging with publishers around this, but not just yet.

I guess...let's keep on keeping an eye on it?

cormachallinanderilinx commented 4 months ago

yes, lets keep and eye on it

robredpath commented 4 months ago

@cormachallinanderilinx Have we seen any instances of the archiver encountering "connection reset by peer" exceptions recently? Can we understand from your logging what its behaviour is?

cormachallinanderilinx commented 4 months ago

Hi @robredpath I can see one if the last few days which was for wbtf-tf072887 The logs is coming from here: https://github.com/IATI/ckanext-iati/blob/master/ckanext/iati/archiver.py#L335

So it seems to continue and update the package with these details, which I think is the correct procedure?

robredpath commented 4 months ago

Thanks @cormachallinanderilinx - that certainly seems to be the right behaviour, but if that code hasn't changed, then I'm confused as to what we saw that led to this issue in the first place.

I've given the code a quick eyeball and it looks like archive_package catches a number of common (or possible) exceptions but I wonder if there's some that might get missed. I note that socket.timeout is now deprecated in 3.10 so I wonder if this is worth a bit of looking into.

I also find the ever-reassuring comment #TODO: What does this do? in this code, which also might suggest that a bit of investigation would be worthwhile.

cormachallinanderilinx commented 4 months ago

for socket.timeout I added this after (for extra logging) which might actually be handing what our previous issue was: except Exception as e: log.error(f"New Exceptetion for {resource['id']} occurred: {str(e)}", exc_info=True) message = str(e) return save_package_issue(context, package, extras_dict, 'download-error', message) I am adding a save_package_issue -> think we should make this a permanent change?

The TODO is an interesting one because it has been there since 2013: https://github.com/IATI/ckanext-iati/commit/94b3a2e6a2f1e4d3c454fe4fc4a65b995ab9fe0b This is the original commit, it seems to be just formatting the message: https://github.com/IATI/ckanext-iati/commit/112a80e7f53450b84eed103bc18987d6789f5bd0 If you like I can investigate it a little further?

robredpath commented 4 months ago

Ah ha! I had a chat with @Bjwebb earlier and I think that's fine to make permanent - although it's a messy thing to do, it's in the context of a lot of messy code and I think the benefits outweigh the risks.

It'll be good to have more detailed logs of what happens when things go wrong, especially for those cases where something happens that we haven't seen before or happens very rarely.

robredpath commented 2 months ago

@cormachallinanderilinx is this now permanent/? Can we close this?

cormachallinanderilinx commented 2 months ago

This done, we can close