Signal listener stops on error

timobrembeck commented 2 years ago

When investigating another issue (https://github.com/digitalfabrik/integreat-cms/issues/1818), we noticed that the error completely killed the signal listeners for the whole lifetime of the server's process. This means, even if I removed the long URL and inserted a shorter one, it was not picked up by the library. Only after I restarted the server, the signal listeners worked again.

I assume this is due to the fact the the global worker_running stays True after the error and no other tasks can be spawned?

I think that the queue should be a little more robust and recover after an error - but I'm not sure if that's feasible with reasonable effort.

Sorry if I messed something up and the error was somehow caused by my setup. If you agree with the issue in general, I would be willing to work on a solution.

claudep commented 2 years ago

Please do!

svenseeberg commented 1 year ago

I think there is one architectural question that needs to be answered: is there any error in one task that should kill the queue? I think not - any error in one task should not prevent the next task from being executed.

That means instead of trying to restart the process, we could also prevent errors from escalating by catching everything in https://github.com/DjangoAdminHackers/django-linkcheck/blob/df6ecc2eca891aba9018b9b9e9dc610a9dc49330/linkcheck/listeners.py#L24

If something happens, an error should be logged. But the worker should simply proceed to the next task.

DjangoAdminHackers / django-linkcheck

Signal listener stops on error #117