bihealth / sodar-server

SODAR: System for Omics Data Access and Retrieval
https://github.com/bihealth/sodar-server
MIT License

Rethink taskflow exception raising and reporting #1505

Open mikkonie opened 1 year ago

mikkonie commented 1 year ago

This is something I didn't really think about back when taskflow was a separate component. We raise an exception every time a taskflow fails, whether it is an unexpected crash or a completely expected situation, like the checksum validation failing for a landing zone.

Because of this, error logs and Sentry get flooded by benign "exceptions" which are really completely OK situations. Sure, the landing zone still needs to go into the FAILED state and the user needs to be notified, but these are not software failures that should be logged as errors.

TBD: What is the best way to handle this? Simply ignoring the exceptions in Sentry is the obvious first step, but we might also reconsider when to raise these zone failures as errors and when not, and how to make that distinction.
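One possible shape for that distinction, as a rough sketch only: give expected failures their own base class, log them at warning level, and filter them out in a Sentry `before_send` hook. The names `ExpectedFlowFailure`, `ChecksumMismatch`, `run_flow` and the dict-based `zone` are hypothetical and not taken from the SODAR code base. `before_send(event, hint)` is the hook signature accepted by `sentry_sdk.init()`.

```python
import logging

logger = logging.getLogger("taskflow_sketch")


class ExpectedFlowFailure(Exception):
    """Hypothetical base class: the flow failed for a user-fixable reason."""


class ChecksumMismatch(ExpectedFlowFailure):
    """Hypothetical: checksum validation failed for a landing zone."""


def run_flow(flow, zone):
    """Run a flow callable against a zone dict; return True on success."""
    try:
        flow(zone)
    except ExpectedFlowFailure as ex:
        # Expected outcome: fail the zone and tell the user, but do not
        # treat it as a software error.
        zone["status"] = "FAILED"
        zone["status_info"] = str(ex)
        logger.warning("Flow failed (expected): %s", ex)
        return False
    except Exception:
        # Unexpected crash: fail the zone, log with traceback, re-raise.
        zone["status"] = "FAILED"
        zone["status_info"] = "Unexpected error"
        logger.exception("Flow crashed")
        raise
    zone["status"] = "OK"  # Placeholder value for this sketch
    return True


def before_send(event, hint):
    """Sentry hook: drop events caused by expected flow failures."""
    exc_info = hint.get("exc_info")
    if exc_info and isinstance(exc_info[1], ExpectedFlowFailure):
        return None  # Returning None discards the event
    return event
```

The hook would be registered with `sentry_sdk.init(before_send=before_send)`. Whether a given failure counts as "expected" is then decided in one place, by which class it inherits from.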

Comments are welcome, I will think of approaches myself.

mikkonie commented 1 week ago

One thing to add: we should also rethink the internal process where we always raise FlowSubmitException from the API instead of the actual exception type.

I've already had to come up with a workaround for detecting the exception type to avoid a lot of yak shaving, see #1847.
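For illustration, one way to keep the original exception type available while still raising `FlowSubmitException` is Python's exception chaining (`raise ... from ...`). This is a minimal sketch under assumptions, not the workaround from #1847: the class bodies, `ChecksumError`, `submit` and `root_cause` are hypothetical stand-ins.

```python
class FlowSubmitException(Exception):
    """Stand-in for the exception raised by the API in this sketch."""


class ChecksumError(Exception):
    """Hypothetical underlying failure raised inside a flow."""


def submit(flow):
    """Run a flow callable, wrapping any failure in FlowSubmitException."""
    try:
        flow()
    except Exception as ex:
        # "from ex" stores the original exception in __cause__
        raise FlowSubmitException(str(ex)) from ex


def root_cause(ex):
    """Follow the __cause__ chain to the innermost exception."""
    while ex.__cause__ is not None:
        ex = ex.__cause__
    return ex
```

A caller catching `FlowSubmitException` can then check `isinstance(root_cause(ex), ChecksumError)` instead of parsing the message string.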