chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0
13.36k stars 1.14k forks source link

[ENH] catch panics in tasks #2389

Open codetheweb opened 1 week ago

codetheweb commented 1 week ago

This catches panics when they occur inside dispatched tasks, preventing the associated worker component from crashing. Depending on upstream handling this can still result in a component crashing--many places use .unwrap() instead of handling errors.

I originally started down the route of adding lifecycle events to components, but after talking with @Ishiihara we decided that the complexity tradeoff may not be worth it right now, and catching panics at the task level is quite a bit easier. (Unlike components, tasks have the concept of a request/response unit, so it's much easier to propagate the error.)

If something panics outside a task, e.g. logic in an orchestrator, the query node will survive as Tonic catches it. The only other place we have to worry about are panics in the CompactionManager/CompactOrchestrator which can be handled separately.

github-actions[bot] commented 1 week ago

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

HammadB commented 1 week ago

If we add a standard reply type to Components in System (which we have wanted to do) can't we get this for free here?

I'd somewhat prefer that as it is more widely usable as this is built on top of system components. Wdyt?

If that is seen as too much rn I'd be ok with taking this tech debt if seen that way (I have some minor ergonomics comments if so)

codetheweb commented 1 week ago

If we add a standard reply type to Components in System (which we have wanted to do) can't we get this for free here?

I would prefer this, I didn't realize it was something planned. I'll take a look.