Lightning-AI / LitServe

Lightning-fast serving engine for any AI model of any size. Flexible. Easy. Enterprise-scale.
https://lightning.ai/docs/litserve
Apache License 2.0
2.51k stars 160 forks

monitor and restart Logger process #291

Closed aniketmaurya closed 1 month ago

aniketmaurya commented 2 months ago
Before submitting

- [ ] Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
- [ ] Did you read the [contributor guideline](https://github.com/Lightning-AI/pytorch-lightning/blob/main/.github/CONTRIBUTING.md), Pull Request section?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

⚠️ How does this PR impact the user? ⚠️

Fixes #289

Loggers can get stuck while processing, which causes the logger queue to pile up. This PR tries to resolve that by detecting stuck loggers.

@lantiga please let me know what you think about this approach.


TODO:

What does this PR do?

This PR introduces a process monitor for the Logger process and restarts the process if a Logger is stuck. This prevents the logger queue from piling up.
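For reviewers, here is a minimal sketch of the general idea (heartbeat-based stall detection plus restart), not the code in this PR. The names `logger_loop`, `LoggerProcessMonitor`, `stall_timeout`, and the `ctx` parameter are hypothetical and are not part of the LitServe API. The sketch assumes the logger worker can update a shared timestamp between queue items.

```python
import multiprocessing as mp
import queue as queue_lib
import time


def logger_loop(log_queue, heartbeat):
    """Hypothetical logger worker: record a heartbeat, then handle one item."""
    while True:
        heartbeat.value = time.time()  # signal that the loop is still progressing
        try:
            item = log_queue.get(timeout=0.1)
        except queue_lib.Empty:
            continue
        if item is None:  # sentinel: shut down cleanly
            return
        print("logged:", item)  # stand-in for a real logger call that might block


class LoggerProcessMonitor:
    """Restarts the logger process when its heartbeat goes stale (illustrative only)."""

    def __init__(self, log_queue, stall_timeout=5.0, ctx=None):
        # ctx is injectable so the restart logic can be exercised without real processes.
        self.ctx = ctx if ctx is not None else mp.get_context()
        self.log_queue = log_queue
        self.stall_timeout = stall_timeout
        self.restarts = 0
        self.heartbeat = self.ctx.Value("d", time.time())
        self.process = None

    def start(self):
        # Reset the heartbeat so a fresh process is not flagged immediately.
        self.heartbeat.value = time.time()
        self.process = self.ctx.Process(
            target=logger_loop,
            args=(self.log_queue, self.heartbeat),
            daemon=True,
        )
        self.process.start()

    def is_stuck(self, now=None):
        now = time.time() if now is None else now
        stale = now - self.heartbeat.value > self.stall_timeout
        return (not self.process.is_alive()) or stale

    def check(self, now=None):
        """Restart the logger process if it is dead or stale; return True on restart."""
        if not self.is_stuck(now):
            return False
        if self.process.is_alive():
            self.process.terminate()
        self.process.join(timeout=1.0)
        self.restarts += 1
        self.start()
        return True
```

A supervising loop would call `monitor.check()` every few seconds after an initial `monitor.start()`. One caveat to weigh in the architecture review: the Python documentation warns that terminating a process while it is using a `multiprocessing.Queue` can corrupt the queue, so a real implementation may need to recreate the queue or accept losing in-flight items.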

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 86.20690% with 4 lines in your changes missing coverage. Please review.

Project coverage is 95%. Comparing base (b9ecbdb) to head (65c5f18). Report is 6 commits behind head on main.

Additional details and impacted files

```diff
@@         Coverage Diff         @@
##           main   #291   +/-   ##
===================================
- Coverage    95%    95%    -0%
===================================
  Files        19     19
  Lines      1244   1263    +19
===================================
+ Hits       1182   1197    +15
- Misses       62     66     +4
```

aniketmaurya commented 2 months ago

@lantiga could you please do an architecture review here? Or do you have a suggestion to handle this in a different way?