alkem-io / alkemio

START HERE! Cross project collaboration and shared documentation.
European Union Public License 1.2

Client Errors - Improve observability and quality #1293

Open bobbykolev opened 3 months ago

bobbykolev commented 3 months ago

Description

We need better observability of the client so that we can track and fix user issues, set quality KPIs, and categorize issues more effectively.

Goal

This initiative aims to improve our users' experience. Better detection and tracking of critical issues would let us focus on and resolve them sooner. Setting quality KPIs will give the client's stakeholders visibility into its quality. Critical client errors should be resolved with priority.

Hypothesis

By utilizing and improving the Sentry integration (3rd party tool) and APM, we can better track and organize client errors. Once we better organize the errors and fix the critical ones, we can define specific KPI targets and set a monitoring schedule.

Must have scope

Structure the process around client observability:

[ ] Revise and improve Sentry logging.
[ ] Revise APM, how we can use it in isolation or combination with Sentry.
[ ] Set reasonable crash-free, performance, and unhandled errors KPIs.
[ ] Set alerting, monitoring, and ownership (on critical errors, new errors, post-release, etc.).
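As one way to make the crash-free KPI concrete, here is a minimal sketch that computes a crash-free session rate and compares it to a target. The `SessionStats` shape, the function names, and any target value are assumptions for illustration; none of them come from Sentry's API or from the current client code.

```typescript
// Hypothetical shape of session counts for a release or time window.
interface SessionStats {
  totalSessions: number;
  crashedSessions: number;
}

// Crash-free rate as a fraction in [0, 1].
// An empty window is treated as fully crash-free (an assumption of this sketch).
function crashFreeRate(stats: SessionStats): number {
  if (stats.totalSessions <= 0) return 1;
  return 1 - stats.crashedSessions / stats.totalSessions;
}

// True when the measured rate meets or exceeds the agreed target.
function meetsCrashFreeTarget(stats: SessionStats, target: number): boolean {
  return crashFreeRate(stats) >= target;
}
```

A check like `meetsCrashFreeTarget(stats, target)` could feed a post-release alert once the team agrees on the actual target.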

Analysis of the issues being experienced, with recommendations of issues to be addressed:

[ ] Tag/categorize client errors (by severity and domain).
[ ] Log the current critical errors.
[ ] Log and fix or categorize the most common errors (to reduce the noise).
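To find the most common errors, one option is to group logged errors by a normalized message and rank the groups by count. The sketch below assumes a hypothetical `ClientError` shape and a simple normalization rule (lowercase, digits replaced by a placeholder); real grouping would more likely rely on whatever grouping Sentry already provides.

```typescript
// Hypothetical, minimal view of a logged client error.
interface ClientError {
  message: string;
}

interface ErrorGroup {
  key: string;
  count: number;
}

// Replace digit runs with a placeholder so that messages differing
// only by an id or a count end up in the same group.
function normalizeMessage(message: string): string {
  return message.trim().toLowerCase().replace(/\d+/g, "<n>");
}

// Count errors per normalized message and return the `limit` largest groups,
// ordered by count (descending) and then by key for a stable result.
function topErrorGroups(errors: ClientError[], limit: number): ErrorGroup[] {
  const counts = new Map<string, number>();
  for (const e of errors) {
    const key = normalizeMessage(e.message);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts, ([key, count]) => ({ key, count }))
    .sort((a, b) => b.count - a.count || a.key.localeCompare(b.key))
    .slice(0, limit);
}
```

The top groups are the natural candidates to either fix or explicitly categorize as noise.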

Optional:

[ ] Research which other Sentry features (Metrics, Replays, etc.) could be used to better track user experience.

Next: [ ] Next epic planned in with heavier issues.

Here's a link to the initial challenge.

Stakeholders

@techsmyth @valentinyanakiev @me-andre @hero101 @Comoque1 @bobbykolev

welcome[bot] commented 3 months ago

Thanks for opening your first issue here! Be sure to follow the issue template!

bobbykolev commented 3 months ago

We already have an implementation for Level (fatal, error, warning, etc.) in Sentry/log.ts. We could extend the same implementation to enrich the context with tags: https://docs.sentry.io/platforms/javascript/guides/react/enriching-events/tags/ Tags could help us identify the affected functionality (Auth, Server, Callout, etc.) or carry other useful information that would help us track down an error faster.
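A minimal sketch of what tagging alongside the level could look like. The actual contents of Sentry/log.ts are not shown in this thread, so `logError`, the `Domain` list, and the `ScopeLike`/`SentryLike` interfaces are all hypothetical. The interfaces only mirror the `withScope`, `setTag`, `setLevel`, and `captureException` calls from Sentry's JavaScript SDK, so the sketch can run without the SDK installed.

```typescript
// Severity levels: the ones named above plus the others Sentry's JavaScript SDK accepts.
type Level = "fatal" | "error" | "warning" | "log" | "info" | "debug";

// Hypothetical domain list, based on the Auth/Server/Callout examples above.
type Domain = "auth" | "server" | "callout" | "unknown";

// The small subset of the Sentry surface this sketch relies on.
interface ScopeLike {
  setTag(key: string, value: string): void;
  setLevel(level: Level): void;
}

interface SentryLike {
  withScope(callback: (scope: ScopeLike) => void): void;
  captureException(error: unknown): unknown;
}

// Capture an error with a level, a "domain" tag, and optional extra tags.
// The tags are set on a temporary scope, so they apply to this event only.
function logError(
  sentry: SentryLike,
  error: unknown,
  level: Level,
  domain: Domain,
  extraTags: Record<string, string> = {}
): void {
  sentry.withScope(scope => {
    scope.setLevel(level);
    scope.setTag("domain", domain);
    Object.keys(extraTags).forEach(key => scope.setTag(key, extraTags[key]));
    sentry.captureException(error);
  });
}
```

In the real client, the `Sentry` namespace from `@sentry/react` would presumably be passed as `sentry`, possibly through a thin adapter if the types do not line up. Events could then be filtered in Sentry by the `domain` tag.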

bobbykolev commented 2 months ago

The following epic duplicates this one: https://app.zenhub.com/workspaces/alkemio-development-5ecb98b262ebd9f4aec4194c/issues/gh/alkem-io/alkemio/1291

techsmyth commented 2 months ago

@bobbykolev good catch, can you please merge the other epic into this one? That means both the description/information and the issues (if there are any and they are still relevant).

bobbykolev commented 2 months ago

Done.