getsentry / sentry

Is there a way to get a more accurate representation of application availability other than crash-free sessions? #60348

Open vinilvasani opened 10 months ago

vinilvasani commented 10 months ago

Environment

SaaS (https://sentry.io/)

What are you trying to accomplish?

Problem Statement: We want to establish a precise measure of application availability. Crash-free sessions are a valuable metric, but they do not fully capture the user experience: as per the documentation, crash-free sessions also include errored sessions. When evaluating availability, however, both crashed and errored sessions represent failed user interactions with the application, so we are exploring other ways to quantify the availability of our application accurately. Specifically, we would like to get from Sentry the percentage of sessions that were neither errored nor crashed. Is there a way to get this data?

How are you getting stuck?

I was not able to find sufficient information in the docs on how to get this data.

Where in the product are you?

Unknown

Link

No response

DSN

No response

Version

No response

getsantry[bot] commented 10 months ago

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] commented 10 months ago

Routing to @getsentry/product-owners-performance for triage ⏲️

k-fish commented 10 months ago

Hey there, can you tell us what platforms you're interested in?

vinilvasani commented 10 months ago

@k-fish I am interested in this specifically for the JavaScript Browser platform. Additionally, I would like to know whether there is a mechanism to filter errored sessions based on the rank or priority (or any other relevant tags) of the underlying error that marked the session as errored.

getsantry[bot] commented 10 months ago

Routing to @getsentry/product-owners-replays for triage ⏲️

k-fish commented 10 months ago

@vinilvasani thanks! Our version of browser sessions exists together with our Replays product, so their team should be able to answer your question in more detail.

bruno-garcia commented 10 months ago

However, when evaluating availability, both crashed and errored sessions represent failed user interactions with the application.

You can still see healthy sessions, errored sessions, and so on separately on the Releases page.

image

Replay is useful to observe errors and how to reproduce them, and of course to empathize with your users, especially when there's a bad experience due to a slowdown or bug.

But Session Replay isn't the place to capture metrics or a complete view of error-free vs. errored/crashed sessions, since it's unlikely you'll be capturing 100% of sessions.

Sessions, on the other hand (as in Release Health), capture all of your users' sessions. Today a session is based on 'page load', which isn't as intuitive, and we plan on changing that. Regardless of the technical implementation, it's a cheap metric representing the status of the frontend. With a Metric Alert you should be able to get alerted based on that data.

Samples of replays will help you debug and dig in, especially when linked to errors, etc.

vinilvasani commented 10 months ago

Thanks @bruno-garcia. I wanted to understand specifically whether there is any way to check the percentage of healthy sessions against the total number of sessions. Is there a way for me to compute and see this data on the Sentry dashboard? If not, since the Sentry browser SDK captures all session information, is it possible to export it to another platform like Grafana or query it through an API?

Why isn't page load intuitive and how are you planning to change the session definition? Also why do you call it a cheap metric?

bruno-garcia commented 10 months ago

You can use our API to fetch the metrics, and you can group them by session status (healthy, abnormal, crashed, errored):

https://docs.sentry.io/api/releases/retrieve-release-health-session-statistics/

I imagine you might be able to do the same thing with Discover or Dashboards, actually. Possibly even with alerts.
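As a rough illustration of the API route, here is a minimal sketch of building such a request and reading the per-status totals out of the response. It assumes the organization-level sessions endpoint described at the link above, called with `field=sum(session)` and `groupBy=session.status`, and a response containing a `groups` list with `by` and `totals` keys. The helper names `build_sessions_url` and `counts_by_status`, the organization slug, and the project ID are all hypothetical; check the linked docs for the exact parameters.

```python
from urllib.parse import urlencode

# Assumed base URL for the SaaS API.
SENTRY_API = "https://sentry.io/api/0"


def build_sessions_url(org_slug, project_id, stats_period="14d"):
    """Build a sessions-statistics URL grouped by session status.

    org_slug and project_id are placeholders; the parameter names are
    assumed from the API docs linked above.
    """
    params = [
        ("project", project_id),
        ("field", "sum(session)"),
        ("groupBy", "session.status"),
        ("statsPeriod", stats_period),
    ]
    return f"{SENTRY_API}/organizations/{org_slug}/sessions/?{urlencode(params)}"


def counts_by_status(payload):
    """Map each session status to its total count.

    Assumes the response shape
    {"groups": [{"by": {"session.status": ...},
                 "totals": {"sum(session)": ...}}, ...]}.
    """
    counts = {}
    for group in payload.get("groups", []):
        status = group["by"]["session.status"]
        counts[status] = group["totals"]["sum(session)"]
    return counts
```

The URL would then be requested with an auth token in an `Authorization: Bearer ...` header, and the parsed JSON passed to `counts_by_status`. The resulting counts could be pushed to another tool such as Grafana.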

matejminar commented 10 months ago

What @bruno-garcia said. In dashboards (and releases, and project details), you can plot healthy/errored/crashed/abnormal session counts separately, but AFAIK sessions do not support equations so unfortunately you can't do custom math there.

vinilvasani commented 10 months ago

Thanks, folks! So getting the percentage of healthy sessions against the total number of sessions does not seem possible?

Also, is there a way for me to know which specific issue/error caused a session to be marked as errored? I would like to exclude from the total errored session count any sessions that were errored by a specific issue that I know did not break the user experience in my application.

bruno-garcia commented 10 months ago

So getting the percentage of healthy sessions against the total number of sessions does not seem possible?

I'll leave this one for @matejminar as I don't know the answer

Also is there a way for me to know which specific issue/error caused the session to be marked as errored?

A session is marked crashed if an unhandled error happened, so you could look at events on that release that have handled:false. But note that session data is based on aggregates and is complete, whereas the crash data/events could be sampled out for many reasons (out of quota, filtered server-side, etc.). So it's possible the exact numbers won't match: 100 sessions at 90% crash-free doesn't necessarily mean you'll always find 10 unhandled errors.

I would like to filter out sessions that have been errored by a specific issue I know did not cause any user experience breakages in my application from the total errored session count.

What would 'filter out sessions' mean? Sessions are just aggregates to get the counts. You can get information on events (through the Issues page, Discover, or Dashboards) that will be based on exact occurrences of errors. Then slice by handled=true/false, etc.
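For the event side, a small sketch of what such a slice could look like as a search string. It assumes the `release`, `environment`, and `error.handled` search keys are available in Discover or the Issues search. The helper name `unhandled_events_query` and the release value are made up for illustration.

```python
def unhandled_events_query(release, environment=None):
    """Build a search string for unhandled error events on one release.

    The search keys used here (release, error.handled, environment) are
    assumptions about Sentry's search syntax, not something the session
    data itself exposes.
    """
    terms = [f'release:"{release}"', "error.handled:false"]
    if environment:
        terms.append(f"environment:{environment}")
    return " ".join(terms)


# Hypothetical release name, for illustration only.
print(unhandled_events_query("my-app@1.0.0", environment="production"))
```

Because events can be sampled or filtered, the number of events matching such a query will not necessarily equal the crashed session count.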

matejminar commented 10 months ago

So getting the percentage of healthy sessions against the total number of sessions does not seem possible?

The best bet is to create a dashboard widget like this:

image

The percentage calculation to get it down to just one number is not possible.
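If the per-status counts are read off the widget or exported, the single number can be computed outside Sentry. The sketch below assumes the four statuses are mutually exclusive and that the crash-free rate counts every session that did not end as crashed. The function name `session_rates` and the counts are made up for illustration.

```python
def session_rates(counts):
    """Return crash-free and healthy percentages from per-status counts.

    Assumes the statuses are mutually exclusive, so the total number of
    sessions is the sum of all counts.
    """
    total = sum(counts.values())
    if total == 0:
        return None  # no sessions, no meaningful rate
    crashed = counts.get("crashed", 0)
    healthy = counts.get("healthy", 0)
    return {
        "crash_free": 100.0 * (total - crashed) / total,
        "healthy": 100.0 * healthy / total,
    }


# Made-up counts for illustration only.
example = {"healthy": 900, "errored": 60, "crashed": 30, "abnormal": 10}
print(session_rates(example))  # {'crash_free': 97.0, 'healthy': 90.0}
```

In this made-up example the crash-free rate is 97% while only 90% of sessions were healthy, which is the gap between crash-free sessions and availability raised in the original question.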

vinilvasani commented 10 months ago

Thanks @matejminar that makes sense!

@bruno-garcia My question is about errored sessions. According to the docs:

We mark the session as errored if the SDK captures an event that contains an exception (this includes manually captured errors).

So what I wanted to know is whether there is a way to see the specific error or errors that caused a session to be marked as errored. Is that sort of information available in the session data?

"Sessions are just aggregates to get the counts." What do you mean by "sessions are aggregates" here?

bruno-garcia commented 10 months ago

So what I wanted to know was is there a way to see the specific error or errors that have caused a session to be marked as errored, is that sort of information available in the session data?

Sessions and events (crashes) both carry the release information. So if you filter by release (and, if you want, by environment and time range), you should see all events. All of the unhandled ones affected your crash-free rate.

vinilvasani commented 10 months ago

So does that mean that sessions do not have the information about the issue that caused them to be crashed or errored?

Example: when counting errored sessions while creating a custom dashboard widget, can I count only the errored sessions that were caused by an issue with rank P0?

Just to reiterate, is the information about the issue that caused the sessions to be errored/crashed available in the session data? If information about the issues does exist on session data, is there a way to write custom functions here?

Screenshot 2023-12-06 at 2 22 06 AM

bruno-garcia commented 10 months ago

Just to reiterate, is the information about the issue that caused the sessions to be errored/crashed available in the session data?

No. Sessions are aggregates and don't suffer from sampling.

More on this: https://github.com/getsentry/sentry/issues/60348#issuecomment-1830423508