getsentry / sentry-java

A Sentry SDK for Java, Android and other JVM languages.
https://docs.sentry.io/
MIT License
1.11k stars 427 forks source link

Reducing writes performed by the PersistingScopeObserver #3168

Open sam-step opened 5 months ago

sam-step commented 5 months ago

Problem Statement

Hello, I’m looking into CPU usage of our Android React Native app and noticed we’re spending an unexpectedly large amount of time converting breadcrumb records to JSON and persisting them.

From what I can tell, the PersistingScopeObserver will delete and re-write the entire breadcrumb cache (by default up to 100 records) of the current Scope any time a new breadcrumb is added.

In some flows where we’re logging a fair amount (and therefore creating many breadcrumbs), these add up and noticeably affect performance.

My understanding is this writing of breadcrumb entries to disk is just for ANR detection and, today, there is no way to disable this behavior without disabling all caching, which may affect other features.

Assuming my understanding is correct, is there an appetite to make this behavior configurable (supporting batching writes, or even disabling writes, etc)? For our usecase, this is an instance where we’re happy to lose a bit of observability to retain performance.

Solution Brainstorm

No response

romtsn commented 5 months ago

hi @sam-step that's a valid request, this logic definitely needs some improvement. One workaround you could for now is to just remove it from the scope observer list as follows:

SentryAndroid.init(context) { options ->
        options.scopeObservers.removeAll { it is PersistingScopeObserver }

        // if you still wanna collect other data except breadcrumbs for ANR events, you can add it back but 
        // void the `setBreadcrumbs` method
        val delegate = PersistingScopeObserver(options)
        options.addScopeObserver(object : IScopeObserver by delegate {
            override fun setBreadcrumbs(breadcrumbs: MutableCollection<Breadcrumb>) {
                TODO("drop breadcrumbs you are not interested in and call the line below, or just don't do anything to drop all breadcrumbs")
                // delegate.setBreadcrumbs(breadcrumbs)
            }
        })
}
markushi commented 5 months ago

We see a few options on how we could improve the performance:

@sam-step does any of the above sound reasonable to you? It would be great to get some feedback!

sam-step commented 5 months ago

All of the above sound reasonable to me. I imagine the best solution is a combination of these things, although the first option seems like the easiest and highest-impact lever. Even with the current implementation there is the opportunity for data loss (although this definitely increases that potential). That potential can be mitigated by utilizing some of the signals described in the third option, so they seem complementary to me.

I'm not sure what the impact of moving to an append-able format would be. I'd imagine it'd be beneficial to reduce the volume of data we have to keep (re)-writing and transforming, but I'm also guessing a good chunk of the work that's affecting performance is just the many system calls we're doing, not necessarily the volume of data written. So if it's still a constant set of system calls we need to make per breadcrumb event, it may not buy much when there is a lot of activity going on. I did a very naive test on our app, just set max breadcrumbs limit to 1 to simulate what a solution like this might do, and there's a slight improvement over a larger limit, but still a very noticeable overhead compared to not writing to disk at all.

Having the ability to configure/swap out the "persistence layer", similar to @romtsn's suggestion but more baked into the API, might also be a nice-to-have thing to allow a bit more customization if needed as well.

romtsn commented 4 months ago

many system calls we're doing

can you elaborate a bit wdym by system calls?

sam-step commented 4 months ago

can you elaborate a bit wdym by system calls?

Sorry, meant file operations, wrong terminology. Also I'm just taking a guess at what is actually occupying time here, haven't quite dug any deeper than identifying the scope observer's caching method.