Stacktraces and subscribeOn/observeOn

konmik commented 8 years ago

RxJava in not nice when it comes to stacktraces.

The worst thing is observeOn that schedules execution on different threads. Every time we use observeOn we get our stacktrace from the process root, all the call history gets erased.

The idea is to make schedulers save the caller's stack, wrap action calls and attach the stacktrace to each exception that has been thrown by scheduled actions.

Pros: traceable exceptions Cons: Performance cost Leaks because of recursion

I think that we can have Schedulers.io(boolean trace) alternative that will save the stacktrace, so we could call it Schedules.io(DEBUG) to turn stacktracing off on production for performance critical parts.

How do you guys find this idea?

akarnokd commented 8 years ago

Taking a stacktrace has a 100x overhead compared to a pass-through.

This comes up from time to time but I don't see why one wants to see a stacktrace from another thread when the exception happened on the scheduler's thread. If the exception was thrown before observeOn, the stacktrace will have that thread's trace.

Could you give some examples where (I'm guessing) the code crashed after observeOn but the cause was before it?

The only thing I can think of is the following code:

Observable.<Func0>just(null).observeOn(Schedulers.computation()).map(r -> r.call()).subscribe(...)

This will crash on the computation scheduler with NPE, but the problem is in just(). How would the stack trace from the main thread inside the observeOn's inner Subscriber's onNext method help here?

konmik commented 8 years ago

api.fetchRequestResult()
     .compose(parse()) // a crash is here
     .subscribeOn(io())
     .observeOn(mainThread())
     .subscribe(it -> whatever);

If i have the call to parse() in different places I can't trace which place is causing the error.

akarnokd commented 8 years ago

I'm guessing it crashes in response to a value, right? Why not add doOnNext() with a conditional breakpoint and move it along the pipeline to see where the bad value appears?

konmik commented 8 years ago

Becaue I can't put a breakpoint without user's permission on a device that is located 2000 miles away?

konmik commented 8 years ago

I also don't care about the performance overhead in this situation because inability to fix the crash has bigger impact on my app over a 0.01 ms slowdown during a network request.

akarnokd commented 8 years ago

Do you have highly branching code or do you use custom Operators? If neither and assuming such stack capture exists, how does the list of operators in the pipeline help you? You have access to the source code of parse() and is unit-tested anyway, right?

konmik commented 8 years ago

Unit tests do not cover all possible situations, sometimes crashes happen.

parse() is just an example, I don't have such function.

akarnokd commented 8 years ago

I'm still having difficulty imagining a situation where a remote app crashes and you need the stacktrace all the way to the equivalent of main(String[] args) on Android. Maybe an NPE is an uncommon case. What kind of crashes do you experience?

artem-zinnatullin commented 8 years ago

I see the value of good stacktraces because investigating concurrency problems is usually a hard and time-consuming task.

But Schedulers.io(boolean trace) is not a scalable solution, a good one will be API in the RxJavaPlugins so you could set includeStackTraces for all schedulers, and of course, it should be part of Scheduler class so every scheduler will implement it.

// Internally we've started implementing a very similar approach for non-rx concurrency and probably will do the same for RxJava Schedulers via schedulers hook soon. Would be nice to see it in the RxJava.

@akarnokd if you afraid of overhead — this setting is dynamic, so developers will usually enable it only for debug builds or even at runtime (many projects have Developer Settings menu in the app for such options).

Real life example of hard-to-debug crash — data bus built via PublishSubject, several "writers" and one "reader" with observeOn(), when "reader" crashes — detecting what piece of code sent incorrect data to the data bus is a non-trivial task that takes some non-zero amount of time.

As far as I remember from the recordings of @benjchristensen talks about RxJava he had some thoughts on improving debugging of RxJava code, @benjchristensen what do you think about subject?

akarnokd commented 8 years ago

But can you know what bad data is before posting it on the event bus? If so,why not add verification to sources of the events via doOnNext() and crash the lambda?

konmik commented 8 years ago

Because you can not verify all cases of bad data. You can also have a bug somewhere in the observable, for example if it parses the response itself.

akarnokd commented 8 years ago

RxJavaPlugin let's you decorate the Action0s so you can take and carry the stacktraces around. It appears to me that the problem is tangled with eventbus-like usage of RxJava. You can wrap your data into a structure that captures its origin information.

konmik commented 8 years ago

I don't think that plugins can help here - I want to have stack traces for some observables. If we inject a plugin we will be forced to always have that traces and lose the ability for recursive chain calls because of memory leaks.

konmik commented 8 years ago

@akarnokd No, I don't use RxJava as a bus. The problem is not tied to any specific use case, it is a general RxJava issue.

akarnokd commented 8 years ago

You can create a custom scheduler, do the decoration and apply it at places you desire.

stealthcode commented 8 years ago

@konmik I am familiar with this pain. The stacktrace of an exception that takes place in an observable chain that's been scheduled does not have any context of where in the code base it was composed. It only shows a worker that's scheduled the subscription or onNext behavior. Is there any specific information that we could add to the exception that would help?

stealthcode commented 8 years ago

Here is a simple example. There is no mention of my class or source code at all.

public static void main(String[] args) {
    Observable.empty()
    .observeOn(Schedulers.io())
    .toBlocking()
    .first();
}

Output:

Exception in thread "main" java.util.NoSuchElementException: Sequence contains no elements
    at rx.internal.operators.OperatorSingle$ParentSubscriber.onCompleted(OperatorSingle.java:131)
    at rx.internal.operators.OperatorTake$1.onCompleted(OperatorTake.java:53)
    at rx.internal.operators.OperatorObserveOn$ObserveOnSubscriber.pollQueue(OperatorObserveOn.java:195)
    at rx.internal.operators.OperatorObserveOn$ObserveOnSubscriber$1.call(OperatorObserveOn.java:162)
    at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

konmik commented 8 years ago

@stealthcode Sure, as I already said, we can take stacktrace right before placing an action into worker and when an exception occurs there, we can rethrow the raised exception with included saved stacktrace.

stealthcode commented 8 years ago

I had an idea for a tool that might help mitigate this problem. I think the desired output would be something like this:

Exception in thread "main" rx.RxOperatorFailure: operator callsite at:
        Main.java line 13 column 6 source: .first()
                                           ^
    at rx.internal.errors.ResolveErrorToCallsite.throwWrappedException(ResolveErrorToCallsite.java:000)
Caused by java.util.NoSuchElementException: Sequence contains no elements
    at rx.internal.operators.OperatorSingle$ParentSubscriber.onCompleted(OperatorSingle.java:131)
    at rx.internal.operators.OperatorTake$1.onCompleted(OperatorTake.java:53)
    at rx.internal.operators.OperatorObserveOn$ObserveOnSubscriber.pollQueue(OperatorObserveOn.java:195)
    at rx.internal.operators.OperatorObserveOn$ObserveOnSubscriber$1.call(OperatorObserveOn.java:162)
    at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

This could happen by having a tool which statically analyzed the user application's source code (at application start time) and looked for any callsite to any rx.Observable operator instance or static methods. This would log a few things to a static application callsite buffer. Things like "callsite-id", "line number", "column number", etc. Then at runtime, scheduling operators would hold on to and pass through their callsite id to their downstream subscriber. When an exception is encountered the operator would lookup the appropriate callsite ids and get the line #s then append that information into the thrown Exception's message.

@blesh for context (applicable for RxJS too).

konmik commented 8 years ago

@stealthcode I think that both - stacktrace and static analysis would be nice to have. However, RxJava does not currently have static analysis, so it can only be implemented as a separate library.

Petikoch commented 8 years ago

+1 @akarnokd, like @konmik and @stealthcode I suffer from the current lack of traceability support in RxJava when using Schedulers in pipelines. Especially in a larger codebase with more than 1 developer, you get confusing puzzler's and start wondering...

konmik commented 8 years ago

Here is the solution I came up with.

Just add .lift(traceOnError()) to a chain to get localized stacktraces.

public class OperatorTraceOnError<T> implements Observable.Operator<T, T> {

    private final StackTraceElement[] trace = new Throwable().getStackTrace();

    public static <T> OperatorTraceOnError<T> traceOnError() {
        return new OperatorTraceOnError<>();
    }

    @Override
    public Subscriber<? super T> call(Subscriber<? super T> child) {
        Subscriber<T> parent = new Subscriber<T>() {
            @Override
            public void onCompleted() {
                child.onCompleted();
            }

            @Override
            public void onError(Throwable throwable) {
                child.onError(new TracedException(throwable, trace));
            }

            @Override
            public void onNext(T t) {
                child.onNext(t);
            }
        };

        child.add(parent);
        return parent;
    }

    private static class TracedException extends RuntimeException {
        public TracedException(Throwable throwable, StackTraceElement[] trace) {
            super(throwable);
            setStackTrace(trace);
        }
    }
}

P.S. It will not help with the case of the bloking operator error because the exception happens after .lift(traceOnError()).

stealthcode commented 8 years ago

You could also use the RxJavaDebug plugin. This adds logging to all operators for virtually all relevant lifecycle steps.

Petikoch commented 8 years ago

@konmik, thanks a lot for sharing the .lift(traceOnError()) solution. Works fine for me.

konmik commented 8 years ago

OperatorTraceOnError has been updated to keep the original Throwable intact.

Petikoch commented 8 years ago

@konmik

I extended your traceOnError() operator a little bit and finally came to the following solution, which I call the preserveFullStackTrace operator:

public class PreserveFullStackTraceOperator<T> implements Observable.Operator<T, T> {

    private final RuntimeException asyncOriginStackTraceProvider = new RuntimeException("async origin");
    private final long originThreadId = Thread.currentThread().getId(); // should be "enough" unique. See http://stackoverflow.com/a/591664/1662412

    public static <T> PreserveFullStackTraceOperator<T> preserveFullStackTrace() {
        return new PreserveFullStackTraceOperator<>();
    }

    @Override
    public Subscriber<? super T> call(Subscriber<? super T> child) {
        Subscriber<T> parent = new Subscriber<T>() {

            @Override
            public void onCompleted() {
                child.onCompleted();
            }

            @Override
            public void onError(Throwable throwable) {
                if (Thread.currentThread().getId() != originThreadId) {
                    List<StackTraceElement> originalStackTraceElements = Lists.newArrayList(throwable.getStackTrace());
                    List<StackTraceElement> additionalAsyncOriginStackTraceElements = asList(asyncOriginStackTraceProvider.getStackTrace()).stream()
                            .filter(stackTraceElement -> !PreserveFullStackTraceOperator.class.getName().equals(stackTraceElement.getClassName()))
                            .collect(Collectors.toList());
                    Iterable<StackTraceElement> modifiedStackTraceElements = Iterables.concat(originalStackTraceElements, additionalAsyncOriginStackTraceElements);
                    throwable.setStackTrace(Iterables.toArray(modifiedStackTraceElements, StackTraceElement.class));
                }
                child.onError(throwable);
            }

            @Override
            public void onNext(T t) {
                child.onNext(t);
            }
        };

        child.add(parent);
        return parent;
    }
}

The main differences are:

It's more like your first version: The stacktrace-elements are modified, therefore we can still do things like instanceof and we have not a changed behaviour due the addtional operator
It does only the modification if the code is asnychronous

Usage: .lift(preserveFullStackTrace())

What do you think?

konmik commented 8 years ago

@Petikoch I like the idea of having an immutable throwable (my current solution) because it will survive any use case. The idea of checking thread id is nice but I afraid it will not work in some cases (If an exception happened in the background and then returned to the main thread to die here.)

From the other hand, instanceof is useful, it is easier to have it with your solution. My solution requires to check throwable's cause.

Petikoch commented 8 years ago

@konmik, interesting topic, isn't it? :-) Thanks for your feedback.

Cookizz commented 8 years ago

I'm using @konmik 's OperatorTraceOnError solution and I wrap it into an extended RxJavaObservableExecutionHook and register to RxJavaPlugins which works everywhere by registering only once. Does that work better?

abersnaze commented 8 years ago

I was thinking about and had the idea of only the capturing of the originating stack track one per worker when the first action is scheduled rather than every schedule call. This would amortize the cost over many actions. I would think most operators schedule on their worker instance in a fairly consistent way. It wouldn't be 100% but better than nothing.

danielesegato commented 8 years ago

@akarnokd you know RxJava perfectly, you probably do always everything as it should be done (unit testing and everything). What REALLY happen out there is that lot of people use RxJava without knowing every little bit of it or do not perfectly test everything.

The fact the libraries like this exist: https://github.com/android10/frodo to let you print what the flow of an Observable in the log or people asking for ways to have more meaningful stacktrace shouldn't be discarded like you are doing (closing this issue / refusing to acknowledge this is an area where RxJava can improve).

I'm pretty sure someone like you that know RxJava so well could come up with lot of better solutions then the one currently available to help debugging and tracing whats going on and it's a shame you just refuse to do it because for you they are not useful.

PS: please don't miss out on the choice of name of the library I linked ("frodo") and the image associated to it... Look at the gif in that page where Frodo hugs Sam almost crying. That's not just funny... It's inspired on the feeling you have finding that library when you are straggling understanding what's going on on a complex chain of Observables.

akarnokd commented 8 years ago

@konmik closed this issue as he found a way to support his streams by capturing the stacktrace via an operator. Several months later, as part of a reworked plugin system the assembly tracking, which is very similar to @konmik's solution, has been added to RxJava. It can be enabled at runtime after which all new streams get extra tracking information but it's not cheap and should be used as a diagnostic option.

We are a foundation for fluent reactive programming on the Android and JVM platforms with Java 6+ support and we try our best to support other libraries that wish to build upon us. However, we are limited by the Java platform as well as how exceptions and stacktraces work.

I'm pretty sure someone like you that know RxJava so well could come up with lot of better solutions then the one currently available to help debugging and tracing whats going on and it's a shame you just refuse to do it because for you they are not useful.

Nobody is stopping you or anybody to deepen his/her understanding of RxJava internals and come up with a solution, which if works, doesn't break binary compatibility and doesn't penalize those who don't need it, we are happy to accept PR(s) of it.

danielesegato commented 8 years ago

@akarnokd sorry I didn't noticed the issue has been closed by @konmik.

I still think an operator like the one used by @konmik or modified by @Petikoch would be nice to have in RxJava. Something that you can add on the single operator to obtain an audit of THAT operator. the reset() addition you cited is nice and also useful for testing but is a global on/off switch not suitable for when you have a bug in production you need to test on a specific observable.

I didn't want to sound accusing, I just think you are way better with RxJava that most of us and that sometimes you are a bit too quick to dismiss request like that because from your point of view are just wasting time while it would make RxJava easier for everyone else.

You just told me:

Nobody is stopping you or anybody to deepen his/her understanding of RxJava internals and come up with a solution

gee, as if deepening the understanding to such level was a thing you can achieve in half a day - it's not, not for me at least. I've read most of your blog which took way more then half-day and still didn't understood everything in it, nor would I feel comfortable I did everything right writing an operator or creating an Observable from scratch (Observable.create()).

Thanks anyway. If I'll ever achieve THAT much knowledge and find a way to ease things out for the beginner I'll surely write a PR ;)

akarnokd commented 8 years ago

@danielesegato

I don't want to sound too harsh but this is an open-source project and we (I) put in a lot of effort to keep RxJava up-to-date and running. As a project lead, I also have the duty to keep RxJava intact but often people want local solutions that disrupt or break other parts of RxJava they usually can't see. In addition, sometimes issues are too vague, very difficult or outright impossible to solve given our constraints.

The problem mentioned in this issue is typical for asynchronous programming; you'd have the same issues if you had ExecutorServices, queues and concurrent datastructures the classical and manual way with just Java itself. I'd guess you wouldn't run to Oracle complaining about your program not working as you expected despite all the fundamental building blocks Java gives you work correctly.

The second source of such issues is that people come from different prior asynchronous experiences and tend to project those old practices onto RxJava where they likely won't work or apply. Anecdotes say the a "aha" moment when suddenly it clicks; but I believe this is only for its declarative nature. There is another "aha" moment when you embrace it as "everything is a dataflow".

In similar stacktrace issues as this, either the user code throws at which point you have the exception point to your source code (you have to mentally filter out all irrelevant entries) or some operator signalled an exception rightfully following its spec (like single()) but you don't fully understand your sequence that lead to it. There is a no-zero chance you run into an operator bug but those are confirmed and fixed quite reliably I think.

As you mentioned such failures tend to be within production/private code and not enough information is provided to work out a solution that can also be unit tested within RxJava itself.

If you have a specific problem, please open an issue with all the details, unit tests to see if we can help you or not.

danielesegato commented 8 years ago

@akarnokd You are right! :) as soon as I stumble into something hard to debug I'll do what you suggest. Thank you.

whatisaphone commented 7 years ago

Since this is the top Google result for rxjava observeon stack trace, I'd like to point everybody (and myself next time I search for this) to the excellent Traceur library which solved this issue for me. It's a library for RxJava2 which replicates RxJava1's RxJavaHooks.enableAssemblyTracking().

RikkiGibson commented 6 years ago

https://github.com/akaita/RxJava2Debug is similar. I'm not able to get more useful stacks when I run these types of extensions alongside my project, though, and I wish I knew why.

In a use case like networking, it's very important to be able to work backwards from an error in a response to the code which issued the request--and if that request happened in a callback for some previous response, what request initiated the previous response, etc... There's no amount of upfront validation or unit testing that will substitute for the behaviors you actually see when you put all the components of a system together and run it in a production-like environment.

This is somewhat easier to do when you use synchronous networking, or async networking in .NET which manipulates async stacks when debugging so that you can see what initiated a task when you're debugging the completion of the task. Now, Rx is a very different paradigm from async/await, and so this stack rewriting solution (which resembles the assembly tracking extensions for RxJava) doesn't always map that well onto it.

There is a great deal of value that could come from an improvement to the debugging experience here that is stable and performant enough for production use. I'm experimenting with the tracing operator posted a while back by @konmik and it's definitely an improvement--planning to experiment more and see what the experience is like in more complex tracing scenarios.

starkej2 commented 6 years ago

https://github.com/akaita/RxJava2Debug is similar. I'm not able to get more useful stacks when I run these types of extensions alongside my project, though, and I wish I knew why.

RxJava2Debug didn't add any useful information for me either...not sure what I'm doing wrong. Traceur was very helpful, but doesn't seem to be working for me with Crashlytics (https://github.com/T-Spoon/Traceur/issues/8).

Mainly trying to solve the problem of often not having useful stacktraces for crash reports involving RxJava code, which causes difficulty in reproducing and fixing those bugs. RxJavaAssemblyTracking doesn't appear to add any extra info to my stacktraces (but it does look useful for debugging during development), so I'm still hunting for a solution that can be used in production code.

@akarnokd I was wondering what your thoughts are on an approach like this? Seems to provide more useful crash logs without the large performance overhead...although at the cost of more manual effort.

akarnokd commented 6 years ago

Depends on what type of error you want to track down. Null returns could be using my function tagging code. Otherwise, that looks a way but I'd add a tag string to dropBreadcrumb so that the error message is identifiable.

starkej2 commented 6 years ago

Thanks for the quick response! We're mostly having difficulties with exceptions thrown by Retrofit or APIs of third party libraries that we're wrapping with RxJava...the stacktrace never seems to go back far enough to help us figure out the source.

RikkiGibson commented 6 years ago

I think the "network call failed, where was it initiated?" use case is pretty well addressed by the breadcrumb approach you suggest @starkej2 :+1:

halfhp commented 6 years ago

Another option: https://github.com/halfhp/rxtracer

Other approaches were either too verbose in their implementation for my taste or simply did not work. RxTracer uses assembly tracking to supply the original exception with a more useful stack trace, so exception type, properties etc. are preserved. It's used by invoking RxTracer.enable();.

piomar123 commented 4 years ago

This is still one of the top search results, so I'm leaving here the solution for RxJava 3: https://github.com/akarnokd/RxJavaExtensions#debug-support

danielesegato commented 4 years ago

This is still one of the top search results, so I'm leaving here the solution for RxJava 3: https://github.com/akarnokd/RxJavaExtensions#debug-support

This is hardly a solution.

Helps only while developing or if you have a reproducible bug.

Doesn't help if you get an obscure crash in production that you do not know how to reproduce.

snorkel123 commented 3 years ago

@danielesegato Could you give concrete example when akarnokd's RxJavaAssemblyTracking was not helpful?

danielesegato commented 3 years ago

@AlexShafir RxJavaAssemblyTracking is always helpful.

It comes with performance drawbacks.

It's a global on/off switch.

When you have a bug in production you can't reproduce, it's not an on/off bug and the stacktrace you get is not helpful the only option you have is turn that flag on in production for everybody and wait (hope) for the bug to happen.

This is a common issue with any framework of this type. Kotlin Coroutines have similar issues.

My knowledge of RxJava is far greater than my first comments in this issue and I've just learned to write better code with it. I hardly end up in any issue like this nowadays.

I remember i would have love to turn the behavior on for just a couple of observables (so the performance issue would be limited to that).

Nowadays it's more an annoyance than anything, would be nice if i could see a better stacktrace all the time but I don't wanna give up performance for it.

ReactiveX / RxJava

Stacktraces and subscribeOn/observeOn #3521