apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0
612 stars 113 forks source link

Include Rust's native stack trace in CometNativeException #104

Closed advancedxy closed 3 months ago

advancedxy commented 3 months ago

What is the problem the feature request solves?

Currently, the backtrace is only added when rust panics.

However, when the exception is thrown back to JVM side. The error msg sometimes might be not that helpful. It would be ideal to get the backtrace for the rust error as well.

Describe the potential solution

No response

Additional context

No response

viirya commented 3 months ago

I remember we intentionally trim stack trace to reduce binary size.

advancedxy commented 3 months ago

Seems reasonable to trim stack trace in production.

Maybe we can enable stack trace in the debug build?

viirya commented 3 months ago

If possible, then yes, I think it is good idea to add stack trace for debugging.

I remember we have discussed before and tried it but seems not working as the config cannot be changed between debug and release build.

Maybe I misremembered it. 🤔

sunchao commented 3 months ago

I think we looked into this before. The problem is a large part of the errors are originated from Arrow or DF. @comphead has worked on this in DF, see https://github.com/apache/arrow-datafusion/issues/7360. We could probably apply the same changes to Comet.

I remember we intentionally trim stack trace to reduce binary size.

Yes we did that for release profile, see https://github.com/apache/arrow-datafusion-comet/blob/main/core/Cargo.toml#L96. For debug profile, the backtrace should still contain full information including the file names and line numbers.

advancedxy commented 3 months ago

@comphead has worked on this in DF, see https://github.com/apache/arrow-datafusion/issues/7360. We could probably apply the same changes to Comet.

Sounds great.

comphead commented 3 months ago

I'll check how it works with the Comet but for DF the quick doc to enable the backtrace is https://arrow.apache.org/datafusion/user-guide/example-usage.html#enable-backtraces

Basically if you set the backtrace it should show the backtrace for most cases, the coverage on DF still getting improved. The backtrace will show rootcause within DF, however if the error happens on arrow-rs side, this is not covered on their side.

In this case we can only go to arrow-rs entrypoint and then debug manually

snmvaughan commented 3 months ago

There were examples where the inclusion of the panic backtrace with the Exception was very helpful. I can understand limiting the length/size of the resulting stacktrace, but there is a lot of utility in including at least some of the information.