delta-incubator / delta-kernel-rs

A native Delta implementation for integration with any query engine
Apache License 2.0
135 stars 36 forks source link

Limit recursion depth in kernel traversals? #397

Open scovich opened 2 hours ago

scovich commented 2 hours ago

We should consider... a depth counter to manage how deep we'll go. I've seen several hundred deep nested columns out in the wild before.

Originally posted by @hntd187 in https://github.com/delta-incubator/delta-kernel-rs/issues/395#issuecomment-2412839175

scovich commented 2 hours ago

I'm a bit torn on this. Enforcing limits to recursion depth should ideally be an engine decision. And they've already successfully parsed any expression they send to us.

The worst cases I've personally seen come from degenerate trees of binary AND/OR expressions. Especially large in-lists that the engine converted to e.g.:

OR(<col> = <val1>, OR(<col> = <val2>, OR(<col> = <val3>, OR(...)))

Kernel uses variadic AND/OR expression nodes, specifically to address that particular bad case.

We could possibly convert our FFI visitors to loop over work lists instead of using recursion -- the API we present to the engine is not actually recursive (!!), but work lists are weak to trees with high fanout where recursion is weak to deep trees. For example, recursion lets us iterate over the fields of a very wide struct, where work lists would require us to materialize them instead. Which bad case do we fear more?

Additionally, the schema transform traversal is stateful, and the well-nesting of recursion is an important part of that. For example, my WIP nested column support uses a schema transform in the data skipping code that maintains the current name path in a vector as it goes, so that we can compare the path of each leaf field against the paths referenced in the predicate expression we are trying to convert (in order to filter out unreferenced paths from the table schema). I don't know a good way to express that non-recursively.

scovich commented 2 hours ago

On the other hand... stack overflow is utterly catastrophic in rust, nearly SIGKILL level bad. No response or recovery of any kind is available, not even a backtrace. That's really not a nice failure scenario to inflict on some unsuspecting engine.