ArroyoSystems / arroyo

Distributed stream processing engine in Rust
https://arroyo.dev
Apache License 2.0

missing crypto functions in `all_default_functions`? #708

Closed zhuliquan closed 3 months ago

zhuliquan commented 3 months ago

Hello, I'm a user of Arroyo (a distributed stream processing engine based on DataFusion). Arroyo recently upgraded DataFusion to the latest version (40.0.0), and I now hit the error below:

```
2024-08-06T11:41:41.861247Z DEBUG datafusion_functions::crypto: /home/zhuliquan/.cargo/registry/src/rsproxy.cn-0dccff568467c15b/datafusion-functions-40.0.0/src/lib.rs:127: crypto functions disabled
test arrow::expression_match::test::test_execuator_process_batch ... FAILED

failures:

---- arrow::expression_match::test::test_execuator_process_batch stdout ----
thread 'arrow::expression_match::test::test_execuator_process_batch' panicked at crates/arroyo-worker/src/arrow/expression_match.rs:64:88:
called `Result::unwrap()` on an `Err` value: Execution error: UDF sha256 not found
```

According to the error, the sha256 UDF is missing and was never registered. I also found that the other crypto UDFs are unavailable. It seems that some built-in UDFs are missing from `all_default_functions`. When I click through to `crypto::functions`, I land in some strange macro code (image), whereas the other function groups jump to their own package (image).

I noticed that Arroyo patches the DataFusion crates in Cargo.toml: https://github.com/ArroyoSystems/arroyo/blob/9f1832387f7d1518bcf2d6b7a433d0c4dc3acf69/Cargo.toml#L67-L75

When I click through the same functions in upstream DataFusion (see the image below), they jump to the correct package. Is something wrong here? (image)

mwylde commented 3 months ago

Thanks for the report! After digging in, it looks like DataFusion moved the crypto functions behind a non-default Cargo feature. I'll put up a PR to add that back.
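For readers unfamiliar with how a Cargo feature can make a UDF silently disappear, here is a minimal, self-contained sketch of the mechanism. It does not use DataFusion's real API: `Udf`, `upper`, `lookup`, and this `all_default_functions` are hypothetical stand-ins, the `sha256` body is a placeholder rather than a real hash, and the feature name `crypto_expressions` is an assumption based on the `datafusion-functions` crate.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a UDF: a plain function from &str to String.
type Udf = fn(&str) -> String;

// A function that is always compiled in, regardless of features.
fn upper(s: &str) -> String {
    s.to_uppercase()
}

// Compiled only when the (assumed) `crypto_expressions` feature is enabled.
#[cfg(feature = "crypto_expressions")]
mod crypto {
    pub fn functions() -> Vec<(&'static str, super::Udf)> {
        // Placeholder body: this does NOT compute a real SHA-256 digest.
        fn sha256(s: &str) -> String {
            format!("sha256({s})")
        }
        vec![("sha256", sha256 as super::Udf)]
    }
}

// Stub used when the feature is off: same signature, but registers nothing.
#[cfg(not(feature = "crypto_expressions"))]
mod crypto {
    pub fn functions() -> Vec<(&'static str, super::Udf)> {
        Vec::new()
    }
}

// Collects every function that was compiled in.
fn all_default_functions() -> HashMap<&'static str, Udf> {
    let mut registry: HashMap<&'static str, Udf> = HashMap::new();
    registry.insert("upper", upper as Udf);
    registry.extend(crypto::functions());
    registry
}

// Looks a function up by name, failing the way the test in this issue did.
fn lookup(name: &str) -> Result<Udf, String> {
    all_default_functions()
        .get(name)
        .copied()
        .ok_or_else(|| format!("Execution error: UDF {name} not found"))
}
```

Built without the feature, `lookup("upper")` succeeds while `lookup("sha256")` returns an error, because the stub module contributes nothing to the registry. Enabling the feature on the dependency in Cargo.toml is what brings the functions back.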

mwylde commented 3 months ago

Fixed in #713