duckdb / duckdb-r

The duckdb R package
https://r.duckdb.org/

User Defined Functions with R #181

Open ran-codes opened 4 weeks ago

ran-codes commented 4 weeks ago

https://github.com/duckdb/duckdb/discussions/12438#discussioncomment-9705955

There it was suggested that I ask this in this repo.

Our team uses the tidyverse API and some basic statistics functions in R for much of our data work. We have shifted to DuckDB, particularly the duckdb-dbt workflow, and it would be a game changer if we could do some of our R work in DuckDB via UDFs.

I have read https://github.com/duckdb/duckdb/issues/471 but am wondering whether there are plans to support UDFs in other languages. I think it would be a game-changing feature.

paleolimbot commented 3 weeks ago

A while back I was involved in implementing Arrow UDFs, and the challenge was basically that R code must always execute on the same thread (the main R thread). That leads to a slightly awkward dance: whenever Arrow does something that might execute some R code (like a user-defined function), we have to launch a separate thread to "do arrow stuff" and run an event loop on the R thread that waits for R tasks the other thread needs executed.
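To make that dance concrete, here is a minimal sketch of the pattern in Python rather than the actual C++ from the Arrow R package. Every name in it (`udf`, `call_on_owner_thread`, `engine_work`, `run_with_event_loop`) is hypothetical and chosen only for illustration; it is not Arrow's or DuckDB's API. The calling thread stands in for the R thread, and the background thread stands in for the one that does "arrow stuff".

```python
import queue
import threading
from concurrent.futures import Future

# Hypothetical sketch: none of these names come from Arrow or DuckDB.
tasks = queue.Queue()   # calls the background thread needs run on the owner thread
udf_threads = set()     # records which thread(s) actually executed the UDF


def udf(x):
    # Stand-in for user R code, which may only run on the owner ("R") thread.
    udf_threads.add(threading.get_ident())
    return x * 2


def call_on_owner_thread(fn, *args):
    """Called from the background thread: enqueue fn, then block on its result."""
    fut = Future()
    tasks.put((fn, args, fut))
    return fut.result()


def engine_work(values, out):
    """Stand-in for the "do arrow stuff" thread; it never runs the UDF itself."""
    try:
        out.extend(call_on_owner_thread(udf, v) for v in values)
    finally:
        tasks.put(None)  # sentinel: tells the event loop to stop


def run_with_event_loop(values):
    out = []
    worker = threading.Thread(target=engine_work, args=(values, out))
    worker.start()
    # Event loop on the owner thread: serve queued tasks until the sentinel arrives.
    while (item := tasks.get()) is not None:
        fn, args, fut = item
        try:
            fut.set_result(fn(*args))
        except Exception as exc:
            fut.set_exception(exc)  # re-raised in the worker by fut.result()
    worker.join()
    return out


loop_thread = threading.get_ident()
result = run_with_event_loop([1, 2, 3])
```

The point of the design is that the background thread only ever enqueues requests and waits, so the user function runs exclusively on the thread that owns the interpreter.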

Some (not so) light reading that might be relevant (I am not sure how similar DuckDB UDFs are to Arrow UDFs):

https://github.com/apache/arrow/blob/fd11b7a042f0a020a0cde34e043ad2e77752bca6/r/src/safe-call-into-r.h

https://github.com/apache/arrow/blob/fd11b7a042f0a020a0cde34e043ad2e77752bca6/r/src/compute.cpp#L633-L788