Open lightjacket opened 2 years ago
@lightjacket in my opinion a better idea is to return a list in one row?
@Jimexist thanks for the suggestion! I did consider that as an alternative but would need to unnest
the result in order to be able to join against additional tables like so:
Set-returning
SELECT * FROM
(SELECT inverted_index_search('some query') as item_id) t
LEFT JOIN additional_data ON t.item_id = additional_data.item_id;
Single row w/ a list
SELECT * FROM
(SELECT unnest(inverted_index_search('some query')) as item_id) t
LEFT JOIN additional_data ON t.item_id = additional_data.item_id;
So if the unnest
function existed, I would be unblocked. I think if I could create a set-returning UDF I could also implement unnest
myself as a user and wouldn't need to wait on a built-in unnest
.
If there is an alternative to unnest
to get that join I am looking for though, that would absolutely work. I just haven't found it yet if it exists.
let's add unnest as it is more general:
I'd argue that unnest
is less general (I'd think unnest
would be a specific implementation of a set-returning UDF), but the unnest
function certainly solves my use-case so if that's the preference, it certainly works for me. Thank you!
Is your feature request related to a problem or challenge? Please describe what you are trying to do. I'm looking at add an inverted index on my data and use it as the first step in more complicated data transformations. There's particular optimization that I think might be better done outside of the DataFusion engine.
Describe the solution you'd like I'm imagining something like the following, where
inverted_index_search
returns an arbitrary number of rows.I'd like to be able to write a udf (or udaf, or something new) that can return as many rows as it needs.
Describe alternatives you've considered I looked at having my UDF return an array, but once I had the array I could not figure out how to unnest it. Issue https://github.com/apache/arrow-datafusion/issues/212 looks to cover that, but I think it might be better to just have the function return multiple rows directly.
Additional context Apologies if I missed something that is already available for this.