apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Implement initial version of to_json #631

Closed: andygrove closed this 2 weeks ago

andygrove commented 2 months ago

What is the problem the feature request solves?

Now that we have support for CreateNamedStruct in https://github.com/apache/datafusion-comet/pull/620, we could start working on to_json functionality. This functionality is very similar to our existing logic for casting from various data types to string but with JSON formatting.
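To illustrate "casting to string but with JSON formatting", here is a minimal Python sketch, not Comet's actual implementation. It models a struct value as a list of `(name, value)` pairs. The function name `struct_to_json` and the `ignore_null_fields` parameter are hypothetical; dropping null fields by default is an assumption meant to mirror Spark's default behaviour as I understand it.

```python
import json

def struct_to_json(fields, ignore_null_fields=True):
    """Format one struct value, given as (name, value) pairs, as a compact JSON string."""
    if fields is None:
        # Assumption for this sketch: a null struct maps to a null result.
        return None
    obj = {}
    for name, value in fields:
        if value is None and ignore_null_fields:
            continue  # skip null fields instead of writing "name":null
        obj[name] = value
    # Compact separators: no spaces after ',' or ':'.
    return json.dumps(obj, separators=(",", ":"))

row = [("id", 1), ("name", "comet"), ("score", None)]
print(struct_to_json(row))                            # {"id":1,"name":"comet"}
print(struct_to_json(row, ignore_null_fields=False))  # {"id":1,"name":"comet","score":null}
```

A native implementation would work on columnar arrays rather than one row at a time, but the per-value formatting rules are the part that resembles the existing cast-to-string logic.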

Describe the potential solution

No response

Additional context

No response

jatin510 commented 2 months ago

Hello @andygrove, I would like to work on this issue. Can you please assign it to me?

viirya commented 2 months ago

Thanks @jatin510. Assigned to you.

dharanad commented 2 months ago

@jatin510 I am willing to be a co-author on this PR. Is that fine?

Can we start by solving for the query below?

select to_json(named_struct(expression1_name, expression1_input[, ..., expression_n_name, expression_n_input]))

dharanad commented 2 months ago

@andygrove QQ: Upon checking, I found that DataFusion doesn't currently support a built-in to_json function. While implementing it directly in Comet is an option, a more efficient approach might be to implement the function in DataFusion and reuse it here. What are your thoughts on these considerations?

andygrove commented 1 month ago

@dharanad It makes sense to add this in DataFusion. I see that Postgres also supports to_json.

Spark supports a number of options in to_json to control how dates and times are formatted. As long as those options are also available in the DataFusion version, I think we should be able to reuse it directly.
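As a rough illustration of what such formatting options involve, the Python sketch below takes an options map and applies it to date and timestamp fields. The option names `dateFormat` and `timestampFormat` follow Spark's naming as I understand it. The function `to_json_with_options`, the default patterns, and the use of Python `strftime` patterns (rather than Spark's own pattern syntax) are assumptions made only for this example.

```python
import json
from datetime import date, datetime

# Hypothetical defaults for this sketch; these are strftime patterns,
# not the pattern syntax Spark itself uses.
DEFAULT_OPTIONS = {
    "dateFormat": "%Y-%m-%d",
    "timestampFormat": "%Y-%m-%dT%H:%M:%S",
}

def to_json_with_options(fields, options=None):
    """Format (name, value) pairs as JSON, rendering dates and timestamps per options."""
    opts = {**DEFAULT_OPTIONS, **(options or {})}

    def render(value):
        # Check datetime first: datetime is a subclass of date.
        if isinstance(value, datetime):
            return value.strftime(opts["timestampFormat"])
        if isinstance(value, date):
            return value.strftime(opts["dateFormat"])
        return value

    obj = {name: render(value) for name, value in fields}
    return json.dumps(obj, separators=(",", ":"))

row = [("d", date(2024, 7, 4)), ("ts", datetime(2024, 7, 4, 12, 30, 0))]
print(to_json_with_options(row))
# {"d":"2024-07-04","ts":"2024-07-04T12:30:00"}
print(to_json_with_options(row, {"dateFormat": "%d/%m/%Y"}))
# {"d":"04/07/2024","ts":"2024-07-04T12:30:00"}
```

The point of the sketch is that the same struct can serialize differently depending on caller-supplied options, so a shared implementation would need to accept an equivalent options map to be reusable from Comet.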

andygrove commented 1 month ago

I am working on an initial implementation of this and will have a PR up soon. This should make it easy for others to contribute and flesh out the functionality further. I will create this in Comet first, since it has a lot of Spark-specific logic, but maybe we can find a way to abstract that out so that we can upstream the bulk of the feature.