Closed alamb closed 1 month ago
I took a quick look at this and I believe we need to add a doc_category for each UDF to be able to slot it into the appropriate section in the documentation. For example, for scalar UDFs that could be math, conditional, string, etc.
The doc_category fn could return either a simple string or, more properly, an enum, one for each type of UDF (scalar, aggregate, window, ...).
I don't like the fn name 'doc_category' but I couldn't come up with something better.
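A minimal sketch of the enum flavour of this idea, using only the standard library. Everything here is hypothetical: `ScalarDocCategory`, `DocumentedUdf`, `SimpleUdf`, and `group_by_category` are illustrative names, not existing DataFusion APIs, and the trait is a stand-in rather than the real `ScalarUDFImpl`.

```rust
use std::collections::BTreeMap;

/// Hypothetical documentation sections for scalar UDFs.
/// The variants mirror the examples mentioned above (math, conditional, string).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum ScalarDocCategory {
    Math,
    Conditional,
    String,
}

impl ScalarDocCategory {
    /// Section heading to use on the generated documentation page (assumed wording).
    pub fn heading(&self) -> &'static str {
        match self {
            ScalarDocCategory::Math => "Math Functions",
            ScalarDocCategory::Conditional => "Conditional Functions",
            ScalarDocCategory::String => "String Functions",
        }
    }
}

/// Stand-in trait for illustration only; NOT DataFusion's `ScalarUDFImpl`.
pub trait DocumentedUdf {
    fn name(&self) -> &str;
    fn doc_category(&self) -> ScalarDocCategory;
}

/// Simple implementor used to exercise the sketch.
pub struct SimpleUdf {
    pub name: &'static str,
    pub category: ScalarDocCategory,
}

impl DocumentedUdf for SimpleUdf {
    fn name(&self) -> &str {
        self.name
    }
    fn doc_category(&self) -> ScalarDocCategory {
        self.category
    }
}

/// Group function names by category; names are sorted within each category
/// and categories iterate in enum declaration order (BTreeMap + derived Ord).
pub fn group_by_category(
    udfs: &[&dyn DocumentedUdf],
) -> BTreeMap<ScalarDocCategory, Vec<String>> {
    let mut sections: BTreeMap<ScalarDocCategory, Vec<String>> = BTreeMap::new();
    for udf in udfs {
        sections
            .entry(udf.doc_category())
            .or_default()
            .push(udf.name().to_string());
    }
    for names in sections.values_mut() {
        names.sort();
    }
    sections
}
```

An enum keeps the set of sections closed and checked by the compiler, while a plain string would let externally defined functions introduce their own sections; which trade-off is preferable is left open here.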
How about "doc_description_category" or "doc_type" 🤔
take
A couple of ideas for follow ups to this issue if the PR is merged:
DESCRIBE xyz; where xyz is a function. Ideally this would support not just the core functions in DF but also any added externally (for example those in https://github.com/datafusion-contrib/datafusion-functions-extra), and verify it works in the CLI.
Update: https://github.com/apache/datafusion/pull/12668 is looking quite nice 👌
I have filed a follow on ticket to track porting the rest of the docs here: https://github.com/apache/datafusion/issues/12740
Is your feature request related to a problem or challenge?
When we add a new function to DataFusion's library, we have to remember to also add it to the documentation, for example in https://datafusion.apache.org/user-guide/sql/scalar_functions.html
I observed this recently in https://github.com/apache/datafusion/pull/12429#pullrequestreview-2296500404. This likely means we have forgotten to document some functions or that the documentation has drifted over time.
This also means the help text for various functions can only be found on the DataFusion website and not, for example, within the function itself.
It would be awesome if you could do something like this from SQL:
Describe the solution you'd like
I would like:
DataFusion already does something like this for ConfigOptions. For example, the comments in https://docs.rs/datafusion/latest/datafusion/config/struct.SqlParserOptions.html are automatically added to the documentation programmatically.
Describe alternatives you've considered
I suggest this as a high-level approach:
1. Add documentation to the ScalarUDFImpl trait, as proposed by @universalmind303 in https://github.com/apache/datafusion/issues/8366 (for example ScalarUDFImpl::description and ScalarUDFImpl::sql_example).
2. Follow the approach used for ConfigOptions to generate the SQL reference from those functions.

In terms of implementation order I would personally suggest breaking this project into smaller parts:
A first PR that does:
Then we can work in multiple PRs to port the remaining documentation over to the code (which will automatically result in the new page getting updated)
And then finally we can remove the old page when all functions are ported.
If we start working on this project, we (I) can file follow-on tickets to track porting the remaining functions / doing the same thing for aggregate functions, etc.
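To make the high-level approach more concrete, here is a rough sketch of documentation living next to the function and a reference page being rendered from it. It is only an illustration under stated assumptions: `FunctionDoc`, `StaticDoc`, and `render_reference` are hypothetical names, the `description` / `sql_example` methods mirror the proposal in this issue rather than anything that exists on `ScalarUDFImpl`, and the output layout is invented.

```rust
/// Hypothetical documentation hooks, standing in for methods that might be
/// added to a UDF trait. NOT part of DataFusion's actual API.
pub trait FunctionDoc {
    fn name(&self) -> &str;
    fn description(&self) -> &str;
    /// Optional SQL snippet showing how the function is used.
    fn sql_example(&self) -> Option<&str> {
        None
    }
}

/// Simple implementor holding static documentation strings.
pub struct StaticDoc {
    pub name: &'static str,
    pub description: &'static str,
    pub sql_example: Option<&'static str>,
}

impl FunctionDoc for StaticDoc {
    fn name(&self) -> &str {
        self.name
    }
    fn description(&self) -> &str {
        self.description
    }
    fn sql_example(&self) -> Option<&str> {
        self.sql_example
    }
}

/// Render a reference page: one heading per function, sorted by name,
/// followed by the description and, if present, the example indented
/// by four spaces so it renders as a code block in markdown.
pub fn render_reference(functions: &[&dyn FunctionDoc]) -> String {
    let mut sorted = functions.to_vec();
    sorted.sort_by(|a, b| a.name().cmp(b.name()));

    let mut out = String::new();
    for f in sorted {
        out.push_str(&format!("### {}\n\n{}\n\n", f.name(), f.description()));
        if let Some(example) = f.sql_example() {
            for line in example.lines() {
                out.push_str("    ");
                out.push_str(line);
                out.push('\n');
            }
            out.push('\n');
        }
    }
    out
}
```

Because the page is produced from whatever functions are passed in, porting a function's docs into code would update the generated page without a separate manual edit, which is the property described above. The same strings could in principle back a SQL-level help command, though that is outside this sketch.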
Additional context
Also, similarly, GlareDB has a way to automatically annotate functions with documentation, and @universalmind303 proposed something similar here https://github.com/apache/datafusion/issues/8366
Also, @findepi is considering implementing SHOW FUNCTIONS as part of https://github.com/apache/datafusion/issues/12144 that could also likely take advantage of this documentation if it was present.