Docstring lowering

c42f commented 1 month ago

I've been reading the existing system for dealing with docstrings in Base. It seems to have grown organically over time, and my first impression is that it's a bit of a horrifying maze at this point, though some parts are undoubtedly necessary complexity that I don't appreciate yet.

Some observations:

- The parser emits GlobalRef(Core, Symbol("@doc")) rather than something symbolic. This seems like premature lowering.
- @doc calls Core.atdoc() during macro expansion. This is so bootstrapping works even without Base.Docs being defined. Bootstrapping sets this function to one of:
  - A no-op (in boot.jl)
  - A simple docstring accumulator docm() defined in "docs/core.jl" (used in various places, but basically to deal with docs before Base.Docs is defined)
  - The full implementation in Docs.docm()
- Docs.docm() does various gymnastics to work around the fact that lowering's desugaring pass hasn't occurred by the time @doc runs:
  - It calls macroexpand to expand user macros, to deal with "docstr"\n@some_user_macro xx (how else could it know what is supposed to be documented there, without knowing what @some_user_macro expands to?)
  - It pattern matches various special syntax cases, particularly the short-form function syntax
- Docs.@__doc__ exists for good reasons but "feels pretty weird": it's used as a marker by macro authors who want a docstring to propagate to only part of the syntax emitted by the macro.
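The first observation can be checked with the stock parser through `Meta.parse`. This small sketch uses only Base; the exact `Expr` layout is what current Julia versions produce and is not a documented guarantee.

```julia
# Parse a docstring followed by the object it documents.
ex = Meta.parse("\"docstring\"\nfoo")

# A docstring is parsed as a macro call...
println(ex.head)
# ...whose "macro name" is already a GlobalRef into Core, not a plain symbol.
println(ex.args[1] == GlobalRef(Core, Symbol("@doc")))
# args[2] is a LineNumberNode; then come the docstring and the documented object.
println(ex.args[3], " documents ", ex.args[4])
```

So by the time any macro sees the docstring, the reference to Core's `@doc` has already been baked in by the parser.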

All this is just to document what I've found so far.

Currently JuliaSyntax already emits a special K"doc" kind for docstrings:

julia> parsestmt(SyntaxNode, "\"docstring\"\nfoo")
line:col│ tree                                   │ file_name
   1:1  │[doc]                                   │
   1:1  │  [string]
   1:2  │    "docstring"
   2:1  │  foo

In JuliaLowering we should use this to make the implementation of docstrings a lot cleaner. Largely this can be done by not expanding to @doc at all, and instead lowering the K"doc" kind as part of lowering proper. That way user macros won't be an issue, because they will already have been expanded by the time the doc node is lowered.
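The shape of that rewrite can be sketched at the `Expr` level. This is a toy, not JuliaLowering code: it works on `Meta.parse` output rather than `K"doc"` syntax trees, handles only long-form `function` definitions, and `toy_lower_doc`, `bind_docs!` and `BOUND_DOCS` are hypothetical names. Because it runs on unexpanded syntax it does not solve the user-macro problem; doing the same rewrite after macro expansion, inside lowering, is what the proposal is about.

```julia
# Hypothetical stand-in for a docs-binding hook; it just records
# (object, docstring, signature) so the result can be inspected.
const BOUND_DOCS = Tuple{Any,String,Type}[]
bind_docs!(obj, doc::AbstractString, sig::Type) = push!(BOUND_DOCS, (obj, doc, sig))

# Toy rewrite: turn `Core.@doc "text" function name(args...) ... end` into the
# plain definition followed by a bind_docs! call, without expanding @doc.
function toy_lower_doc(ex::Expr)
    (ex.head === :macrocall && ex.args[1] == GlobalRef(Core, Symbol("@doc"))) ||
        error("not a docstring expression")
    doc, def = ex.args[3], ex.args[4]
    (def isa Expr && def.head === :function) ||
        error("this toy only handles long-form `function` definitions")
    call = def.args[1]                      # e.g. :(toyf(x::Int))
    name = call.args[1]
    # Untyped arguments default to Any; `where` clauses and keywords are ignored.
    argtypes = [a isa Expr && a.head === :(::) ? a.args[end] : :Any for a in call.args[2:end]]
    sig = Expr(:curly, :Tuple, argtypes...)
    return Expr(:block, def, :(bind_docs!($name, $doc, $sig)))
end

lowered = toy_lower_doc(Meta.parse("\"blah blah\"\nfunction toyf(x::Int)\nend"))
eval(lowered)   # defines toyf, then records its docstring and signature
```

Here the toy re-derives the signature from syntax; in the approach described in this issue, lowering would instead reuse the signature it already computes for the method definition.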

It's unclear what to do about @__doc__, if anything. It serves a useful purpose but does feel like an oddity. One option could be to still represent this as a macro, but have it expand to some expression metadata which can be recognized by the lowering of K"doc" nodes.
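For reference, this is how `@__doc__` is used today, following the pattern in the Julia manual: a macro that emits several definitions marks the one that should receive the docstring. The macro `@make_methods` and function `docdemo` are made up for the example, and reading the result back goes through `Docs.meta`, which is internal and may change.

```julia
# Emits three methods; only the one marked with @__doc__ should get the docs.
macro make_methods(f)
    quote
        $(f)() = 0
        @__doc__ $(f)(x) = 1
        $(f)(x, y) = 2
    end |> esc
end

"Docs that should attach only to the one-argument method."
@make_methods docdemo

# Inspect the docsystem's (internal) storage for the binding `docdemo`.
md = Docs.meta(@__MODULE__)[Docs.Binding(@__MODULE__, :docdemo)]
println(length(md.docs))   # number of documented signatures
```

Only one signature ends up documented, even though the macro defines three methods; this is the behaviour a metadata-based replacement would need to preserve.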

c42f commented 1 month ago

Looking through Base.Docs for a public API to attach documentation programmatically, it seems there isn't really one. Ideally we don't want to go through the public macro API @doc, because that uses the Docs module's internal approximation of Julia's existing lowering.

The closest alternative seems to be the semi-internal API Docs.doc!. For this we need to construct the Dict{Symbol, Any} metadata that Docs normally builds itself. A simple example:

julia> module X
           public f, x

           x = 100

           function f(a::Int, b::String)
           end
       end
Main.X

julia> Docs.doc!(X, Docs.Binding(X, :x), Docs.docstr("some global", Dict{Symbol, Any}(:path => "foo.jl", :linenumber => 1, :module => X)))
Main.X.x

julia> Docs.doc!(X, Docs.Binding(X, :f), Docs.docstr("some function", Dict{Symbol, Any}(:path => "foo.jl", :linenumber => 6, :module => X)), Tuple{Int,String})
Main.X.f

help?> X.x
  some global

help?> X.f
  some function

The easiest way to see what docstring lowering currently does for a given form is to use @macroexpand:

julia> @macroexpand Docs.@doc "hi$x" function f(x::Int, y::String)
       end
quote
    function f(x::Int, y::String)
        #= REPL[190]:1 =#
        #= REPL[190]:1 =#
    end
    (Base.Docs.doc!)(Main, (Base.Docs.Binding)(Main, :f), (Base.Docs.docstr)((Core.svec)("hi", x), (Dict{Symbol, Any})(:path => "REPL[190]", :linenumber => 1, :module => Main)), Union{Tuple{Int, String}})
end
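Two details of that expansion can be checked directly: the `Union{...}` wrapper around the signature is a no-op for a single type, and the `Core.svec("hi", x)` argument keeps interpolated values as objects rather than strings. This sketch uses only the calls visible in the expansion; the `text` field of the returned `DocStr` is internal.

```julia
# The expansion wraps the signature in a one-member Union; that is the same type.
println(Union{Tuple{Int,String}} === Tuple{Int,String})

# Interpolated pieces are stored unformatted in an svec, so `x` is kept as the
# object itself and only turned into text when the docstring is rendered.
x = [1, 2]
d = Docs.docstr(Core.svec("hi", x),
                Dict{Symbol,Any}(:path => "REPL", :linenumber => 1, :module => @__MODULE__))
push!(x, 3)                   # mutate after the docstring was created
println(d.text[2] === x)      # same object, so the later mutation is visible
```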

MichaelHatherly commented 1 month ago

Hey @c42f, I'd be happy to help out where possible with working out the oddities of the docsystem. It would be good to straighten out the maze of code that has developed since it was first written.

What you've summarized here is pretty much spot on for how most of it works. As I recall, the main complexity was the bootstrapping machinery needed to be able to document things before Base.Docs is available.

> It's unclear what to do about @__doc__, if anything. It serves a useful purpose but does feel like an oddity. One option could be to still represent this as a macro, but have it expand to some expression metadata which can be recognized by the lowering of K"doc" nodes.

That seems like a good approach.

> @doc calls Core.atdoc() during macro expansion.

Just a heads-up that DocStringExtensions.jl does hook into atdoc:

https://github.com/JuliaDocs/DocStringExtensions.jl/blob/ec66ad4a472241c7a7ae0686247fe578c5e50210/src/templates.jl#L1-L19

so it's worth keeping that in mind and trying to accommodate some mechanism that allows similar behaviour. We can easily adjust that package to use a newer mechanism if we settle on something nicer.

c42f commented 1 month ago

Awesome, thanks @MichaelHatherly for the feedback!

I might call on you to review the actual code at some point? I've already done a rudimentary proof of concept in https://github.com/c42f/JuliaLowering.jl/commit/adc1447f0b761835c9eb7da020b96239137fedc9, which confirms that by moving docstring processing into lowering itself we get the method signature "for free", in all the detail the compiler knows. So this seems promising.

Thanks for mentioning DocStringExtensions.jl. I think there are various ways this could be approached.

One would be to allow something very similar to hooking into atdoc!, but in a cleaner way. For example, lower docstring processing to an unqualified bind_docs!() call, rather than Core.bind_docs!() as in my proof of concept. Core would then export bind_docs!(), and packages would pick it up by default from the Core namespace, just as they pick up all the other standard exports. If a package defines its own bind_docs!() function, that would override the Core export.
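The name-resolution part of this idea can be tried today with ordinary modules. In this sketch `FakeCore` stands in for Core (Core does not currently export any `bind_docs!`), `PlainPkg` and `CustomPkg` are hypothetical packages, and an unqualified `bind_docs!` call plays the role of the code lowering would emit.

```julia
# Hypothetical stand-in for Core exporting a default docs hook.
module FakeCore
    export bind_docs!
    const DOCS = Dict{Any,String}()
    bind_docs!(obj, doc::AbstractString, metadata) = (DOCS[obj] = doc; :default)
end

# A package that does nothing special: the unqualified call resolves to the
# exported default, just like any other name picked up via `using`.
module PlainPkg
    using ..FakeCore
    f(x::Int) = x
    const result = bind_docs!(f, "blah blah", (; sig = Tuple{Int}))
end

# A package that defines its own bind_docs! before using it: the local
# definition shadows the export, similar in spirit to the atdoc hook above.
module CustomPkg
    using ..FakeCore
    const SEEN = String[]
    bind_docs!(obj, doc::AbstractString, metadata) = (push!(SEEN, uppercase(doc)); :custom)
    g(x::Int) = x
    const result = bind_docs!(g, "blah blah", (; sig = Tuple{Int}))
end
```

One consequence of this design is that overriding is per module: a package's own hook only affects docstrings written inside that package.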

Another idea would be to go in a different direction and support extended markup within docstrings, stylistically compatible with Documenter's @ref syntax and so on, then have a system that interprets that markup later, during docstring processing, and fills in metadata from there. I think I like this idea better, as it could also let the normal docsystem do something useful with metadata that is currently only processed by Documenter.jl (IIUC).
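As a rough illustration of "interpreting markup later", the sketch below pulls Documenter-style cross references out of a docstring's text. It is a toy regex scan, not how Documenter.jl parses references; `REF_PATTERN` and `extract_refs` are hypothetical names, and only the `[text](@ref)` and `[text](@ref target)` forms are handled.

```julia
# Matches [text](@ref) and [text](@ref target); ordinary links are ignored.
const REF_PATTERN = r"\[([^\]]+)\]\(@ref(?:\s+([^)]+))?\)"

function extract_refs(doc::AbstractString)
    refs = String[]
    for m in eachmatch(REF_PATTERN, doc)
        # With no explicit target, the link text (minus backticks) names the target.
        target = m[2] === nothing ? strip(m[1], '`') : strip(m[2])
        push!(refs, String(target))
    end
    return refs
end

docstring_text = "Like [`bar`](@ref), see [the docs](@ref Foo.baz) and [site](https://julialang.org)."
println(extract_refs(docstring_text))
```

A real system would resolve each extracted name against bindings at docstring-processing time, which is where the extra metadata would come from.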

c42f commented 1 month ago

To summarize https://github.com/c42f/JuliaLowering.jl/commit/adc1447f0b761835c9eb7da020b96239137fedc9

The idea is that docstrings like

"blah blah"
function f(x::Int)
end

lower to

... # <- some code defining the particular method of `f` here

Core.bind_docs!(f, "blah blah", method_metadata)

where method_metadata is already computed as part of the code that defines the method signature of f, so it comes for free. It would be nice to use Base.LazyString as the lazy representation of the docstring with its interpolations, though I haven't implemented that yet. (Bootstrapping makes this annoying, but that can be solved.)
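For context, `Base.LazyString` (Julia 1.8+) stores its parts and only builds the final `String` when it is first needed, which is the property wanted for interpolated docstrings. A minimal sketch of that behaviour:

```julia
x = [1, 2]
s = LazyString("hi ", x)      # roughly what lazy"hi $x" constructs
push!(x, 3)                   # mutate before the string is materialised
println(String(s))            # rendering happens here, so it sees [1, 2, 3]
push!(x, 4)                   # after rendering, the cached result is reused
println(String(s))
```

Unlike the `Core.svec` pieces used by the current `@doc` expansion, a `LazyString` is itself an `AbstractString`, and its first rendering is cached.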

MichaelHatherly commented 1 month ago

> I might call on you to review the actual code at some point?

Sure, fine with me.

> I think there are various ways this could be approached.

Either of those seems worth investigating. Option one would at least avoid getting stuck in discussions about what "support extended markup within docstrings" would look like.

c42f / JuliaLowering.jl: Docstring lowering #3