MichaelHatherly / Docile.jl

Julia package documentation system.
Other
40 stars 15 forks source link

Use md"...." for markdown comments? #29

Closed stevengj closed 9 years ago

stevengj commented 10 years ago

Using an explicit prefix, via a string macro, has two advantages:

The second point is more of a tradeoff, I suppose, but typing two extra characters seems a small price to pay. The first point is extremely compelling to me.

("..." strings could default to text/plain, or even be disallowed.)

MichaelHatherly commented 10 years ago

I've go a @tex_mstr macro to avoid the escaping problem. md, rst, etc macros would be a good plan. Another way of going about the formatting would be to specify it in the metadata such as:

@doc """

""" { :format => "md" } ->
myfunc(x) = x

though that's perhaps a bit of a pain to have to write. Setting a module-level choice of formatting via the @docstrings macro might be a way around that.

I've recently (in the last few days) moved the formatting code to Lexicon.jl, which isn't currently registered. This makes Docile format agnostic I hope. Docile is just storing the docstrings as raw strings now, previously I was parsing to Markdown.jl ASTs at module load time.

stevengj commented 10 years ago

I think it's much cleaner for the object to know its format (and output via writemime etcetera) than to use extra metadata.

Then, as in my original proposal in JuliaLang/julia#3988, Docile wouldn't store strings at all:

MichaelHatherly commented 10 years ago

md"..." would definitely be much cleaner. I'm not sure if there's a way to interpolate into string macros though, which is a feature I'd like to keep.

Would the Markdown objects just wrap the string, ie

type Markdown
    content::String
end

or would the content be converted into an AST when a module is loaded?

Docile 0.1.0 parsed all the docstrings into Markdown.Blocks. This led to a slow down in the load time of modules I was testing on since Docile had to import Markdown.jl and then parse each string as well. This won't be a problem once pre-compilation is available, but for now it's a bit annoying.

0.2.0 now just captures the strings and leaves the parsing and rendering business to Lexicon. This has made loading modules that use @doc quite a bit faster (just anecdotal evidence though, I've not got any numbers regarding this).

Wrapping the strings in Markdown objects rather than adding extra metadata during the @doc-stage and then implementing all the writemime methods in Lexicon might be the way forward?

stevengj commented 10 years ago

@MichaelHatherly, there is no way to interpolate into string macros (this is the tradeoff for no escaping); if you need that (rarely, I think?), you would need to do Markdown("....") or similar.

I agree that it would make the most sense to store it as a String. Parsing the AST would only be necessary if you wanted to convert it to, say, HTML, or display it in formatted form in the REPL with terminal colors etc, and since Jupyter will shortly support the direct display of text/markdown, this kind of conversion should rarely be necessary, so it can be done only when needed.

writemime to text/plain and text/markdown require no parsing, so these could be implemented in Docile. I agree that it makes sense to put writemime methods for other output formats into a separate package.

ivarne commented 10 years ago

You can do string interpolation in a string macro, but you have to implement it yourself. You get the raw string, and do your replacements. You can return expressions that reads local variables just like a normal "insert $var here" is seen as

Expr 
  head: Symbol string
  args: Array(Any,(3,))
    1: ASCIIString "insert "
    2: Symbol var
    3: ASCIIString " here"
  typ: Any

in the AST.

MichaelHatherly commented 10 years ago

Markdown("...") should do fine. The interpolation is rare, I've used it once or twice — mostly just to test it out. I'll have a look at implementing interpolation @ivarne. I presume that Julia's interpolation code is here?

I suspect having Markdown etc, be subtypes of something like an abstract Docstring might be the right way to structure this?

abstract Docstring

type Markdown <: Docstring
    content::String
end

# ...

Or is there already an abstract type in Base that could be subtyped instead?

ivarne commented 10 years ago

I think we might want abstract Docstring in Base at some point, but that can be added later when there is actually a proven need for it.

MichaelHatherly commented 10 years ago

Yes, that makes sense.

stevengj commented 10 years ago

I don't think they should be subtypes of anything (other than Any). Documentation doesn't even need to be a string. It just has to be some object with some writemime method to a format that you want.

stevengj commented 10 years ago

@ivarne, I think in this context we wouldn't want to implement "manual" interpolation. The reason is that the lack of interpolation here is a feature: it let's you easily write documentation examples that contain interpolation (meant to be printed literally as $..., not actually interpolated by the documentation system).

ivarne commented 10 years ago

That is a valid concern that I support. Programmable documentation should probably be a little less surprising. I just said it was possible with string macros, and how, not that it was desirable.

MichaelHatherly commented 10 years ago

Perhaps an explicit esc flag could be used if interpolation is desired:

@doc md"""
1 + 1 = $(1 + 1)
"""esc ->
f(x) = x

Not such a high priority though.

MichaelHatherly commented 10 years ago

Closed by #35.

MikeInnes commented 10 years ago

I should mention that, although I haven't implemented it yet, I was going to have something like an md"..." macro anyway. Having to reimplement interpolation isn't a bug, it's a feature – since we can do things like embedding plots and other data in the Markdown AST and rendering them as appropriate.

I don't really agree with disallowing plain strings as documentation – why not just default to the common case (i.e. treat the string as if it were md"...")?

Regardless, Docile looks like it's coming along really nicely. Soon I should have time to give Markdown.jl a much-needed overhaul so that it's ready for things like this.

MichaelHatherly commented 10 years ago

I've got basic interpolation going, it's in interpolate.jl. Docile's defaulting to treating strings as markdown, and that can be set using the @docstrings metadata Dict:

@docstrings [ :format => MarkdownDocstring ]

Subject to naming changes.

Since there's only a markdown parser currently that setting's kind of redundant, but having a default has been much nicer than changing all the docstrings I've got to md"..."s.

There's a few quirks I've come across in Markdown.jl, which I'll post as issues tomorrow sometime.

MikeInnes commented 10 years ago

Sure, go ahead, I'll definitely try and fix any problems. More generally I want to aim for the new Common Markdown standard, which should iron out a lot of quirks.

Are those interpolate.jl functions parsers in the Markdown.jl sense? In principle you should be able to inject the $ trigger into Markdown.jl and parse interpolation without any internal changes. If not that's fine too, it won't be hugely difficult to knock something up.

MichaelHatherly commented 10 years ago

Yes, aiming for the standard would be great.

I wasn't aware of the Markdown.jl parser interface, so there're probably not going to fit straight in.

stevengj commented 10 years ago

I'm not sold on the need for interpolation here; it seems like an unnecessary complication. In the rare cases where it is needed, it seems like one could just use MarkdownString("....") with normal Julia interpolation (and manually escaping any $ that is not intended to be interpolated).

MichaelHatherly commented 10 years ago

To avoid having to type MarkdownString when wanting interpolation, using a default format specified via @docstrings [:format => MarkdownString] seems to be a good compromise to me.

No need for the extra complication of interpolation or calling MarkdownString for most of one's docstrings if that happens to be the necessary behaviour required for a project.

stevengj commented 10 years ago

What does :format => foo do, in general? Call foo(x) on every documentation object x? Only in the current module? What if x is not a string? What if x is html"...." or some other rich-text object already?

In what project would most of the docstrings require pervasive interpolation? My comment was predicated on the guess that interpolation in docstrings will be rare. (If the project wants to add some boilerplate to the end of every docstring, wouldn't it be better for the project to define a custom string macro mymd"..." that appends the boilerplate?)

It just seems cleaner and more flexible to me to let each documentation object x be constructed using standard Julia syntax, with no global state applying hidden transformations.

MichaelHatherly commented 10 years ago

:format => Format just specifies what function (in this case Format) to apply to raw strings that are used as documentation. Any string macros such as html"..." or md"..." aren't touched at all by this and are just stored in the Entry object created by @doc as whatever object the string macro returns.

Since @docstrings creates a Documentation object, called __METADATA__, in the current module the effects of :format are limited to docstrings stored in that Documentation object in the current module. I'm not sure if this would be classed as global scope or not though.

Interpolation could be used to create REPL-like code examples (such as python docstrings sometimes have) with results without having to write the results into the string itself:

@doc md"""

**Examples:**

```julia
julia> [1, 2, 3, 4]
$(stringmime("text/plain", [1, 2, 3, 4]))
...
"""

After writing that out though, perhaps there's a better solution to that particular problem.

My biggest problem is having to specify what kind of documentation I'm writing for every single docstring. It is only 2 characters, but if I can just chose a default and not have to worry about it after that I'd rather do that. I'd guess that for most packages docstrings will be written in just one format, rather than a mixture.

MikeInnes commented 10 years ago

That's exactly the right idea – we want to write things like

@doc """
Returns a random walk, which will look something like:

$(plot(y=randwalk(100), x=1:100))

* `n`: The number of samples in the walk.
""" ->
randwalk(n) = cumsum(rand(n))

The idea is to embed that plot object directly in the AST so that you can render it to HTML, terminal etc. as appropriate. This is essential – using videos, images etc. as documentation is only a gimmick if you can't mix them with text and other information.

However, note that for this to work you can't just apply a function to the string, you have to let the string macro do its work at compile time. The @doc macro will have to take the raw string and transform it into an @md_str "..." expression. This could still be customisable, of course – you just need to choose a string prefix like md to use by default rather than an arbitrary function.

That said, I think we can worry about the technical details once the md macro is actually implemented and working – I'll try and get that done soon.

stevengj commented 10 years ago

@one-more-minute, that's a good example.

However, I think that interpolation is probably the wrong model here, since I don't think you want to execute the interpolated code (e.g. the plot command) when the documentation object is created. Instead, you want to execute the code lazily, when the documentation is actually needed, e.g. when the Markdown AST is created (lazily, as discussed above). This is important for two reasons: (i) performance, since you don't want to make it really slow to load code that has documentation with embedded code, (ii) dependency order: if you are loading a module, the code needed to run the plot command might not actually run until the rest of the module is loaded.

So, it might be better to have a custom markdown extension giving a syntax for embedded Julia code to be evaluated and substituted when the AST is parsed. But then it won't rely on interpolation. And you probably don't want to use $(...) because that will conflict with code examples using interpolation.

MichaelHatherly commented 10 years ago

Regarding a custom extension for embedded julia code, would fenced_code_attributes be sufficient for this?

JeffBezanson commented 10 years ago

+1 to @stevengj 's comments. Being able to evaluate code for interpolation during loading of a module is not a reasonable requirement. In many cases, code required for that will not even be available yet. You also might want to embed a plot in the docs for a package that doesn't itself require plotting.

MichaelHatherly commented 10 years ago

@JeffBezanson, yes, I'm beginning to agree with that reasoning more. The interpolation is fine for really simple gimmicky examples, but doesn't work for anything more complex. No need to over-complicate things for no gain.

MikeInnes commented 10 years ago

I don't think the argument was against interpolation per se, just against evaluating code while the module is loading. That's fine, you can evaluate docs just after the module is loaded and cache the output, or even evaluate the docs lazily as Steven has suggested. (I prefer the former option myself, since it means you only pay the price of evaluation once, rather than every time you load the doc string – but that's a technical detail).

JeffBezanson commented 10 years ago

Everything must be deferred as late as possible. Load times are already crushing us.

MikeInnes commented 10 years ago

Sure, but that's going to be solved by caching compiled code in 0.4, right? I completely agree that we need to optimise load time as far as 0.3 is concerned, I'm just suggesting that we might extend the caching idea to docs so that laziness isn't necessary in the first place.

It may be that this is an extra complication that isn't worth dealing with in this first iteration though, in which case laziness seems like a fine way to go.

JeffBezanson commented 10 years ago

Caching doesn't help during package development when you are repeatedly re-loading. We don't want to punish package developers with long load times for writing lots of documentation with excellent examples :)

MichaelHatherly commented 10 years ago

@JeffBezanson Agreed, just documenting a package shouldn't add to load times unnecessarily.

@one-more-minute I've done some pre-compilation trials with Docile, and everything appears to work fine. Until that's finished though, we'll have to be lazy I'd think.

MikeInnes commented 10 years ago

@JeffBezanson That's a good point. If we need laziness as an option even when we do have caching then it makes a lot of sense to start with it.

@MichaelHatherly Sure, that seems fair.

Ultimately Markdown.jl is agnostic to all this stuff anyway – it will just dump the code in the AST and give you the tools to evaluate and render as you want.

stevengj commented 10 years ago

@one-more-minute, "lazily" doesn't mean that you pay the price "every time you load the doc string". It means that you perform computation as late as possible—only when you actually need to use the results—but then you cache the result for subsequent use.

That is, the sequence I'm proposing is:

MikeInnes commented 10 years ago

I do see what you mean by loading lazily, but since you're only likely to check a doc string once or twice per session it amounts to the same thing. As soon as you reboot Julia, you have to pay the evaluation price again for each doc string you look at. This is the problem that more persistent caching would solve.

stevengj commented 10 years ago

I see your point, but maybe we should wait to see how things work out in practice. We can always add offline documentation caching later, if incorporating expensive dynamically executed examples in documentation proves popular. Maybe it's better to hold off on dynamically executed examples entirely for the time being...it adds a lot of complexity when Julia still needs basic docstrings.

And without dynamically executed examples, Markdown can be sent to the IJulia front end without parsing it at all (since Jupyter will render the raw text/markdown string for you).

MichaelHatherly commented 9 years ago

Closed, since both base and Docile have implemented.