Closed stevengj closed 9 years ago
I've go a @tex_mstr
macro to avoid the escaping problem. md
, rst
, etc macros would be a good plan. Another way of going about the formatting would be to specify it in the metadata such as:
@doc """
""" { :format => "md" } ->
myfunc(x) = x
though that's perhaps a bit of a pain to have to write. Setting a module-level choice of formatting
via the @docstrings
macro might be a way around that.
I've recently (in the last few days) moved the formatting code to Lexicon.jl, which isn't currently registered. This makes Docile format agnostic I hope. Docile is just storing the docstrings as raw strings now, previously I was parsing to Markdown.jl ASTs at module load time.
I think it's much cleaner for the object to know its format (and output via writemime
etcetera) than to use extra metadata.
Then, as in my original proposal in JuliaLang/julia#3988, Docile wouldn't store strings at all:
writemime
etc. to output in various formats. Every object supports text/plain
output. Markdown
objects (created by md"...."
) would also have text/markdown
output, and probably text/html
output as well. And so on.@doc "...."
it will just store the string "..."
, which only has text/plain
output. If you want markdown you need to use md"..."
to construct a Markdown
object.md"..."
would definitely be much cleaner. I'm not sure if there's a way to interpolate into string macros though, which is a feature I'd like to keep.
Would the Markdown
objects just wrap the string, ie
type Markdown
content::String
end
or would the content be converted into an AST when a module is loaded?
Docile 0.1.0
parsed all the docstrings into Markdown.Block
s. This led to a slow down in the load time of modules I was testing on since Docile had to import Markdown.jl and then parse each string as well. This won't be a problem once pre-compilation is available, but for now it's a bit annoying.
0.2.0
now just captures the strings and leaves the parsing and rendering business to Lexicon. This has made loading modules that use @doc
quite a bit faster (just anecdotal evidence though, I've not got any numbers regarding this).
Wrapping the strings in Markdown
objects rather than adding extra metadata during the @doc
-stage and then implementing all the writemime
methods in Lexicon might be the way forward?
@MichaelHatherly, there is no way to interpolate into string macros (this is the tradeoff for no escaping); if you need that (rarely, I think?), you would need to do Markdown("....")
or similar.
I agree that it would make the most sense to store it as a String
. Parsing the AST would only be necessary if you wanted to convert it to, say, HTML, or display it in formatted form in the REPL with terminal colors etc, and since Jupyter will shortly support the direct display of text/markdown, this kind of conversion should rarely be necessary, so it can be done only when needed.
writemime
to text/plain and text/markdown require no parsing, so these could be implemented in Docile. I agree that it makes sense to put writemime
methods for other output formats into a separate package.
You can do string interpolation in a string macro, but you have to implement it yourself. You get the raw string, and do your replacements. You can return expressions that reads local variables just like a normal "insert $var here"
is seen as
Expr
head: Symbol string
args: Array(Any,(3,))
1: ASCIIString "insert "
2: Symbol var
3: ASCIIString " here"
typ: Any
in the AST.
Markdown("...")
should do fine. The interpolation is rare, I've used it once or twice — mostly just to test it out. I'll have a look at implementing interpolation @ivarne. I presume that Julia's interpolation code is here?
I suspect having Markdown
etc, be subtypes of something like an abstract Docstring
might be the right way to structure this?
abstract Docstring
type Markdown <: Docstring
content::String
end
# ...
Or is there already an abstract type in Base that could be subtyped instead?
I think we might want abstract Docstring
in Base at some point, but that can be added later when there is actually a proven need for it.
Yes, that makes sense.
I don't think they should be subtypes of anything (other than Any
). Documentation doesn't even need to be a string. It just has to be some object with some writemime
method to a format that you want.
@ivarne, I think in this context we wouldn't want to implement "manual" interpolation. The reason is that the lack of interpolation here is a feature: it let's you easily write documentation examples that contain interpolation (meant to be printed literally as $...
, not actually interpolated by the documentation system).
That is a valid concern that I support. Programmable documentation should probably be a little less surprising. I just said it was possible with string macros, and how, not that it was desirable.
Perhaps an explicit esc
flag could be used if interpolation is desired:
@doc md"""
1 + 1 = $(1 + 1)
"""esc ->
f(x) = x
Not such a high priority though.
Closed by #35.
I should mention that, although I haven't implemented it yet, I was going to have something like an md"..."
macro anyway. Having to reimplement interpolation isn't a bug, it's a feature – since we can do things like embedding plots and other data in the Markdown AST and rendering them as appropriate.
I don't really agree with disallowing plain strings as documentation – why not just default to the common case (i.e. treat the string as if it were md"..."
)?
Regardless, Docile looks like it's coming along really nicely. Soon I should have time to give Markdown.jl a much-needed overhaul so that it's ready for things like this.
I've got basic interpolation going, it's in interpolate.jl. Docile's defaulting to treating strings as markdown, and that can be set using the @docstrings
metadata Dict
:
@docstrings [ :format => MarkdownDocstring ]
Subject to naming changes.
Since there's only a markdown parser currently that setting's kind of redundant, but having a default has been much nicer than changing all the docstrings I've got to md"..."
s.
There's a few quirks I've come across in Markdown.jl, which I'll post as issues tomorrow sometime.
Sure, go ahead, I'll definitely try and fix any problems. More generally I want to aim for the new Common Markdown standard, which should iron out a lot of quirks.
Are those interpolate.jl functions parsers in the Markdown.jl sense? In principle you should be able to inject the $
trigger into Markdown.jl and parse interpolation without any internal changes. If not that's fine too, it won't be hugely difficult to knock something up.
Yes, aiming for the standard would be great.
I wasn't aware of the Markdown.jl parser interface, so there're probably not going to fit straight in.
I'm not sold on the need for interpolation here; it seems like an unnecessary complication. In the rare cases where it is needed, it seems like one could just use MarkdownString("....")
with normal Julia interpolation (and manually escaping any $
that is not intended to be interpolated).
To avoid having to type MarkdownString
when wanting interpolation, using a default format specified via @docstrings [:format => MarkdownString]
seems to be a good compromise to me.
No need for the extra complication of interpolation or calling MarkdownString
for most of one's docstrings if that happens to be the necessary behaviour required for a project.
What does :format => foo
do, in general? Call foo(x)
on every documentation object x
? Only in the current module? What if x
is not a string? What if x
is html"...."
or some other rich-text object already?
In what project would most of the docstrings require pervasive interpolation? My comment was predicated on the guess that interpolation in docstrings will be rare. (If the project wants to add some boilerplate to the end of every docstring, wouldn't it be better for the project to define a custom string macro mymd"..."
that appends the boilerplate?)
It just seems cleaner and more flexible to me to let each documentation object x
be constructed using standard Julia syntax, with no global state applying hidden transformations.
:format => Format
just specifies what function (in this case Format
) to apply to raw strings that are used as documentation. Any string macros such as html"..."
or md"..."
aren't touched at all by this and are just stored in the Entry
object created by @doc
as whatever object the string macro returns.
Since @docstrings
creates a Documentation
object, called __METADATA__
, in the current module the effects of :format
are limited to docstrings stored in that Documentation
object in the current module. I'm not sure if this would be classed as global scope or not though.
Interpolation could be used to create REPL-like code examples (such as python docstrings sometimes have) with results without having to write the results into the string itself:
@doc md"""
**Examples:**
```julia
julia> [1, 2, 3, 4]
$(stringmime("text/plain", [1, 2, 3, 4]))
...
"""
After writing that out though, perhaps there's a better solution to that particular problem.
My biggest problem is having to specify what kind of documentation I'm writing for every single docstring. It is only 2 characters, but if I can just chose a default and not have to worry about it after that I'd rather do that. I'd guess that for most packages docstrings will be written in just one format, rather than a mixture.
That's exactly the right idea – we want to write things like
@doc """
Returns a random walk, which will look something like:
$(plot(y=randwalk(100), x=1:100))
* `n`: The number of samples in the walk.
""" ->
randwalk(n) = cumsum(rand(n))
The idea is to embed that plot
object directly in the AST so that you can render it to HTML, terminal etc. as appropriate. This is essential – using videos, images etc. as documentation is only a gimmick if you can't mix them with text and other information.
However, note that for this to work you can't just apply a function to the string, you have to let the string macro do its work at compile time. The @doc
macro will have to take the raw string and transform it into an @md_str "..."
expression. This could still be customisable, of course – you just need to choose a string prefix like md
to use by default rather than an arbitrary function.
That said, I think we can worry about the technical details once the md
macro is actually implemented and working – I'll try and get that done soon.
@one-more-minute, that's a good example.
However, I think that interpolation is probably the wrong model here, since I don't think you want to execute the interpolated code (e.g. the plot
command) when the documentation object is created. Instead, you want to execute the code lazily, when the documentation is actually needed, e.g. when the Markdown AST is created (lazily, as discussed above). This is important for two reasons: (i) performance, since you don't want to make it really slow to load code that has documentation with embedded code, (ii) dependency order: if you are loading a module, the code needed to run the plot
command might not actually run until the rest of the module is loaded.
So, it might be better to have a custom markdown extension giving a syntax for embedded Julia code to be evaluated and substituted when the AST is parsed. But then it won't rely on interpolation. And you probably don't want to use $(...)
because that will conflict with code examples using interpolation.
Regarding a custom extension for embedded julia code, would fenced_code_attributes be sufficient for this?
+1 to @stevengj 's comments. Being able to evaluate code for interpolation during loading of a module is not a reasonable requirement. In many cases, code required for that will not even be available yet. You also might want to embed a plot in the docs for a package that doesn't itself require plotting.
@JeffBezanson, yes, I'm beginning to agree with that reasoning more. The interpolation is fine for really simple gimmicky examples, but doesn't work for anything more complex. No need to over-complicate things for no gain.
I don't think the argument was against interpolation per se, just against evaluating code while the module is loading. That's fine, you can evaluate docs just after the module is loaded and cache the output, or even evaluate the docs lazily as Steven has suggested. (I prefer the former option myself, since it means you only pay the price of evaluation once, rather than every time you load the doc string – but that's a technical detail).
Everything must be deferred as late as possible. Load times are already crushing us.
Sure, but that's going to be solved by caching compiled code in 0.4, right? I completely agree that we need to optimise load time as far as 0.3 is concerned, I'm just suggesting that we might extend the caching idea to docs so that laziness isn't necessary in the first place.
It may be that this is an extra complication that isn't worth dealing with in this first iteration though, in which case laziness seems like a fine way to go.
Caching doesn't help during package development when you are repeatedly re-loading. We don't want to punish package developers with long load times for writing lots of documentation with excellent examples :)
@JeffBezanson Agreed, just documenting a package shouldn't add to load times unnecessarily.
@one-more-minute I've done some pre-compilation trials with Docile, and everything appears to work fine. Until that's finished though, we'll have to be lazy I'd think.
@JeffBezanson That's a good point. If we need laziness as an option even when we do have caching then it makes a lot of sense to start with it.
@MichaelHatherly Sure, that seems fair.
Ultimately Markdown.jl is agnostic to all this stuff anyway – it will just dump the code in the AST and give you the tools to evaluate and render as you want.
@one-more-minute, "lazily" doesn't mean that you pay the price "every time you load the doc string". It means that you perform computation as late as possible—only when you actually need to use the results—but then you cache the result for subsequent use.
That is, the sequence I'm proposing is:
md"..."
string is stored in a MarkdownString
object as an unparsed string. The MarkdownString
also contains an ast
field which is initially undefined.text/html
) by writemime(io, "text/html", doc)
or similar. At this point, the markdown string is parsed, and the AST is generated and stored in the ast
field. At this time, embedded Julia code (syntax to be determined) is also executed, and the results (text/images/etc) are substituted into the AST.ast
field (isdefined(mdstring, :ast)
will return true
).I do see what you mean by loading lazily, but since you're only likely to check a doc string once or twice per session it amounts to the same thing. As soon as you reboot Julia, you have to pay the evaluation price again for each doc string you look at. This is the problem that more persistent caching would solve.
I see your point, but maybe we should wait to see how things work out in practice. We can always add offline documentation caching later, if incorporating expensive dynamically executed examples in documentation proves popular. Maybe it's better to hold off on dynamically executed examples entirely for the time being...it adds a lot of complexity when Julia still needs basic docstrings.
And without dynamically executed examples, Markdown can be sent to the IJulia front end without parsing it at all (since Jupyter will render the raw text/markdown
string for you).
Closed, since both base and Docile have implemented.
Using an explicit prefix, via a string macro, has two advantages:
$
or\n
etcetera.The second point is more of a tradeoff, I suppose, but typing two extra characters seems a small price to pay. The first point is extremely compelling to me.
("..." strings could default to text/plain, or even be disallowed.)