Closed quinnj closed 9 years ago
@StefanKarpinski, I just did a little benchmark. Including a file defining 10,000 random string constants took 1.47 seconds, while including a file defining 10,000 random Symbol=>String dictionaries took 3.16 seconds. This difference doesn't seem that substantial to me, especially since most modules will define much less metadata than this.
2x slower is huge considering the massive amount of work you have to go through to make the frontend 2x faster.
@jakebolewski, losing a factor of 2 in something that only takes 1% of the total loading time for a module means losing a factor of 1%.
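The arithmetic behind that estimate can be sketched as follows. The 1% share is the figure assumed in the comment above, not a measurement, and `relative_load_time` is a hypothetical helper name.

```julia
# Relative total load time when a `fraction` of the work becomes `slowdown` times slower.
# The rest of the work (1 - fraction) is assumed to be unaffected.
relative_load_time(fraction, slowdown) = (1 - fraction) + fraction * slowdown

# Metadata assumed to account for 1% of load time, slowed down by a factor of 2.
factor_of_two = relative_load_time(0.01, 2.0)

# The same estimate using the measured ratio from the benchmark (3.16 s vs 1.47 s).
measured = relative_load_time(0.01, 3.16 / 1.47)

println("2x slowdown of a 1% share: ", round(factor_of_two, digits = 4))
println("measured slowdown of a 1% share: ", round(measured, digits = 4))
```

Under that assumption the whole module loads roughly 1% slower; the conclusion changes if metadata makes up a larger share of load time.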
That being said, having to put a separate placeholder in an offline "manual outline" for each function that you want to appear there does not seem terrible to me.
Good documentation combines introductory and transitional material with bits of reference. An outline document is a good place for the introductory and transitional material, and it can simply splice in doc strings in the appropriate places. That way the details remain up-to-date and near the definitions of the functions being described, while the code isn't cluttered with lots of prose, and it isn't forced to be concerned with the structure of the documentation, which often doesn't match the structure of the code.
So basically, I'm proposing this for the overall documentation:
I don't understand how you'd "easily" plug in doc strings. It seems easier to indicate sections next to the methods; how do I indicate which function/method I want to plug in, in the outline-thing? (vs. writing the outline/intro material and then plugging in the whole section of docstrings at once.)
On Tue, Sep 30, 2014 at 10:48 AM, Stefan Karpinski <notifications@github.com> wrote:
So basically, I'm proposing this for the overall documentation:
- doc strings provide bits of markdown-flavored reference material associated with specific objects and symbols
- there is no structure and no metadata for doc strings beyond this – they can be queried and displayed in the REPL or in other interfaces like IJulia and Juno, but they're just organized like a key-value store.
- high-level documentation is written in the same markdown-flavored format, but it can easily splice in bits of doc string so that the material stays in sync and doesn't have to be repeated.
— Reply to this email directly or view it on GitHub https://github.com/JuliaLang/julia/issues/8514#issuecomment-57335256.
So basically, I'm proposing...
Totally on board for 1 and 3. I would still prefer to support in-docstring metadata, but I see it as non-essential, and it could easily be added later. One issue with splicing in docstrings is that adding a new method necessitates changing the documentation in two places (the docstring and the external docs), whereas metadata would allow defining groups of docstrings to splice in together.
I don't understand how you'd "easily" plug in doc strings.
I could imagine using a template system like mustache. So external docs would look like:
# Frob and related functions.
The frob operation has a long and storied history, which I'll describe in great detail here...
{{frob}}
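A minimal sketch of that kind of splicing, assuming the doc strings are available as a plain name-to-text dictionary. `DOCSTRINGS` and `splice_docs` are hypothetical names for illustration, not part of Base or of any mustache library.

```julia
# Hypothetical stand-in for a doc string store: name => doc string.
const DOCSTRINGS = Dict("frob" => "frob(x): Frobnicate `x` in place.")

# Replace every {{name}} placeholder with the matching doc string.
# Placeholders with no matching entry are left untouched.
function splice_docs(template::AbstractString, docs::AbstractDict)
    return replace(template, r"\{\{(\w+)\}\}" => function (placeholder)
        name = String(chop(placeholder, head = 2, tail = 2))  # strip "{{" and "}}"
        return get(docs, name, String(placeholder))
    end)
end

template = """
# Frob and related functions.
The frob operation has a long and storied history...
{{frob}}
{{unknown}}
"""

rendered = splice_docs(template, DOCSTRINGS)
print(rendered)
```

This treats `{{frob}}` as a single name lookup; it does not address how one method of `frob` would be selected.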
Yes, just what @dcjones said. I think some amount of metadata in doc strings makes sense – just enough so that you can query them by keywords or something. That could easily be satisfied by having a convention that writing keywords: frizzle, frobnication at the bottom of a doc string allows the string to be retrieved by keyword queries. Making this mechanism fully programmable strikes me as asking for abuse and overcomplication.
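One way the keyword convention could be read back out, as a sketch only. `split_keywords` is a hypothetical helper, and the rule that the last non-blank line must start with `keywords:` is an assumption about how the convention would be specified.

```julia
# Split a doc string into its body and an optional trailing "keywords: a, b" line.
function split_keywords(doc::AbstractString)
    lines = split(rstrip(doc), '\n')
    last_line = strip(lines[end])
    prefix = "keywords:"
    if startswith(last_line, prefix)
        raw = last_line[length(prefix)+1:end]      # text after the ASCII prefix
        keywords = [String(strip(k)) for k in split(raw, ',')]
        filter!(!isempty, keywords)
        body = String(rstrip(join(lines[1:end-1], '\n')))
        return body, keywords
    end
    return String(doc), String[]
end

body, keywords = split_keywords("Frobnicate `x`.\n\nkeywords: frizzle, frobnication")
plain_body, no_keywords = split_keywords("Append x to the object.")
```

A keyword query could then be a dictionary from each keyword to the names whose doc strings carry it.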
Does {{frob}} mean the function-level documentation or all documentation?
What if I only want one method of frob?
On Tue, Sep 30, 2014 at 11:16 AM, Stefan Karpinski <notifications@github.com> wrote:
Yes, just what @dcjones https://github.com/dcjones said. I think some amount of metadata in doc strings makes sense – just enough so that you can query them by keywords or something. That could easily be satisfied by having a convention that writing keywords: frizzle, frobnication at the bottom of a doc string allows the string to be retrieved by keyword queries. Making this mechanism fully programmable strikes me as asking for abuse and overcomplication.
I can tolerate omitting generic metadata, but I would strongly prefer using custom string (or other object) types to indicate format, with writemime for output.
md"..." instead of "..." is only two extra characters, which is a small price to pay for not locking ourselves into Markdown for the next 20 years.
$ or \ in code samples or LaTeX equations. LaTeX equations are basically unusable without a string macro to suppress escaping.
Base can ship with only md"..." (MarkdownString) (and also text/plain strings, of course), which will serve the purpose of encouraging everyone to use the same format.
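The escaping point can be illustrated with a small string macro. This is a sketch, not the Markdown implementation: `RawDoc` and `rawdoc"..."` are hypothetical names, and the wrapper stores the text without parsing it.

```julia
# Hypothetical documentation wrapper that keeps the source text unparsed.
struct RawDoc
    source::String
end

# String macros receive their contents without `$` interpolation or
# backslash-escape processing (apart from escaped quotes).
macro rawdoc_str(s)
    return :(RawDoc($s))
end

# With an ordinary string literal, `$` and `\` must be escaped by hand.
escaped = RawDoc("Euler: \$e^{i\\pi} = -1\$")

# With the string macro, the LaTeX can be written as-is.
unescaped = rawdoc"Euler: $e^{i\pi} = -1$"

println(unescaped.source)
```

Both values hold the same text; only the second can be typed without backslashing every `$` and `\`.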
(Though as soon as you start thinking about adding custom markdown extensions like keywords, I think the issue of metadata should be revisited. If you think that creating metadata dicts will slow down loading, wait until you actually try parsing the docstrings at load time. If you separate the metadata and the docstrings, each docstring need only be parsed when it is displayed. Dicts are way more flexible, will arguably be easier to implement because they use the existing parser, and don't require custom markup flavors.)
I am extremely hawkish about load time but I'm not really worried about the slowdown from metadata dicts on docstrings, for reasons that have been discussed already: (1) they're not all that slow, (2) not all docstrings will have them, (3) they can be shared among docstrings.
My main concern is getting something simple working first so we can have help and docs for packages ASAP. After that there are concerns about complexity and where various information should be stored, but we can continue to discuss that while enjoying the availability of package help :)
+1 to having something that works for packages asap.
Yes, +1 to having something vaguely like what's been discussed in this thread soon. I'm happy to adjust Docile to match whatever makes it into Base so that 0.3 packages can have documentation too.
I agree that we should get something asap, with the caveat that major flaws and disagreements should be things that are resolvable later without much breakage.
Adding documentation metadata is something that can be done later without breakage, because most docstrings won't have metadata so we will want an optional syntax anyway.
Changing "..." to md"..." if you want markdown-syntax docstrings will be a painful breakage to impose later.
Regarding the "..." vs md"..." change, if the default is markdown, then changing later is a matter of making markdown the default and allowing other formats optionally. It strikes me as weird to indicate the flavor of markup on a per-doc-string basis. Are you going to use lots of different markups in a single file or even a single project? I'm really not convinced that we'll ever need more than one.
@StefanKarpinski, note that we'll need a string macro anyway in order to easily use LaTeX equations in Markdown (otherwise you have to backslash like crazy).
That would be true if we couldn't change the parser ;-)
I would prefer format-agnostic documentation (requiring only writemime).
I don’t think “getting something out fast” is affected by which of these we choose. Making something work for special Julia Markdown strings only vs. an equivalent MarkdownString type doesn’t seem like a big difference as far as implementation effort.
Forcing everyone to use the same format seems unfortunate. I agree with having a strong default (i.e. shipping and using only one format in base), but choosing not to support any other format is actively preventing anyone from ever using a different format. There is always some dissent about formats, and if someone strongly prefers rst for their project (for the toolchain, or whatever), then there’s no reason to actively prevent them from doing so.
An example of using different types of documentation in one package: some documentation might be in a separate file, so those functions would just like to refer to the file path & have the file actually read lazily. This could be accomplished with a different type (FileDocString or whatever) that behaves appropriately.
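A minimal sketch of such a lazily read documentation type. The field layout and the `doctext` helper are assumptions for illustration. The thread refers to `writemime`, the display hook at the time; this sketch uses `show(io, mime, x)`, which replaced it in later Julia versions.

```julia
# Hypothetical documentation object that stores only a path and reads
# the file the first time the documentation is displayed.
mutable struct FileDocString
    path::String
    cache::Union{Nothing,String}
end
FileDocString(path::AbstractString) = FileDocString(String(path), nothing)

# Read the file on first access and reuse the cached text afterwards.
function doctext(d::FileDocString)
    if d.cache === nothing
        d.cache = read(d.path, String)
    end
    return d.cache
end

# Display hook: output goes through the same interface as any other doc object.
Base.show(io::IO, ::MIME"text/plain", d::FileDocString) = print(io, doctext(d))

# Usage with a temporary file standing in for a real documentation file.
path, file = mktemp()
write(file, "Documentation stored in a separate file.")
close(file)

filedoc = FileDocString(path)
was_unread = filedoc.cache === nothing          # nothing read yet
shown = sprint(show, MIME("text/plain"), filedoc)
```

Constructing the object does no I/O, so load time is unaffected until the documentation is actually requested.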
Allowing user-defined documentation formats would also allow users to define their own extensions to Julia Markdown -- and try them out without forcing them on anyone else or needing to modify the Julia parser.
FWIW I'm in violent agreement with @stevengj w.r.t allowing whatever system we end up with to store arbitrary metadata, not just strings. My impression is that the clojure community (e.g.) has benefited tremendously from this and built some really cool stuff (core.typed anyone?) on top of it, and it seems uncharacteristically restrictive (for what I see as the "Julian" attitude about this sort of thing) to not allow it.
What @porterjamesj said! Just learned about how Clojure does this: http://en.wikibooks.org/wiki/Learning_Clojure/Meta_Data - very neat! IMO a good implementation would make documentation a special case of a general mechanism to attach metadata to certain kinds of objects. (at least under the hood while providing sufficient syntactic sugar.)
ref: #3988
IMO a good implementation would make documentation a special case of a general mechanism to attach metadata to certain kinds of objects. (at least under the hood while providing sufficient syntactic sugar.)
which, unless I'm mistaken, is exactly what @stevengj has been arguing for.
I like the idea of having "..." / """...""" be Julia's default Markdown, whatever flavor that is, so we and our tools don't have to think very hard about how to deal with basic comments.
I'd also like to see provision, even if just a placeholder for now, to add flexible metadata. Although most docs right now are either plain text or rich text, there are plenty of areas where a picture or equation would really help, and with tools like IJulia and Juno we already have much of the infrastructure required to serve rich help.
Note also that if we support attaching an arbitrary "documentation" object with output via writemime, then including dictionaries of metadata can be implemented on top of this. e.g. you can have a MetaDoc type that wraps the "actual" documentation object plus a Symbol=>Any dictionary of other metadata:
doc MetaDoc(md"My documentation...", [:author=>"SGJ", :status=>"buggy"])
foo(...) = ...
(Where, as I mentioned above, we probably need an optional doc keyword for any documentation object that is not a string literal or string macro.)
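The wrapper type itself can be sketched in current Julia, leaving aside the proposed `doc` keyword. The constructor and the delegation through `show` are assumptions about how such a type might look; `md"..."` here comes from the Markdown standard library.

```julia
using Markdown

# Wraps the "actual" documentation object plus a Symbol => Any dictionary.
struct MetaDoc{T}
    doc::T
    meta::Dict{Symbol,Any}
end

# Convenience constructor: accept any dictionary and widen it to Symbol => Any.
MetaDoc(doc, meta::AbstractDict) = MetaDoc(doc, Dict{Symbol,Any}(meta))

# Display is delegated to the wrapped object, so any doc format works.
Base.show(io::IO, mime::MIME"text/plain", d::MetaDoc) = show(io, mime, d.doc)

metadoc = MetaDoc(md"My documentation...", Dict(:author => "SGJ", :status => "buggy"))

rendered_wrapped = sprint(show, MIME("text/plain"), metadoc)
rendered_inner = sprint(show, MIME("text/plain"), metadoc.doc)
```

The metadata stays in an ordinary dictionary, so it can be queried without parsing the wrapped documentation.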
FWIW, I find @stevengj's suggestion really compelling. It seems much easier to make an initial pass that's very vague about what "should" go in a MetaDoc object and flesh it out, than to take a stricter rule about strings and later replace it with MetaDoc objects.
That would be true if we couldn't change the parser ;-)
I'll just note as a minor point that using some kind of clue, like md or doc or whatever, would make things much simpler for editors' and IDEs' highlighting, for properly displaying special characters, LaTeX etc., since we can't really expect editors to implement full-blown parsers. Maybe that could be mitigated by using "a string at global scope is documentation" as a proxy rule, but I suspect that could turn out messy.
We already have a concept for creating new syntactic elements: macros and string macros. Having different rules for "escaping $variables and \newLatexFunctions in string literals" in different contexts would be inconsistent, making Julia more confusing.
I'd argue that two extra letters to type for Markdown parsing isn't a big problem. If you use markdown for formatting your documentation, you'll probably have a multiline doc, and two characters seem like a small annoyance.
I agree that it is poor style to mix different documentation formats in a single file, but it might sometimes be useful. That way you can gradually change format in a file without having to fix all the issues at once. Usually, design discussions in Julia have not been won by the argument "someone is going to use this feature to write horrible unreadable code".
I'd argue that two extra letters to type for Markdown parsing isn't a big problem. If you use markdown for formatting your documentation, you'll probably have a multiline doc, and two characters seem like a small annoyance.
I have to disagree with this. Firstly, a lot of docstrings are likely to look like
"`push!(object, x)`: Append x to the object."
i.e. not multiline.
That said, it's not really about the two character overhead. The fact is that most people will use the most convenient documentation form available, so defaulting to plain docstrings amounts to endorsing them.
I'm all for supporting richer formats (tex"" etc.) but supporting both plain and rich docs doesn't make much sense – markdown opens a lot of opportunities (nice presentation, syntax highlighting, structured information etc.) without making things more cumbersome, so we should encourage people to use it over plain text as much as possible. Treating "..." docstrings as md"..." is a very simple and effective way to do that.
we should encourage people to use it over plain text as much as possible. Treating "..." docstrings as md"..." is a very simple and effective way to do that.
This is a good idea. One of the problems with Python docstrings is that they are plain text, and you can't get people to use anything else unless it's endorsed by the language implementation. TIMTOWTDI leads to everyone using the lowest common denominator, i.e. plain text. Unambiguously going with one default markup language in Julia makes it better. Markdown is a good choice, especially as IJulia is the de facto "more than plaintext" display environment for Julia.
Putting myself in the loop to make sure Lint can check through doc string correctly.
I think that rather the problem with Python docstrings is that there is no standard way of specifying the format. That means that when you aggregate documentation from docstrings, you have to guess the format, and computers are bad at guessing, so the feature is little used.
@one-more-minute Maybe that is a valid case, but if I want to save characters to type I'd rather not have to repeat the signature inside the docstring, but have it automatically captured from the actual signature on the next line.
By the way, another reason to support (a) plain-text strings and (b) non-literal documentation strings is importing help from other languages.
e.g. in PyPlot I define various functions which are wrappers around Python functions, and I want their help to be automatically imported from the Python docstring (which is plain text). If we have a doc keyword (or @doc) that supports arbitrary Julia expressions, and allows plain-text strings, this will be easy:
const bar_py = pyplot["bar"]
doc convert(String, bar_py["__doc__"])
function bar(...)
end
Note also that if you make the doc macro automatically interpret string literals as Markdown, but interpret string-valued expressions as plain text, then you will get different results for:
doc "*foo*" foo(x) = ...
# versus:
const foodoc = "*foo*"
doc foodoc foo(x) = ...
Whereas if you interpret "..." consistently as a plain-text string, and require md"..." for Markdown, the behavior is a bit more comprehensible.
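The inconsistency arises because a macro sees syntax, not values: a string literal arrives as a `String`, while a variable holding the same string arrives as a `Symbol`. A sketch with a hypothetical `@doc_format` macro (not the proposed `doc` keyword or the real `@doc`):

```julia
# Reports which format a literal-sensitive doc macro would pick for its argument.
macro doc_format(ex)
    # A bare string literal is a String at macro-expansion time;
    # anything else (variable, call, interpolated string) is an expression.
    format = ex isa String ? :markdown : :plaintext
    return :(($(QuoteNode(format)), $(esc(ex))))
end

const foodoc = "*foo*"

literal_case = @doc_format "*foo*"    # argument is a String literal
variable_case = @doc_format foodoc    # argument is the Symbol :foodoc

println(literal_case)
println(variable_case)
```

The two calls carry identical text but are classified differently, which is the surprising behavior described above.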
Just checking in here to see if we have something usable to start with. Are we still waiting on Markdown.jl?
Markdown.jl is already in that other PR (which is good to go as far as I'm concerned, though I'm happy to make any changes if I've missed anything of course).
Oh yeah we can totally close this
Some good discussion started here.
This is to more formally track integrating the necessary parts into Base since it seems some good consensus is building.
@one-more-minute @MichaelHatherly