Closed StefanKarpinski closed 9 years ago
cc: @johnmyleswhite, @lindahua, @ViralBShah
I'm pretty excited about this; I think it'll make documentation easier as a first pass, but being able to attach data to functions in general can be used for quite a few neat things. The first time I wanted something like this was when I was developing the codespeed infrastructure; I wanted to annotate functions with metadata stating the name of the test that function ran, what units the resultant metric of that test would be (time, FLOPS, bytes/clock cycle, etc.), whether "less is better" for that particular unit, and so on. So I think whatever we come up with has the opportunity to be somewhat more than the only analogue I can think of right now (Python's docstrings), which is just a single string of data. We have the chance to make the data we attach highly structured, in the sense that it can be manipulated by other Julia code.
regards to all .. imho ..
@StefanKarpinski writes this right
What do we want to be able to associate with run-time objects like functions, methods, modules, and global bindings? It would be nice to have easy, queryable access to source code for things as well as inline comments associated with that source code. How do we associate that data with run-time objects? While it may be reasonable to have this kind of overhead in interactive situations, we also must be able to run programs non-interactively without paying that price.
In any creatively powerful software paradigm, and so certainly with Julia, there is available a dynamism that at once allows a design to run well, go fast reliably, and harvest the deep accurately and at another affords development, investigation and playfulness the robust power and makes perspective, conception, and insight readily accessible as a newly realized design that runs well, goes fast and is reliably accurate.
As Stefan notes, it is entirely reasonable and sound that Julia offer the language user each modality's respective advantage; that is more compelling than a requirement that they operate in mutual simultaneity.
+1
To comment on 2) Interpreting and presenting this data to the world, and more specifically in relation to the IPython notebook/qtconsole/console that can now be used with IJulia: I want to point out that we had a discussion in IPython about enabling "rich" docstrings. Since you integrated multimedia I/O (#3932) into IJulia core, maybe `help()` could return different mimetypes for different frontends.
You are probably much more flexible in what you can do than us in IPython, and we will happily see what you come up with.
Be careful though: with rich mimetype representations of documentation, docs may become a security issue (injected JavaScript in the notebook can execute code in the kernel). It can also be an advantage, as you could have executable or dynamic docs, like runnable sample code. One thing we were not totally able to solve is how to have working cross-links in the live documentation in the notebook.
similar to #2508
This issue has been neglected for too long. Let me make a concrete proposal to get the ball rolling. A basic starting point could be:
- A global dictionary-like object `DOC::DocDict <: Associative{Any,Any}`.
- The keys are the objects being documented (typically a `Function` or `Method`, although we'll also want to document other Julia objects).
- The values are arbitrary objects; we use the `writemime` machinery to convert them to various formats, e.g. `reprmime("text/plain", DOC[x])` to get the `text/plain` documentation of `x`.
On top of this machinery, various pieces could be added:
- `DOC[f::Function]` would look up the general documentation for `f`, analogous to our `help` now (we would still have a `help` function, it will just use `DOC`). `DOC[m::Method]` would look up the documentation for a specific method signature. To get all of the documentation for a function `f`, you would call `[DOC[f], [DOC[m] for m in methods(f)]]`.
- A `@doc` macro to attach documentation inline:
@doc foo: f = # equivalent to DOC[f] = foo, i.e. documentation for f independent of any method signature
@doc bar: function f(....) # equivalent to DOC[method signature for f(....)] = bar
....
end
- Note that importing a module would execute all of its embedded `DOC[foo] = bar` statements, appending to the documentation.
- A way to make `@doc` and `DOC[foo] = bar` do nothing, to eliminate any overhead of storing/updating `DOC` in production code.
The simplest documentation would be in the form of strings, for which only the `text/plain` representation is available. However, we could define types to encapsulate higher-level information and formatted text. For example:
- `DocDefinition(doc::Any, file::String, line::Integer, source::String, ....timestamp?....other?....)` to store a documentation value `doc` along with metadata for a definition in a source file. The `@doc` macro could automatically use this wrapper type. One could define `writemime(m::MIME, d::DocDefinition) = writemime(m, d.doc)` to make this wrapper transparent.
- `Markdown(s::String)`, which interprets its argument as markdown with embedded LaTeX equations, and defines `writemime(::MIME"text/x-markdown", x::Markdown)` along with other output formats. So one would do e.g.
@doc Markdown("""
.....
"""): foo(...) = ....
or there could be a `@docmd` shortcut for this.
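A minimal sketch of how the pieces of this proposal might fit together, written for a current Julia 1.x, where `show(io, mime, x)` and `repr(mime, x)` play the roles that `writemime` and `reprmime` play in this thread. The names `DOC`, `DocDefinition`, `MarkdownDoc`, and `hypot2` are illustrative assumptions, not an existing API.

```julia
# Hypothetical global documentation table, keyed by object identity.
const DOC = IdDict{Any,Any}()

# Hypothetical wrapper carrying source metadata alongside the doc value.
struct DocDefinition
    doc::Any
    file::String
    line::Int
end

# Hypothetical Markdown-flavoured documentation value.
struct MarkdownDoc
    text::String
end

# Rendering: both formats simply emit the raw source in this sketch.
Base.show(io::IO, ::MIME"text/plain", d::MarkdownDoc) = print(io, d.text)
Base.show(io::IO, ::MIME"text/markdown", d::MarkdownDoc) = print(io, d.text)

# Make the wrapper transparent by forwarding to the wrapped value.
# One method per MIME type avoids an ambiguity with Base's generic
# text/plain fallback.
Base.show(io::IO, m::MIME"text/plain", d::DocDefinition) = show(io, m, d.doc)
Base.show(io::IO, m::MIME"text/markdown", d::DocDefinition) = show(io, m, d.doc)

# Example function and its documentation entry.
hypot2(x, y) = x^2 + y^2
DOC[hypot2] = DocDefinition(MarkdownDoc("Return `x^2 + y^2`."), "example.jl", 1)

# Ask for the plain-text rendering of the stored documentation.
plain = repr("text/plain", DOC[hypot2])
```

A `help`-style function could then be a thin layer that looks up `DOC[x]` and asks for whichever MIME type the frontend supports.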
I like the simplicity of this approach.
@velicanu might be interested.
This is a really great idea.
This is interesting, I'll try to do it.
We also need some way of associating documentation with manual sections in a hierarchy (e.g. "Mathematical functions / Special functions / Bessel functions"). And in general we want a way to associate metadata with objects. One option, in line with the above proposal, would be to:
- Define metadata MIME types, e.g. `metadata/author` for an author string, or `metadata/section` for an `@`-delimited string of section names in descending order of specificity, e.g. `"Bessel functions@Special functions@Mathematical functions"`.
- Any `DOC[x]` value type that wants to provide any metadata could define the appropriate `writemime` function.
- The `@doc` macro could accept metadata as keyword-like arguments:
@doc section="Documentation@Awesomeness" author="Alyssa P. Hacker" """ ..... docs .... """: somefunction(...) = ...
and would store them in a "metadata" `Dict` inside `DocDefinition`. `mimewritable` for `DocDefinition` would then return `true` for metadata MIME types corresponding to keys in the metadata `Dict`.
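To make the section-hierarchy idea concrete, here is a small sketch that stores metadata in a `Dict` and turns the `@`-delimited section string into a path from the most general section to the most specific. `SectionedDoc` and `section_path` are hypothetical names chosen for illustration.

```julia
# Hypothetical doc record with free-form metadata, as in the keyword-style proposal.
struct SectionedDoc
    text::String
    metadata::Dict{Symbol,String}
end

# Split an "@"-delimited section string (most specific first) and return
# the path ordered from the most general section to the most specific.
function section_path(d::SectionedDoc)
    raw = get(d.metadata, :section, "")
    return reverse(String.(split(raw, '@'; keepempty=false)))
end

bessel = SectionedDoc(
    "Bessel function of the first kind.",
    Dict(:section => "Bessel functions@Special functions@Mathematical functions",
         :author => "Alyssa P. Hacker"),
)

path = section_path(bessel)
```

A manual generator could group entries by this path to rebuild the section tree.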
Some thought should go into the `@doc` macro syntax to make the resulting code as human-readable as possible. One annoyance with using a macro for this is that you can't simply insert linebreaks wherever you want without breaking the parsing. But if this seems to be a problem I suppose that we could add a new keyword/syntax to Julia that parses as `@doc` or some kind of `document(expr, ...)` function call.
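As a rough illustration of a macro that goes before the definition, the sketch below defines the function and then records its documentation in a table. The names `@attachdoc`, `INLINE_DOCS`, and `KEEP_DOCS` are hypothetical (the name `@doc` is avoided because current Julia already defines a macro of that name), and the macro only handles plain `f(args...)` signatures.

```julia
# Hypothetical documentation table and a switch to disable storage.
const INLINE_DOCS = IdDict{Any,Any}()
const KEEP_DOCS = Ref(true)   # assumption: read once, at macro-expansion time

macro attachdoc(text, def)
    sig = def.args[1]     # the call expression, e.g. :(square(x))
    name = sig.args[1]    # the function name, e.g. :square
    # When storage is disabled, no table update is emitted at all.
    store = KEEP_DOCS[] ? :(INLINE_DOCS[$(esc(name))] = $(esc(text))) : nothing
    return quote
        $(esc(def))       # define the function in the caller's module
        $store
        nothing
    end
end

@attachdoc "Square a number." function square(x)
    return x * x
end

@attachdoc "Double a number." double(x) = 2x
```

Because the switch is consulted while the macro expands, turning it off leaves no run-time cost in the generated code, which is the property asked for above.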
@loladiro, is there any missing functionality in the above proposal compared to what is needed to implement the REPL `help`?
Note that importing a module would execute all of its embedded DOC[foo] = bar statements, appending to the documentation.
Have you considered doing so only at install time for libraries? One would probably like to build all the HTML docs at once when the library is installed, because with cross-links and everything you might need to build the docs for the whole library at once. Also, in the notebook, we can probably have a link in the pager that opens `file://path/to/julia/doc/module/function.html`, which is browsable (runnable?).
@Carreau, on top of this one can build various tools, e.g. a tool to import a module and build documentation in some format. As @StefanKarpinski said at the top of this thread, however, that is conceptually separate from the task of associating the data with the objects in the first place.
@stevengj Sorry I wasn't clear. I was not worried about the external tool to build the doc; I was wondering about associating this back to the objects externally, like an external way to add a value to `DOC::DocDict <: Associative{Any,Any}`, but I guess you are right, this can be a layer on top of `DOC`.
I'm not sure what you mean by an "external way to add a value to `DOC`" ... any Julia program will be able to mutate the `DOC` contents.
I might have misunderstood something, and will re-read, but "global dictionary-like object" made me think of a per-session object that dies with the interpreter, which can make sense in an interactive environment. This was reinforced by:
Note that importing a module would execute all of its embedded DOC[foo] = bar statements, appending to the documentation.
I was thinking more of a persistent database of that info (for example, built at package installation time). At some point I could, for example, run a local HTML doc build of JuMP that "registers" with this database, so that when I do `help(some-function-of-jump)` it knows how to access this.
The global dict `DOC::DocDict` proposed by @stevengj doesn't seem quite right to me. Shouldn't globals be avoided if possible? Why not put that info directly into the modules, methods and functions themselves? For instance, add a field `data` to the `Function` type and similarly to the `Method` type. Let `data` be a dict or contain a field `data.doc`. That way `help(fn)` could get to it and the hierarchy information is easily available too. Other data could be put into that dict as well, e.g. the source code or the annotations @staticfloat mentioned above.
What's missing is the possibility to associate data with globals. Either make all globals containers with a `data` field too, or resort to a global dict for those.
What's wrong with a global in this context?
1. [...] `Base`, so it's not obvious that segregating the documentation for that method is desirable.
2. Given a `Method` signature `m`, `m.func.code.module` gives the corresponding module (and given a module one can find the parent with `module_parent`). It would be easy to add module information to the `DOC` dict for constants too if desired.
3. We also need to document things other than a `Function` or `Method`, e.g. constants, types, and perhaps macros. So adding fields to `Function` and `Method` is not sufficient, as you point out. And if you have a "global dict" for constants, it only adds complexity to have a completely separate data structure for function and method documentation.
4. Adding a field to `Function` adds runtime overhead. In something like Python this doesn't matter, but in Julia it is a big deal. You certainly don't want to slow down running code, and there should be a way to avoid storing the documentation entirely in production code.
What is the concrete disadvantage of a global dictionary that overcomes its advantages in simplicity and functionality? Blanket prejudice against globals is not persuasive.
To me segregating the function/method metadata from the function is odd. There is plenty of (meta-)data already associated with methods/functions/modules (e.g. signature, module...), why treat the additional metadata differently? (See point 3 for the most important argument)
Say for instance, I have a method which is 'private' to my module, i.e. I don't export it. But I may still want to document it (for my own purpose) or I want to add other metadata like @staticfloat mentioned. Why should this metadata, which is private to a module, live in a global variable?
Comments to your points:
1. [...] `fn.data.doc` or `DOC[fn]`. Also, usually one would use `help(fn)` or some other function which would work with either. Also, as I mentioned above, there is plenty of data already associated with functions/modules/..., so it must be possible to maintain such machinery.
2. [...] `+ adds numbers`; and specialized doc refining on that, like `+(a::Integer, b::Rational) adds a + b and returns a Rational` (a bit of a stupid example).
3. [...] `DOC[:sin]` should work and `DOC[:(Base.sin)]` should work too. Do you define it twice, or is only one valid? What if I do `sinalias = sin; DOC[:sinalias]`? What if two modules have a function of the same name? How is `DOC` updated after a `using` imports some names into the top level? All these namespace issues would come for free if the data was tacked onto the functions/methods/... This seems to me the most important argument against the dict.
4. [...] `DataType` datatype. To annotate instances of a type, say `pi`, a convention could be to define a field like `_doc` and put the documentation there. That leaves us with macros, not sure about those. Are they a type themselves? What are they? Well, either way, it will be good to have a way to associate metadata with functions etc., especially for docs.
Number 3 is actually not a problem, since you would use the function object itself etc. as the key: `DOC[sin]`. This works just the same way with namespacing as storing metadata inside the objects. Either approach will have trouble with macros, however, since there doesn't seem to be any actual macro object to use as a key or to store metadata in.
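A short sketch of why keying by the object sidesteps the naming questions: an alias and a module-qualified name refer to the same function object, so they reach the same entry. `OBJECT_DOCS` is a hypothetical stand-in for the proposed `DOC` table.

```julia
# Hypothetical table keyed by object identity rather than by symbol.
const OBJECT_DOCS = IdDict{Any,String}()
OBJECT_DOCS[sin] = "Sine of x, with x in radians."

# A second binding for the same function object.
sinalias = sin

# Alias and qualified name resolve to the same entry.
same_entry = OBJECT_DOCS[sinalias] == OBJECT_DOCS[Base.sin]

# Symbols are never used as keys, so there is nothing to keep in sync with `using`.
has_symbol_key = haskey(OBJECT_DOCS, :sin)
```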
Argh, GitHub thought that my 3. was a 1. I was talking about number 3, anyway.
Yes, you're right. Out goes what I thought was the best argument. Still, why store the data in several places if it could be in one?
Regarding point 1, the problem is that we aren't just dealing with functions. You want a metadata lookup procedure that works equally well for all types, and is uniform for all types.
Regarding point 2, the separation of generic and method-specific documentation was already provided for in my proposal: the former is provided by `DOC[f::Function]` and the latter by `DOC[m::Method]`.
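The two-level lookup can be sketched as follows, assuming a table keyed by both function objects and `Method` objects. `LAYERED_DOCS`, `area`, and `alldocs` are hypothetical names used only for this example.

```julia
# Hypothetical table holding both generic and per-method documentation.
const LAYERED_DOCS = IdDict{Any,String}()

area(r::Float64) = pi * r^2
area(w::Float64, h::Float64) = w * h

# Generic documentation, independent of any method signature.
LAYERED_DOCS[area] = "Compute an area."
# Documentation for one specific method, looked up with `which`.
LAYERED_DOCS[which(area, (Float64, Float64))] = "Area of a w-by-h rectangle."

# Collect the generic entry first, then every documented method.
function alldocs(f)
    docs = String[]
    haskey(LAYERED_DOCS, f) && push!(docs, LAYERED_DOCS[f])
    for m in methods(f)
        haskey(LAYERED_DOCS, m) && push!(docs, LAYERED_DOCS[m])
    end
    return docs
end

collected = alldocs(area)
```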
Regarding the "why store the data in several places if it could be in one" argument, that is a matter of perspective. I think of a single `DOC` variable as "just one" place, whereas deciding type-by-type where to stick some field to store metadata seems like several places to me. (And if we forget to add a metadata field to some type, then no metadata is possible for instances of that type.)
Functionally both approaches should work.
What I don't see is why metadata should get treated differently from other data/metadata of function/types/modules. For instance, there is no global dict `MOD` which stores for each function/type/instance/... the module where it was defined. Now you argue that metadata is different because it should be treated the same for all function/types/modules. But metadata may well not be uniform for all types, e.g.: storing return types would only make sense for functions, having a hierarchy of docs makes sense for functions/methods but probably not for modules and instances, storing the source code may not make sense for instances, modules may want to store only that part of the source which is not in the functions & types to avoid duplication, etc.
Of course, this non-uniformity could be implemented in the global dict as well. But I think because the metadata belongs to the function/types/modules/... and can be specific, that is where it should be tacked on. (I have no skills to implement this feature, so I cannot comment on specifics of the implementation. I have no frame of reference, really.)
I was hoping that julia would adopt support for metadata similar to Clojure where it can be defined for namespaced symbols and collections. Many powerful tools have been built on top of this including a full blown type checking system. Clojure's documentation system is also built upon metadata. However this does incur overhead (in memory and filling out the fields at runtime) so I can see that argument as well.
But why are we arguing about implementation details? Shouldn't we agree on an interface to get at specific metadata (`doc(f::Function)`, `doc(f::Module)`, etc.)? How it's implemented can always change later.
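One way to read this suggestion is as a pair of accessor functions that hide where the data lives. The sketch below uses a private table, but the same `getdoc`/`setdoc!` interface (hypothetical names) could equally be backed by fields on the objects.

```julia
# Storage detail hidden behind the accessors; could be swapped out later.
const _DOC_STORE = IdDict{Any,String}()

# Attach documentation to any object and return the object.
setdoc!(x, text::AbstractString) = (_DOC_STORE[x] = String(text); x)

# Look up documentation, with a fallback for undocumented objects.
getdoc(x) = get(_DOC_STORE, x, "No documentation found.")

perimeter(side) = 4 * side
setdoc!(perimeter, "Perimeter of a square with the given side length.")
setdoc!(Base, "Placeholder text for a module entry.")
```

Callers only ever touch `getdoc` and `setdoc!`, so the storage debate above would not affect them.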
I like the `@docmd` idea.
I am personally interested in manual-style documentation, akin to Perl's `=pod` feature. I like @stevengj's suggestion of keyword-like arguments. My current approach is to write my Julia files like this:
#=doc
# Product Manual
# --------------
# Here I insert Markdown documentation ...
#=end
... my code ...
With `@docmd` and the metadata idea, I could write:
@docmd
section="Chapter 7"
author="Alyssa P. Hacker"
"""
How to Analyze Simulation Results
---------------------------------
Blah blah blah
""": somefunction(...) = ...
... my code ...
Here is an idea that I personally would love:
function suggestion()
"""@md
Suggestion
----------
Extend Python's doc-string syntax: Add an @ followed by a file extension
after the first triple quote. At the simplest level, anyone can write a
script that pulls out the triple-quoted strings and saves them in a file
with the correct extension. So I can document my code with @tex, @md, @rst,
@htm, @xml (e.g. docbook) or whatever. For example:
prompt $ juliadoc example.jl # Saves example.tex
prompt $ pdflatex example.tex # Saves example.pdf
Going a step further, you can talk about adding @md documentation to the
help() or extracting @html documentation for IJulia, or what-not.
Advantages that I see:
- The idea seems simple to me.
- It lets users adopt any (text) format that they deem useful.
- That includes formats that we might not think of today.
- Anyone can write a useful script for their documentation pipeline.
"""
...
end
Opinions?
@dcarrera, Julia already has string macros, so the Julian thing would be `md"""......"""` rather than `"""@md ......"""`.
This should create a `String` subtype that has a `writemime(io, "text/markdown", s)` method for extracting markdown text, and also has `writemime(io, "text/plain", s)` for a plain-text representation, and perhaps `writemime(io, "text/html", s)` for HTML conversion. In the future we can define additional string types as needed.
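A hedged sketch of such a string macro in current Julia, where `show(io, mime, x)` stands in for `writemime`. The macro name `mdoc` and the type `MarkdownText` are hypothetical; for simplicity the type wraps a string rather than subtyping one.

```julia
# Hypothetical type tagging a string as Markdown source.
struct MarkdownText
    source::String
end

# String macro: mdoc"..." builds a MarkdownText from the raw literal.
macro mdoc_str(s)
    return :(MarkdownText($s))
end

# Only the formats this type can actually produce are declared.
Base.show(io::IO, ::MIME"text/markdown", m::MarkdownText) = print(io, m.source)
Base.show(io::IO, ::MIME"text/plain", m::MarkdownText) = print(io, m.source)

note = mdoc"Adds *two* numbers."
as_markdown = repr("text/markdown", note)
```

Tools can ask `showable("text/markdown", note)` to discover which formats a documentation value supports, so new string types can be added later without changing the tools.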
I like calling these "macro strings" or "string macros" rather than "non-standard string literals", which was the best name I could come up with when I was originally writing the manual.
Yeah. I remember seeing `L" ... "` in PyPlot for LaTeX. I don't know a lot about those, so I'm going to ask some possibly naive questions:
1) One would have to pre-define macros for all the formats that people are likely to use (tex, html, xml, md, rst) (or the user could write their own) right?
2) How would I use `writemime(io, "text/markdown", s)`? Would I read an entire Julia program as text and feed it into `writemime`, or am I still responsible for extracting all the `md" ... "` strings from the source?
3) In my example, you would have to remove the 4 spaces of indentation. Otherwise that could mess up white-space-significant formats like Markdown. The number of spaces depends on the indentation of the initial triple quote. Can string macros do this? Perhaps I'm trying to solve an unsolvable problem because people could use tabs for indentation and you can't know the tab-to-space ratio of their text editor.
All in all, the idea of string macros sounds good. I suppose one could start by making a `JuliaDoc` module to experiment with the features before incorporating into Julia.
Regarding your questions, @dcarrera.
1) I'm proposing that all of the documentation tools operate generically on Julia objects and use `writemime` to convert to other formats for output. So other classes could be added later as desired, including in user code; you certainly would not have to predefine all possible formats in advance.
2) When Julia code is loaded, the associated documentation objects would be stored in a data structure of some sort (e.g. a `DOC` dictionary in my proposal), and other tools (e.g. online `help()`, documentation generators, etcetera) could process this data structure as needed. You would not need to do any parsing of source code yourself. (Note that this issue is only about documenting Julia objects (functions, constants, etc.), although of course similar types could be used for other sorts of documentation.)
3) Triple-quoted strings automatically dedent for you (though this is not yet documented; see #5135).
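A small example of the dedenting behaviour, as it works in current Julia: the common leading whitespace (including that of the closing delimiter line) is removed, while indentation relative to it is kept. The function name `usage_text` is just for illustration.

```julia
function usage_text()
    # Every line below shares 8 spaces of indentation, which is stripped;
    # the third line keeps its 2 extra spaces.
    return """
        Title
        -----
          indented code sample
        """
end

dedented = usage_text()
```

So a docstring nested inside an indented function body still yields Markdown whose headings start at column zero.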
1) Thanks.
2) Ok. I suppose that for that you can either have a Python-style rule where a triple-quoted string following a function declaration is assumed to be documentation, or you could use the `@doc` macro you proposed, maybe like this:
function foo()
@doc md"""
Markdown documentation here...
"""
I just created a new issue ( #5200 ) that is about manual/tutorial type documentation. So I don't pollute this issue with off-topic ideas.
3) I just tried the dedentation feature. It works well. If you try hard enough you can "break" the Markdown, but it took an intentional effort on my part. I suspect it will work well in practice.
If the `@doc` macro goes before `function` (as I proposed) rather than after, then it can be implemented purely in Julia. Anything inside the function declaration would require changes to the parser. Also, it wouldn't go well with one-line `f(x) = bar` functions.
Another thing that will cause some trouble are all the functions created by metaprogramming, e.g., https://github.com/JuliaLang/julia/blob/master/base/array.jl#L931-L994. Adding documentation strings to these functions in a sensible manner will require some thought.
@dcarrera @stevengj I added quotes to the `@doc`s in your comments. A public service reminder to always quote your macro invocations in GitHub issues. Many of our macros are the same as GitHub usernames.
@stevengj: Ok. I didn't pick up on that. I actually like it better that way -- documentation before the function. I guess that the function would become a parameter for the macro, or something like that... Does that mean that if we want to also have POD-style documentation we'll need to create a different macro besides `@doc`?
I have a question about macros. Going back to @stevengj's example:
@doc Markdown("""
.....
"""): foo(...) = ....
Is it possible to make the colon and the following function optional? So that `@doc` could be used both for documenting functions and for writing manual-style documentation, and the way you know whether a doc string refers to a specific function or object is simply that it ends in a colon. For example:
@doc md"""
Product Manual
--------------
Blah blah blah"""
@doc md"""
This is how function foo() works...""":
function foo()
...
end
@dcarrera, yes, macros can do different things depending on the number of arguments, so I think that would be a reasonable re-use of `@doc` for #5200. (And I'm not sure we want the colon anyway.)
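A sketch of a macro that behaves differently depending on how many arguments it receives: one argument is treated as free-standing manual text, two as documentation for the definition that follows. `@note`, `MANUAL_PAGES`, and `TARGET_DOCS` are hypothetical names, and only plain `f(args...)` signatures are handled.

```julia
const MANUAL_PAGES = String[]                # hypothetical manual-style store
const TARGET_DOCS = IdDict{Any,String}()     # hypothetical per-object store

# One argument: free-standing manual text.
macro note(text)
    return quote
        push!(MANUAL_PAGES, $(esc(text)))
        nothing
    end
end

# Two arguments: documentation attached to the function defined by `def`.
macro note(text, def)
    name = def.args[1].args[1]   # function name from the signature
    return quote
        $(esc(def))
        TARGET_DOCS[$(esc(name))] = $(esc(text))
        nothing
    end
end

@note "Product Manual: overview text."

@note "Triples its argument." triple(x) = 3x
```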
I would like to present a different idea from what we've discussed so far:
1) Implement a useful subset of Asciidoc in Julia (easy for a subset).
2) Interpret `@doc` strings as Asciidoc by default.
3) Use Asciidoc metadata instead of adding more options to `@doc`.
Let me give you an example of what I mean:
@doc """
:Author: Daniel Carrera
:Email: <dcarrera@gmail.com>
:Date: 2 January 2014
:Revision: 3.2.3
Blah blah blah ... Asciidoc supports metadata.
""" function foo(x)
...
end
I have been thinking about this issue for the last several days. I think that some of the proposed features for the `@doc` macro (author, section, etc.) feel a bit like reinventing the wheel. At the same time, I have become impressed by Asciidoc: it seems as easy as or easier than Markdown for the things Markdown can do, yet it seems more complete than ReST.
I would not try to implement 100% of Asciidoc in Julia. I simply do not see the need. People can use external tools if they want to write a book in Asciidoc. What I think would make sense is to pick a subset of Asciidoc that matches what we would like Julia's help system to have available.
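To give a feel for how small such a subset could be, here is a sketch that peels leading `:Key: value` lines off a docstring and returns them as metadata. It is an illustrative assumption about what the subset might cover, not an Asciidoc parser, and `split_metadata` is a hypothetical name.

```julia
# Separate leading ":Key: value" lines from the rest of a docstring.
function split_metadata(doc::AbstractString)
    meta = Dict{Symbol,String}()
    body = String[]
    for line in split(doc, '\n')
        # Only lines before the first body line are treated as metadata.
        m = isempty(body) ? match(r"^:(\w+):\s*(.*)$", line) : nothing
        if m === nothing
            push!(body, line)
        else
            meta[Symbol(lowercase(m.captures[1]))] = strip(m.captures[2])
        end
    end
    return meta, join(body, '\n')
end

meta, body = split_metadata(":Author: Daniel Carrera\n:Revision: 3.2.3\nBlah blah blah.\n")
```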
An additional idea is to use Asciidoc labels or headings to make the keys of the `DOC[]` object. For example:
@doc """
foo(x):: This is how function `foo` works by default.
This is another line.
foo(x::Integer):: Blah blah blah.
"""
This would allow you to separate documentation from the function declaration. Whether doing so is a good idea may depend on the context, but sometimes it might be a good idea. For example, Julia's current help system does exactly this, but using ReST.
Not reinventing the wheel sounds like a good idea, but I'd rather adopt the metadata schema of Doxygen or gtk-doc, which are precisely oriented towards this goal.
Why would we want a default format anyway? I think the better option is to define a standard interface for how the `@doc foo"""my foo doc doc""" function` should work, and let it be up to the community to develop different formatting solutions, and let the solutions with the best tools win (and be included in Base/standard distribution). A user will probably be able to read and update documentation written in any reasonable format, so the diversity will not be a problem.
Core Julia will then be responsible for the `@doc` macro, the global `DOC` dictionary, some guidelines for the object in the dictionary and a simple plain string implementation. It might be reasonable to require it to respond to `writemime` with `MIME"text/html"`, `MIME"text/plain"` and so on. If we want author/date/revision to be accessible we might have a `Base.Doc` module where you can provide implementations for `Doc.author()`, `Doc.date()` and `Doc.revision()`.
Just like there's a style guide, I think it would be better to recommend a documentation system to make collaboration easier. This would also allow the package system to check that the documentation is up-to-date, e.g. that a summary of what the function does is provided, and that all arguments and the return value are documented.
R provides such a system, and when the number of contributed packages gets large, it's very nice to have a way to enforce some degree of consistency and quality of the documentation -- or at least to provide a tool helping maintainers to check that their documentation is up to some standard.
As @StefanKarpinski said at the top above, the first thing is to decide how to associate data with Julia objects, in a way that allows many different kinds of data to be attached. Deciding on a standard format for documentation data is a somewhat separate issue (not completely independent, but it's important not to get too bogged down on the latter problem before we solve the former problem).
I have mixed feelings on diversity. I think that a default documentation system has a lot of value. My impression is that Perl, Python and Java have all benefited from their respective standards for documentation. I think @nalimilan raised some good points that I hadn't thought of.
I like Doxygen for the topic of this issue ("help" style documentation). I was hoping to use something that would also be useful for manual-style documentation without having to use a different format for manuals.
@stevengj: For associating data with Julia objects, what's wrong with the global `DOC` dictionary you proposed? That seems like a natural solution. I'm probably missing something, but it seems to me that most of the difficulty is in the API (including data format), like what should the `@doc` macro do? What should be the input to `@doc` and what should `@doc` do with that input? No?
@dcarrera, I don't think anything is wrong with my proposal. :-) But others have to agree and someone has to implement it.
Ok. Here is my attempt at a slightly more concrete proposal:
Part I: `DOC`
`DOC` is a global dictionary object, where the keys are any object one wishes to document, and the value is any object that implements `writemime` with at least the following MIME types:
writemime("meta/summary", DOC[f] )
writemime("meta/author", DOC[f] )
writemime("meta/date", DOC[f] )
writemime("text/plain", DOC[f] )
writemime("text/html", DOC[f] )
In addition, `DOC[f].meta` must be an array listing all the metadata available for the object.
Part II: `@doc` macro
Anyone can write a macro for documentation, as long as it fills the `DOC` object correctly, as indicated in Part I. Julia can come with a default `@doc` macro. Personally, I might be warming to the idea of something based on Doxygen, but I need to think more. This provides a type of default, while allowing the freedom for people to document things differently without losing features provided by `DOC`.
As an example, an `@doc` macro inspired by Doxygen could look like this:
@doc """
One sentence summary of what the function does.
A longer description of what the function does.
This part can span multiple lines.
* Bullet.
* List.
* Etc.
@author Daniel Carrera
@param ...
@param ...
@return ...
""" function foo()
...
end
Same example again, now using AsciiDoc:
@doc """
:author: Daniel Carrera
:summary: One sentence summary of what the function does.
:param: ...
:param: ...
:return: ...
The rest of the docstring is a more detailed description
of the function. Everything in the docstring is processed
by some http://www.asciidoc.org[AsciiDoc] parser.
* Include.
* Bullet.
* Lists.
== Level 2 heading
[options="header,footer"]
|=======================
|Col 1|Col 2 |Col 3
|1 |Item 1 |a
|2 |Item 2 |b
|3 |Item 3 |c
|6 |Three items|d
|=======================
== Another level 2 heading
""" function foo()
...
end
NOTE: This post was edited from the original version.
@dcarrera, I think there is some value in specifying a difference between a `DOC[f::Function]` (generic documentation for all methods of a function) and `DOC[m::Method]` (documentation specific to a particular method signature).
Also, I'm not sure I like the `DOC[f].meta` pattern, since `.` cannot be overloaded. I would suggest instead that:
- `DOC[f] <: Associative{Symbol,String}`. `DOC[f][foo]` gives the metadata (`String`) for the symbol `foo`, e.g. `DOC[f][:author]` is an author string.
- `keys(DOC[f])` gives an iterator over the metadata keys as usual.
- No metadata is required (`keys(DOC[f])` may be empty), although `:summary` at least is recommended. And we standardize a few (optional) metadata names like `:author`, `:date`, and so on.
Then we wouldn't use `writemime` for "meta/foo" metadata faux MIME types. Instead, we would only use it for outputting the documentation itself, requiring only text/plain and text/html.
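A sketch of this split in current Julia, with metadata reached by indexing with a `Symbol` and rendering kept to `show` methods (the modern counterpart of `writemime`). `DocRecord` is a hypothetical type; for brevity it forwards `getindex` and `keys` rather than implementing the full dictionary interface.

```julia
# Hypothetical documentation value: a body plus symbol-keyed metadata.
struct DocRecord
    body::String
    meta::Dict{Symbol,String}
end

# Metadata access looks like dictionary access.
Base.getindex(d::DocRecord, key::Symbol) = d.meta[key]
Base.keys(d::DocRecord) = keys(d.meta)

# Rendering is reserved for the documentation itself.
Base.show(io::IO, ::MIME"text/plain", d::DocRecord) = print(io, d.body)
# Note: no HTML escaping is done in this sketch; real output would need it.
Base.show(io::IO, ::MIME"text/html", d::DocRecord) = print(io, "<p>", d.body, "</p>")

rec = DocRecord("Adds two numbers.",
                Dict(:summary => "Addition", :author => "Daniel Carrera"))
html = repr("text/html", rec)
```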
I prefer Markdown to asciidoc, since:
There are a number of issues discussing documentation for Julia code (#762, #1619, #3407), but I'd like to separate this problem into two very distinct issues:
We keep getting bogged down in the combination of these two issues, but they can be tackled separately, and should, imo, remain decoupled – that is, the infrastructure for (1) should be reusable with different approaches to interpreting comments and different mechanisms for presenting documentation (help, sphinx, dexy, jocco, etc.).
This issue is for discussion of (1):
Let's solve this first and then figure out how to interpret and present things.