JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

associating data with functions, modules, and globals #3988

Closed StefanKarpinski closed 9 years ago

StefanKarpinski commented 11 years ago

There are a number of issues discussing documentation for Julia code (#762, #1619, #3407), but I'd like to separate this problem into two very distinct issues:

  1. Associating text from source files – both comments and source code – with functions, methods, modules, and global bindings.
  2. Interpreting and presenting this data to the world.

We keep getting bogged down in the combination of these two issues, but they can be tackled separately, and should, imo, remain decoupled – that is, the infrastructure for (1) should be reusable with different approaches to interpreting comments and different mechanisms for presenting documentation (help, sphinx, dexy, jocco, etc.).

This issue is for discussion of (1):

Let's solve this first and then figure out how to interpret and present things.

stevengj commented 11 years ago

cc: @johnmyleswhite, @lindahua, @ViralBShah

staticfloat commented 11 years ago

I'm pretty excited about this. I think it'll make documentation easier as a first pass, but being able to attach data to functions in general can be used for quite a few neat things. The first time I wanted something like this was when I was developing the codespeed infrastructure: I wanted to annotate functions with metadata stating the name of the test that function ran, the units of that test's resulting metric (time, FLOPS, bytes/clock cycle, etc.), whether "less is better" for that particular unit, and so on. So I think whatever we come up with has the opportunity to be more than the only analogue I can think of right now (Python's docstrings), which is just a single string of data. We have the chance to make the data we attach highly structured, in the sense that it can be manipulated by other Julia code.

JeffreySarnoff commented 11 years ago

regards to all .. imho ..

@StefanKarpinski writes, rightly:

What do we want to be able to associate with run-time objects like functions, methods, modules, and global bindings? It would be nice to have easy, queryable access to source code for things as well as inline comments associated with that source code. How should that data be associated with run-time objects? While it may be reasonable to have this kind of overhead in interactive situations, we also must be able to run programs non-interactively without paying that price.

In any creatively powerful software paradigm, and so certainly in Julia, there is a dynamism that, on one hand, lets a design run well, go fast reliably, and dig deep accurately, and on the other hand gives development, investigation, and playfulness robust power, making perspective, conception, and insight readily accessible as a newly realized design that runs well, goes fast, and is reliably accurate.

As Stefan notes, it is entirely reasonable and sound that Julia offer the language user each modality's respective advantage; that is more compelling than a requirement that they operate in mutual simultaneity.

mlubin commented 11 years ago

+1

Carreau commented 11 years ago

To comment on (2), interpreting and presenting this data to the world, particularly in relation to the IPython notebook/qtconsole/console that can now be used with IJulia: we have discussed enabling "rich" docstrings in IPython. Since you integrated multimedia I/O (#3932) into IJulia core, maybe help() could return different MIME types for different frontends.

You are probably much more flexible in what you can do than we are in IPython, and we will be happy to see what you come up with.

Be careful, though: with rich MIME-type representations, documentation may become a security issue (injecting JavaScript into the notebook that can execute code in the kernel). But it can also be an advantage, since you could have executable or dynamic docs, like runnable sample code. One thing we were not fully able to solve is how to have working cross-links in the live documentation in the notebook.

JeffBezanson commented 10 years ago

similar to #2508

stevengj commented 10 years ago

This issue has been neglected for too long. Let me make a concrete proposal to get the ball rolling. A basic starting point could be:

On top of this machinery, various pieces could be added:

@doc foo: f =   # equivalent to DOC[f] = foo, i.e. documentation for f independent of any method signature
@doc bar: function f(....)  # equivalent to DOC[method signature for f(....)] = bar
     ....
end
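A minimal sketch of the two DOC forms above in present-day Julia, assuming DOC is an ordinary global IdDict (the name is taken from the proposal; it is not an existing Base API). Function-level docs are keyed by the function object, and method-level docs by the Method object that which() returns for a signature:

```julia
# Hypothetical global store named after the proposal (not a Base API).
# IdDict compares keys by identity, so each function or Method is its own key.
const DOC = IdDict{Any,Any}()

area(r::Real) = pi * r^2          # toy function with two methods
area(w::Real, h::Real) = w * h

# DOC[f] = foo: documentation for `area` independent of any method signature.
DOC[area] = "area: compute the area of a shape."

# DOC[method] = bar: documentation for one specific method.
DOC[which(area, Tuple{Real, Real})] = "area(w, h): area of a w-by-h rectangle."

# Generic doc, then the doc for the method that area(2, 3) would dispatch to.
println(DOC[area])
println(get(DOC, which(area, Tuple{Int, Int}), "no method-specific doc"))
```

Because which() resolves a call signature to the method that would actually run, a lookup with concrete argument types finds the documentation attached to the more general method.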

The simplest documentation would be in the form of strings, for which only the text/plain representation is available. However, we could define types to encapsulate higher-level information and formatted text. For example:

@doc Markdown("""
.....
"""): foo(...) = ....

or there could be a @docmd shortcut for this.

JeffBezanson commented 10 years ago

I like the simplicity of this approach.

@velicanu might be interested.

johnmyleswhite commented 10 years ago

This is a really great idea.

velicanu commented 10 years ago

This is interesting, I'll try to do it.

stevengj commented 10 years ago

We also need some way of associating documentation with manual sections in a hierarchy (e.g. "Mathematical functions / Special functions / Bessel functions"). And in general we want a way to associate metadata with objects. One option, in line with the above proposal, would be to:

@doc section="Documentation@Awesomeness" author="Alyssa P. Hacker" """ ..... docs .... """: somefunction(...) = ...

and would store them in a "metadata" Dict inside DocDefinition. mimewritable for DocDefinition would then return true for metadata MIME types corresponding to keys in the metadata Dict.
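One way to picture the metadata idea, as a sketch only: DocDefinition below holds the documentation plus a Dict of metadata, and a hypothetical metawritable function stands in for the proposed mimewritable check on "meta/<key>" pseudo-MIME types (in later Julia versions, mimewritable was renamed showable). All names other than the proposal's DocDefinition are illustrative:

```julia
# Hypothetical type named after the proposal: the docs plus a metadata Dict.
struct DocDefinition
    content::Any                    # the documentation object itself
    metadata::Dict{String,String}   # e.g. "section" => ..., "author" => ...
end

# Convenience constructor: keyword arguments become metadata entries.
DocDefinition(content; kw...) =
    DocDefinition(content, Dict{String,String}(string(k) => string(v) for (k, v) in kw))

# Hypothetical stand-in for the proposed mimewritable check:
# "meta/<key>" is writable exactly when <key> is present in the metadata Dict.
function metawritable(mime::AbstractString, d::DocDefinition)
    startswith(mime, "meta/") || return false
    return haskey(d.metadata, mime[length("meta/")+1:end])
end

d = DocDefinition("..... docs ....";
                  section = "Documentation@Awesomeness",
                  author = "Alyssa P. Hacker")
println(metawritable("meta/author", d))   # true
println(metawritable("meta/date", d))     # false
```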

stevengj commented 10 years ago

Some thought should go into the @doc macro syntax to make the resulting code as human-readable as possible. One annoyance with using a macro for this is that you can't simply insert linebreaks wherever you want without breaking the parsing. But if this seems to be a problem I suppose that we could add a new keyword/syntax to Julia that parses as @doc or some kind of document(expr, ...) function call.

stevengj commented 10 years ago

@loladiro, is there any missing functionality in the above proposal compared to what is needed to implement the REPL help?

Carreau commented 10 years ago

Note that importing a module would execute all of its embedded DOC[foo] = bar statements, appending to the documentation.

Have you considered doing so only at install time for libraries? I'm thinking especially of libraries: one would probably like to build all of the HTML documentation at once when the library is installed, because cross-links (and everything else you might need) require building the docs for the whole library at once. Also, in the notebook, we could probably have a link in the pager that opens file://path/to/julia/doc/module/function.html, which would be browsable (runnable?).

stevengj commented 10 years ago

@Carreau, on top of this one can build various tools, e.g. a tool to import a module and build documentation in some format. As @StefanKarpinski said at the top of this thread, however, that is conceptually separate from the task of associating the data with the objects in the first place.

Carreau commented 10 years ago

@stevengj Sorry, I wasn't clear. I was not worried about the external tool to build the docs; I was wondering about associating that output back to the objects externally, like an external way to add values to DOC::DocDict <: Association{Any,Any}. But I guess you are right: this can be a layer on top of DOC.

stevengj commented 10 years ago

I'm not sure what you mean by an "external way to add a value to DOC" ... any Julia program will be able to mutate the DOC contents.

Carreau commented 10 years ago

I might have misunderstood something, and will re-read, but a global dictionary-like object made me think of a per-session object that dies with the interpreter, which can make sense in an interactive environment. This impression was reinforced by the following:

Note that importing a module would execute all of its embedded DOC[foo] = bar statements, appending to the documentation.

I was thinking more of a persistent database of this information (built, for example, at package installation time). Then I could, for example, run a local HTML doc build of JuMP that registers with this database, so that when I call help(some-function-of-jump) it knows how to access it.

mauro3 commented 10 years ago

The global dict DOC::DocDict proposed by @stevengj doesn't seem quite right to me. Shouldn't globals be avoided if possible? Why not put that info directly into the modules, methods and functions themselves? For instance, add a field data to the Function type and similarly to the Methods type. Let data be a dict or contain a field data.doc. That way help(fn) could get to it, and the hierarchy information would be easily available too. Other data could be put into that dict as well, e.g. the source code or the annotations @staticfloat mentioned above.

What's missing is the possibility to associate data with globals. Either make all globals containers with a data field too, or resort to a global dict for those.

stevengj commented 10 years ago

What's wrong with a global in this context?

What is the concrete disadvantage of a global dictionary that overcomes its advantages in simplicity and functionality? Blanket prejudice against globals is not persuasive.

mauro3 commented 10 years ago

To me segregating the function/method metadata from the function is odd. There is plenty of (meta-)data already associated with methods/functions/modules (e.g. signature, module...), why treat the additional metadata differently? (See point 3 for the most important argument)

Say for instance, I have a method which is 'private' to my module, i.e. I don't export it. But I may still want to document it (for my own purpose) or I want to add other metadata like @staticfloat mentioned. Why should this metadata, which is private to a module, live in a global variable?

Comments to your points:

  1. There is not much difference between looking up/modifying things using fn.data.doc or DOC[fn]. Also, one would usually use help(fn) or some other function, which would work with either. And, as I mentioned above, there is plenty of data already associated with functions/modules/..., so it must be possible to maintain such machinery.
  2. When extending a function with another method, that method is still contained in the generic function, so there is no segregation. Also, documentation writing will have to take multiple dispatch into account. I imagine that there should be some generic doc for the function, like "+ adds numbers", and specialized doc refining on that, like "+(a::Integer, b::Rational) adds a + b and returns a Rational" (a somewhat silly example).
  3. I think it would be awkward to get the namespacing right with a dict. Examples: DOC[:sin] should work and DOC[:(Base.sin)] should work too. Do you define it twice, or is only one valid? What if I do sinalias = sin; DOC[:sinalias]? What if two modules have a function of the same name? How is DOC updated after a using statement imports some names into the top level? All these namespace issues would come for free if the data were tacked onto the functions/methods/... This seems to me the most important argument against the dict.
  4. I think for type metadata it would also be fine to add another field to the DataType datatype. To annotate instances of a type, say pi, a convention could be to define a field like _doc and put the documentation there. That leaves us with macros; I'm not sure about those. Are they a type themselves? What are they?
  5. I can't comment much on performance. But I think with either approach it should be possible to tell the parser to fill the dict/field if in the REPL, or not if non-interactive. Also, one could make the metadata immutable; that should help.

Well, either way, it will be good to have a way to associate metadata with functions etc., especially for docs.

toivoh commented 10 years ago

Number 3 is actually not a problem, since you would use the function object itself etc. as the key: DOC[sin]. This works just the same way with namespacing as storing metadata inside the objects. Either approach will have trouble with macros however, since there doesn't seem to be any actual macro object to use as a key, or store metadata in.
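A small sketch of the point about keying by object rather than by name, assuming a hypothetical global IdDict named DOC: an alias and a module-qualified name refer to the same function object and therefore the same entry, while same-named functions in different modules are different objects and do not collide.

```julia
# Hypothetical global store keyed by object identity (not a Base API).
const DOC = IdDict{Any,Any}()

DOC[sin] = "sin(x): sine of x, in radians."

sinalias = sin                        # a new binding to the same function object
println(DOC[sinalias])                # finds the entry stored under `sin`
println(DOC[Base.sin] === DOC[sin])   # qualified name: same object, same entry

# Same-named functions in different modules are different objects, hence different keys.
module ShapesA
area(r) = pi * r^2
end

module ShapesB
area(w) = w^2
end

DOC[ShapesA.area] = "ShapesA.area(r): area of a circle of radius r."
println(haskey(DOC, ShapesB.area))    # false
```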

toivoh commented 10 years ago

Argh, GitHub thought that my 3. was a 1. I was talking about number 3, anyway.

mauro3 commented 10 years ago

Yes, you're right. Out goes what I thought was the best argument. Still, why store the data in several places if it could be in one.

stevengj commented 10 years ago

Regarding point 1, the problem is that we aren't just dealing with functions. You want a metadata lookup procedure that works equally well for all types, and is uniform for all types.

Regarding point 2, the separation of generic and method-specific documentation was already provided for in my proposal: the former is provided by DOC[f::Function] and the latter by DOC[m::Method].

stevengj commented 10 years ago

Regarding the "why store the data in several places if it could be in one" argument, that is a matter of perspective. I think of a single DOC variable as "just one" place, whereas deciding type-by-type where to stick some field to store metadata seems like several places to me. (And if we forget to add a metadata field to some type, then no metadata is possible for instances of that type.)

mauro3 commented 10 years ago

Functionally both approaches should work.

What I don't see is why metadata should get treated differently from other data/metadata of function/types/modules. For instance, there is no global dict MOD which stores for each function/type/instance/... the module where it was defined. Now you argue that metadata is different because it should be treated the same for all function/types/modules. But metadata may well not be uniform for all types, e.g.: storing return types would only make sense for functions, having a hierarchy of docs makes sense for functions/methods but probably not for modules and instances, storing the source code may not make sense for instances, modules may want to store only that part of the source which is not in the functions & types to avoid duplication, etc.

Of course, this non-uniformity could be implemented in the global dict as well. But I think that because the metadata belongs to the function/types/modules/... and can be specific, that is where it should be tacked on. (I have no skills to implement this feature, so I cannot comment on specifics of the implementation. I have no frame of reference, really.)

jakebolewski commented 10 years ago

I was hoping that Julia would adopt support for metadata similar to Clojure's, where it can be defined for namespaced symbols and collections. Many powerful tools have been built on top of this, including a full-blown type-checking system. Clojure's documentation system is also built upon metadata. However, this does incur overhead (in memory and in filling out the fields at runtime), so I can see that argument as well.

But why are we arguing about implementation details? Shouldn't we agree on an interface to get at specific metadata (doc(f::Function), doc(f::Module), etc.)? How it's implemented can always change later.

dcarrera commented 10 years ago

I like the @docmd idea.

I am personally interested in manual-style documentation, akin to Perl's =pod feature. I like @stevengj's suggestion of keyword-like arguments. My current approach is to write my Julia files like this:

#=doc
# Product Manual
# --------------
# Here I insert Markdown documentation ...
#=end
... my code ...
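A sketch of a helper that pulls out blocks written in this style, assuming the marker lines are exactly #=doc and #=end and that each documentation line starts with "#". The function name is hypothetical. One caveat from later Julia versions: #= now opens a nested multi-line comment, so a file using these exact markers would no longer parse as Julia; the sketch therefore takes the markers as keyword arguments so they can be changed.

```julia
# Hypothetical helper (not part of Julia): collect the comment blocks between
# marker lines and strip the leading "# " from each line.
function extract_doc_blocks(src::AbstractString;
                            start_marker::AbstractString = "#=doc",
                            end_marker::AbstractString = "#=end")
    blocks = String[]
    current = nothing                     # `nothing` means we are outside a block
    for line in split(src, '\n')
        s = strip(line)
        if current === nothing
            if s == start_marker
                current = String[]        # start collecting a new block
            end
        elseif s == end_marker
            push!(blocks, join(current, '\n'))
            current = nothing
        else
            # Drop the leading "#" and at most one following space.
            push!(current, replace(line, r"^\s*# ?" => ""))
        end
    end
    return blocks
end
```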

With @docmd and the metadata idea, I could write:

@docmd
section="Chapter 7"
author="Alyssa P. Hacker"
"""
How to Analyze Simulation Results
---------------------------------
Blah blah blah
""": somefunction(...) = ...

... my code ...

dcarrera commented 10 years ago

Here is an idea that I personally would love:

function suggestion()
    """@md
    Suggestion
    ----------
    Extend Python's doc-string syntax: Add an @ followed by a file extension
    after the first triple quote. At the simplest level, anyone can write a 
    script that pulls out the triple-quoted strings and saves them in a file
    with the correct extension. So I can document my code with @tex, @md, @rst,
    @htm, @xml (e.g. docbook) or whatever. For example:

        prompt $ juliadoc example.jl   #  Saves example.tex
        prompt $ pdflatex example.tex  #  Saves example.pdf

    Going a step further, you can talk about adding @md documentation to the
    help() or extracting @html documentation for IJulia, or what-not.

    Advantages that I see:

    - The idea seems simple to me.

    - It lets users adopt any (text) format that they deem useful.

    - That includes formats that we might not think of today.

    - Anyone can write a useful script for their documentation pipeline.
    """
    ...
end

Opinions?
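The "anyone can write a script" step could look roughly like this sketch. It assumes each tagged docstring starts with """@ext on its own line and that indentation uses ASCII spaces or tabs; both function names are hypothetical, and no such tool ships with Julia.

```julia
# Hypothetical extractor for the proposed """@ext docstrings.
# Returns a Dict mapping each extension (e.g. "md", "tex") to the dedented text
# of every block tagged with it, concatenated in source order.
function extract_tagged_docstrings(src::AbstractString)
    out = Dict{String,String}()
    for m in eachmatch(r"\"\"\"@(\w+)\n(.*?)\"\"\""s, src)
        ext, body = m.captures
        lines = split(body, '\n')
        # Remove the smallest indentation shared by the non-blank lines
        # (byte slicing below assumes ASCII indentation characters).
        indents = [length(l) - length(lstrip(l)) for l in lines if !isempty(strip(l))]
        k = isempty(indents) ? 0 : minimum(indents)
        text = join((isempty(strip(l)) ? "" : l[k+1:end] for l in lines), '\n')
        out[ext] = get(out, ext, "") * text
    end
    return out
end

# Sketch of the command-line step: example.jl -> example.md, example.tex, ...
function juliadoc_sketch(path::AbstractString)
    base, _ = splitext(path)
    for (ext, text) in extract_tagged_docstrings(read(path, String))
        write(base * "." * ext, text)
    end
end
```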

stevengj commented 10 years ago

@dcarrera, Julia already has string macros, so the Julian thing would be md"""......""" rather than """@md ......""".

This should create a String subtype that has a writemime(io, "text/markdown", s) method for extracting markdown text, and also has writemime(io, "text/plain", s) for a plain-text representation, and perhaps writemime(io, "text/html", s) for HTML conversion. In the future we can define additional string types as needed.
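A sketch of that idea in today's terms, where writemime has since been replaced by methods of show(io, mime, x). To keep it short, MarkdownText is a plain wrapper struct rather than a String subtype (a real AbstractString subtype would also need the string interface), and the macro is named mdoc"..." to avoid clashing with the md"..." macro that Julia's Markdown standard library now provides. The HTML output is only a placeholder that escapes the text; it does not render Markdown.

```julia
# Hypothetical wrapper type for Markdown documentation text.
struct MarkdownText
    source::String
end

# String macro: mdoc"..." builds a MarkdownText when the code is parsed.
macro mdoc_str(s)
    return MarkdownText(s)
end

# One show method per output format (the modern form of writemime).
Base.show(io::IO, ::MIME"text/markdown", d::MarkdownText) = print(io, d.source)
Base.show(io::IO, ::MIME"text/plain", d::MarkdownText) = print(io, d.source)  # raw text
function Base.show(io::IO, ::MIME"text/html", d::MarkdownText)
    # Placeholder conversion: escape HTML special characters and keep line breaks.
    escaped = replace(replace(replace(d.source, "&" => "&amp;"), "<" => "&lt;"), ">" => "&gt;")
    print(io, "<pre>", escaped, "</pre>")
end

doc = mdoc"Use `a < b` here"
println(sprint(show, MIME"text/html"(), doc))   # <pre>Use `a &lt; b` here</pre>
```

A documentation tool can then ask showable(MIME"text/html"(), doc) before choosing an output format, which is what makes the rich types additive.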

StefanKarpinski commented 10 years ago

I like calling these "macro strings" or "string macros" rather than "non-standard string literals", which was the best name I could come up with when I was originally writing the manual.

dcarrera commented 10 years ago

Yeah. I remember seeing L" ... " in PyPlot for LaTeX. I don't know a lot about those, so I'm going to ask some possibly naive questions:

1) One would have to pre-define macros for all the formats that people are likely to use (tex, html, xml, md, rst) (or the user could write their own) right?

2) How would I use writemime(io, "text/markdown", s)? Would I read an entire Julia program as text and feed it into writemime, or am I still responsible for extracting all the md" ... " strings from the source?

3) In my example, you would have to remove the 4 spaces of indentation. Otherwise that could mess up white-space-significant formats like Markdown. The number of spaces depends on the indentation of the initial triple quote. Can string macros do this? Perhaps I'm trying to solve an unsolvable problem because people could use tabs for indentation and you can't know the tab-to-space ratio of their text editor.

All in all, the idea of string macros sounds good. I suppose one could start by making a JuliaDoc module to experiment with the features before incorporating into Julia.

stevengj commented 10 years ago

Regarding your questions, @dcarrera.

1) I'm proposing that all of the documentation tools operate generically on Julia objects and use writemime to convert to other formats for output. So other classes could be added later as desired, including in user code; you certainly would not have to predefine all possible formats in advance.

2) When Julia code is loaded, the associated documentation objects would be stored in a data structure of some sort (e.g. a DOC dictionary in my proposal), and other tools (e.g. online help(), documentation generators, etcetera) could process this data structure as needed. You would not need to do any parsing of source code yourself. (Note that this issue is only about documenting Julia objects (functions, constants, etc.), although of course similar types could be used for other sorts of documentation.)

3) Triple-quoted strings automatically dedent for you (though this is not yet documented; see #5135).
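A quick illustration of the dedent behavior as it works in current Julia: a newline right after the opening triple quote is dropped, and the common leading indentation (including that of the line holding the closing triple quote) is removed. As we understand the current manual, that common prefix is compared character by character, so no tab width is assumed.

```julia
function suggestion()
    # Indented to match the surrounding code; the common indentation is stripped.
    doc = """
        Suggestion
        ----------
        Extend Python's doc-string syntax.
        """
    return doc
end

print(suggestion())
```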

dcarrera commented 10 years ago

1) Thanks.

2) OK. I suppose that for that you could either have a Python-style rule where a triple-quoted string following a function declaration is assumed to be documentation, or you could use the @doc macro you proposed, maybe like this:

function foo()
    @doc md"""
    Markdown documentation here...
    """
    ...
end

I just created a new issue (#5200) about manual/tutorial-type documentation, so that I don't pollute this issue with off-topic ideas.

3) I just tried the dedentation feature. It works well. If you try hard enough you can break the Markdown, but it took an intentional effort on my part. I suspect it will work well in practice.

stevengj commented 10 years ago

If the @doc macro goes before function (as I proposed) rather than after, then it can be implemented purely in Julia. Anything inside the function declaration would require changes to the parser. Also, it wouldn't go well with one-line f(x) = bar functions.
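To show that the "before" placement needs no parser changes, here is a pure-Julia sketch. The macro @docbefore and the store DOCSTORE are hypothetical names, not Julia's real @doc (although Julia's eventual docstring system did settle on placing documentation before the definition). It handles both the block form and one-line f(x) = bar definitions, and rejects annotated or where-qualified signatures to stay short.

```julia
# Hypothetical store and macro, for illustration only (not Julia's real @doc).
const DOCSTORE = IdDict{Any,Any}()

macro docbefore(doc, def)
    # Accept `function f(args...) ... end` and one-line `f(args...) = ...` forms.
    ok = def isa Expr && def.head in (:function, :(=)) &&
         def.args[1] isa Expr && def.args[1].head == :call
    ok || error("@docbefore expects a simple function definition")
    name = def.args[1].args[1]            # the function's name, e.g. :inc
    return quote
        $(esc(def))                       # define (or extend) the function
        DOCSTORE[$(esc(name))] = $(esc(doc))
        $(esc(name))                      # evaluate to the function object
    end
end

@docbefore "Add one to `x`." function inc(x)
    x + 1
end

@docbefore "Double `x`." dbl(x) = 2x
```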

mbauman commented 10 years ago

Another thing that will cause some trouble are all the functions created by metaprogramming, e.g., https://github.com/JuliaLang/julia/blob/master/base/array.jl#L931-L994. Adding documentation strings to these functions in a sensible manner will require some thought.
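One possible pattern for generated functions, sketched under the assumption of a hypothetical global DOCSTORE: build the documentation string inside the same loop that generates the methods, so each @eval defines a function and records its doc together. The function names below are illustrative, not the ones in base/array.jl.

```julia
# Hypothetical store, for illustration only.
const DOCSTORE = IdDict{Any,Any}()

# Generate similar functions in a loop and document each one in the same iteration.
for (name, op, word) in ((:elementwise_add, :+, "sum"),
                         (:elementwise_mul, :*, "product"))
    docstring_text = "$name(a, b): elementwise $word of arrays `a` and `b`."
    @eval begin
        $name(a::AbstractArray, b::AbstractArray) = broadcast($op, a, b)
        DOCSTORE[$name] = $docstring_text
    end
end

println(DOCSTORE[elementwise_mul])
```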

pao commented 10 years ago

@dcarrera @stevengj I added quotes to the @docs in your comments. A public service reminder to always quote your macro invocations in GitHub issues. Many of our macros are the same as GitHub usernames.

dcarrera commented 10 years ago

@stevengj: Ok. I didn't pick up on that. I actually like it better that way -- documentation before the function. I guess that the function would become a parameter for the macro, or something like that... Does that mean that if we want to also have POD-style documentation we'll need to create a different macro besides @doc?

dcarrera commented 10 years ago

I have a question about macros. Going back to @stevengj's example:

@doc Markdown("""
.....
"""): foo(...) = ....

Is it possible to make the colon and the following function optional? So that @doc could be used both for documenting functions and for writing manual-style documentation, and the way you know whether a doc string refers to a specific function or object is simply that it ends in a colon. For example:

@doc md"""
Product Manual
--------------
Blah blah blah"""

@doc md"""
This is how function foo() works...""":
function foo()
   ...
end

stevengj commented 10 years ago

@dcarrera, yes, macros can do different things depending on the number of arguments, so I think that would be a reasonable re-use of @doc for #5200. (And I'm not sure we want the colon anyway.)
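Since a macro can have one method per argument count, a single name can serve both purposes. In this sketch (macro and store names are hypothetical), one argument records free-standing manual text, and two arguments attach documentation to the function definition that follows:

```julia
# Hypothetical stores and macro name, for illustration only.
const MANUAL = String[]               # free-standing manual text, in source order
const DOCSTORE = IdDict{Any,Any}()    # documentation attached to objects

# One argument: manual-style prose not tied to any object.
macro docarity(text)
    return quote
        push!(MANUAL, $(esc(text)))
        nothing
    end
end

# Two arguments: documentation followed by a simple function definition.
macro docarity(text, def)
    name = def.args[1].args[1]        # assumes `function f(...)` or `f(...) = ...`
    return quote
        $(esc(def))
        DOCSTORE[$(esc(name))] = $(esc(text))
        nothing
    end
end

@docarity "Product Manual\n--------------\nBlah blah blah"

@docarity "This is how function foo() works..." function foo()
    42
end
```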

dcarrera commented 10 years ago

I would like to present a different idea from what we've discussed so far:

1) Implement a useful subset of Asciidoc in Julia (easy for a subset).

2) Interpret @doc strings as Asciidoc by default.

3) Use Asciidoc metadata instead of adding more options to @doc.

Let me give you an example of what I mean:

@doc """
:Author:    Daniel Carrera
:Email:     <dcarrera@gmail.com>
:Date:      2 January 2014
:Revision:  3.2.3

Blah blah blah ... Asciidoc supports metadata.
""" function foo(x)
   ...
end

I have been thinking about this issue for the last several days. I think that some of the proposed features for the @doc macro (author, section, etc.) feel a bit like reinventing the wheel. At the same time, I have become impressed by Asciidoc: it seems as easy as or easier than Markdown for the things Markdown can do, yet it seems more complete than ReST.

I would not try to implement 100% of Asciidoc in Julia. I simply do not see the need. People can use external tools if they want to write a book in Asciidoc. What I think would make sense is to pick a subset of Asciidoc that matches what we would like Julia's help system to have available.
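As a rough idea of how small such a subset could start, this sketch handles only one AsciiDoc feature: leading ":Key: value" attribute lines, which it splits off into a Dict before the body text. The function name is hypothetical, and keys are lowercased for lookup.

```julia
# Hypothetical helper: split leading ":Key: value" attribute lines from the
# rest of a docstring. Only this single AsciiDoc feature is handled.
function asciidoc_header(doc::AbstractString)
    meta = Dict{String,String}()
    lines = split(doc, '\n')
    i = 1
    while i <= length(lines)
        m = match(r"^:([A-Za-z][\w-]*):\s*(.*)$", lines[i])
        m === nothing && break            # the header ends at the first other line
        meta[lowercase(m.captures[1])] = strip(m.captures[2])
        i += 1
    end
    return meta, strip(join(lines[i:end], '\n'))
end

meta, body = asciidoc_header("""
    :Author:    Daniel Carrera
    :Date:      2 January 2014
    :Revision:  3.2.3

    Blah blah blah ... Asciidoc supports metadata.
    """)
```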

An additional idea is to use Asciidoc labels or headings to make the keys of the DOC[] object. For example:

@doc """
foo(x)::  This is how function `foo` works by default.
          This is another line.

foo(x::Integer)::  Blah blah blah.
"""

This would allow you to separate documentation from the function declaration. Whether doing so is a good idea may depend on the context, but sometimes it might be. For example, Julia's current help system does exactly this, but using ReST.

nalimilan commented 10 years ago

Not reinventing the wheel sounds like a good idea, but I'd rather adopt the metadata schema of Doxygen or gtk-doc, which are precisely oriented towards this goal.

ivarne commented 10 years ago

Why would we want a default format anyway? I think the better option is to define a standard interface for how the @doc foo"""my foo doc doc""" function should work, and let it be up to the community to develop different formatting solutions, and let the solutions with the best tools win (and be included in Base/standard distribution). A user will probably be able to read and update documentation written in any reasonable format, so the diversity will not be a problem.

Core Julia will then be responsible for the @doc macro, the global DOC dictionary, some guidelines for the object in the dictionary and a simple plain string implementation. It might be reasonable to require it to respond to writemime with MIME"text/html", MIME"text/plain" and so on. If we want author/date/revision to be accessible we might have a Base.Doc module where you can provide implementations for Doc.author(), Doc.date() and Doc.revision().
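A sketch of the accessor-function idea: a module of generic functions with fallbacks that return nothing, which each documentation format extends for its own type. DocMeta stands in for the proposed Base.Doc module; it, SimpleDoc, and the fallback behavior are all assumptions for illustration.

```julia
# Hypothetical module standing in for the proposed Base.Doc (not a real API).
module DocMeta
export author, date, revision
# Fallbacks: documentation objects without this metadata return `nothing`.
author(doc) = nothing
date(doc) = nothing
revision(doc) = nothing
end

using .DocMeta

# A documentation format opts in by adding methods for its own type.
struct SimpleDoc
    text::String
    author::String
end
DocMeta.author(d::SimpleDoc) = d.author
Base.show(io::IO, ::MIME"text/plain", d::SimpleDoc) = print(io, d.text)

d = SimpleDoc("Compute foo.", "Alyssa P. Hacker")
println(author(d))            # Alyssa P. Hacker
println(revision(d))          # nothing
```

Dispatch keeps the interface uniform while leaving each format free to store its metadata however it likes.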

nalimilan commented 10 years ago

Just like there's a style guide, I think it would be better to recommend a documentation system to make collaboration easier. This would also allow the package system to check that the documentation is up-to-date, e.g. that a summary of what the function does is provided, and that all arguments and the return value are documented.

R provides such a system, and when the number of contributed packages gets large, it's very nice to have a way to enforce some degree of consistency and quality of the documentation -- or at least to provide a tool helping maintainers to check that their documentation is up to some standard.

stevengj commented 10 years ago

As @StefanKarpinski said at the top, the first thing is to decide how to associate data with Julia objects, in a way that allows many different kinds of data to be attached. Deciding on a standard format for documentation data is a somewhat separate issue (not completely independent, but it's important not to get too bogged down on the latter problem before we solve the former problem).

dcarrera commented 10 years ago

I have mixed feelings on diversity. I think that a default documentation system has a lot of value. My impression is that Perl, Python and Java have all benefited from their respective standards for documentation. I think @nalimilan raised some good points that I hadn't thought of.

I like Doxygen for the topic of this issue ("help"-style documentation). I was hoping to use something that would also be useful for manual-style documentation without having to use a different format for manuals.

@stevengj: For associating data with Julia objects, what's wrong with the global DOC dictionary you proposed? That seems like a natural solution. I'm probably missing something, but it seems to me that most of the difficulty is in the API (including the data format): what should the @doc macro do? What should be the input to @doc, and what should @doc do with that input? No?

stevengj commented 10 years ago

@dcarrera, I don't think anything is wrong with my proposal. :-) But others have to agree and someone has to implement it.

dcarrera commented 10 years ago

Ok. Here is my attempt at a slightly more concrete proposal:

Part I -- Definition of DOC

DOC is a global dictionary object, where the keys are any object one wishes to document, and the value is any object that implements writemime with at least the following MIME types:

writemime(io, "meta/summary", DOC[f])
writemime(io, "meta/author", DOC[f])
writemime(io, "meta/date", DOC[f])
writemime(io, "text/plain", DOC[f])
writemime(io, "text/html", DOC[f])

In addition, DOC[f].meta must be an array listing all the metadata available for the object.
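A sketch of Part I in current Julia, where writemime has been replaced by methods of show(io, mime, x). Because MIME"meta/author" and friends are just MIME types, showable() already reports which metadata an object supports. DocEntry and its fields are hypothetical, and meta is written as a function rather than the DOC[f].meta field from the proposal.

```julia
# Hypothetical documentation object; field names are illustrative.
struct DocEntry
    summary::String
    author::String
    date::String
    text::String
end

# The proposal's required MIME types, as `show` methods (the modern writemime).
Base.show(io::IO, ::MIME"meta/summary", d::DocEntry) = print(io, d.summary)
Base.show(io::IO, ::MIME"meta/author", d::DocEntry) = print(io, d.author)
Base.show(io::IO, ::MIME"meta/date", d::DocEntry) = print(io, d.date)
Base.show(io::IO, ::MIME"text/plain", d::DocEntry) = print(io, d.summary, "\n\n", d.text)
Base.show(io::IO, ::MIME"text/html", d::DocEntry) =
    print(io, "<p><b>", d.summary, "</b></p><p>", d.text, "</p>")   # no HTML escaping

# Stand-in for DOC[f].meta: the metadata types this object can write.
meta(d::DocEntry) = ["meta/summary", "meta/author", "meta/date"]

d = DocEntry("Compute foo.", "Alyssa P. Hacker", "2 January 2014", "Longer description.")
println(sprint(show, MIME"meta/author"(), d))   # Alyssa P. Hacker
println(showable(MIME"meta/revision"(), d))     # false
```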

Part II - @doc macro

Anyone can write a macro for documentation, as long as it fills the DOC object correctly, as indicated in Part I. Julia can come with a default @doc macro. Personally, I might be warming to the idea of something based on Doxygen, but I need to think more. This provides a type of default, while allowing the freedom for people to document things differently without losing features provided by DOC.

As an example, an @doc macro inspired by Doxygen could look like this:

@doc """
One sentence summary of what the function does.

A longer description of what the function does.
This part can span multiple lines.

* Bullet.
* List.
* Etc.

@author Daniel Carrera
@param  ...
@param  ...
@return ...
""" function foo()
    ...
end

Same example again, now using AsciiDoc:

@doc """
:author: Daniel Carrera
:summary: One sentence summary of what the function does.
:param:  ...
:param:  ...
:return: ...

The rest of the docstring is a more detailed description
of the function. Everything in the docstring is processed
by some http://www.asciidoc.org[AsciiDoc] parser.

* Include.
* Bullet.
* Lists.

== Level 2 heading

[options="header,footer"]
|=======================
|Col 1|Col 2      |Col 3
|1    |Item 1     |a
|2    |Item 2     |b
|3    |Item 3     |c
|6    |Three items|d
|=======================

== Another level 2 heading

""" function foo()
    ...
end

NOTE: This post was edited from the original version.

stevengj commented 10 years ago

@dcarrera, I think there is some value in specifying a difference between a DOC[f::Function] (generic documentation for all methods of a function) and DOC[m::Method] (documentation specific to a particular method signature).

Also, I'm not sure I like the DOC[f].meta pattern, since . cannot be overloaded. I would suggest instead that:

Then we wouldn't use writemime for "meta/foo" metadata faux MIME types. Instead, we would only use it for outputting the documentation itself, requiring only text/plain and text/html.

I prefer Markdown to asciidoc, since: