MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.29k stars 646 forks source link

write all base class API definition docs inside the class #919

Open kain88-de opened 8 years ago

kain88-de commented 8 years ago

Update 07-Aug-2016

Base Classes that follow the new guide lines

In #892 I noticed that we actually document the default API behavior of base classes like Reader and AnalysisBase. I actually never noticed this because the docs for the classes are in the __init__.py file and not where the class is defined.

This is an issue for me because when I change the API I will very likely forget to update the docs. For the very simple reason that it's not in the same file and I will only look for docs in files that actually define a class. I'm also sure I would overlook this in a PR.

In #892 I went a different way (because I didn't know any better) and used an example piece of code to show the API and an actual analysis can use AnalysisBase. Here the definition of the API lives together with the code. Something I like because it makes it harder for me to forget changing it.

So we have two methods for documentation right now

  1. document Base Classes API in __init__
  2. document Base Classes API in baseclass docstring.

Of course this is a objective issue and different people prefer different styles. I'm OK with both methods (I prefer 2 because I'm forgetful). I just want to know what other people think about this.

orbeckst commented 8 years ago

tl;dr I agree that there's need for discussion but I don't have a good answer yet.

For a start, I don't think that __init__.py files are a great place for extensive docs. We/I started out that way because, well, it was around the time when we started using sphinx and generated docs and had rather limited experience and idea how it would grow. For example, the Trajectory API was pretty central and one needed a way to find this information in one place. help(MDAnalysis.coordinates) was pretty much the way to get at these docs. So it made sense to have it at the package level.

With the reorganization that went on it would now make sense to either move it to coordinates.base (at the module level) or alternatively put the individual components with base.Timestep and base.ProtoReader (?). However, we would need some document that explains how these different parts hang together – something that one can link to as " Trajectory API" . And if we do docs at the class level, should the primary reader docs live with base.Reader or base.ProtoReader?

Sorry, I veered off the Analysis API docs using the Trajectory API as an example to highlight some of the considerations that I think would have to be made.

In an ideal world (and what other projects like numpy or datreant have), we'd have the API reference, which is essentially the doc strings, and user docs that link to the API docs. We would also have developer docs, which would contain the high level picture (eg overview over Trajectory API and Analysis API). These docs would be written as reST files under doc/sphinx. The module level doc strings could then be reduced to the bare essentials.

Some issues related to the larger theme of structuring the docs:

kain88-de commented 8 years ago

After thinking a little bit about it here is my suggestions

  1. All __init__.py docs now go into the function docstrings
  2. The class doc-string needs to contain a list of function that can be overwritten. Separated into required functions and optional functions
  3. the class doc-string should contain a minimal example how to derive this class to show of best practices.

Here is an example for the AnalysisBase class. I don't have much experience in rst markup, so the markup could be improved upon.

class AnalysisBase(object):
    """Base class for defining multi frame analysis, it is designed as a
    template for creating multiframe analysis. This class will automatically
    take care of setting up the trajectory reader for iterating and offers to
    show a progress meter.

    * Functions required to implement *
    - ``_single_frame``

    * Functions optional to implement *
    - ``_prepare``
    - ``_conclude``

    Here is a minimal example how to subclass AnalysisBase

    .. code-block:: python
       class NewAnalysis(AnalysisBase):
           def __init__(self, atomgroup, parameter, **kwargs):
               super(NewAnalysis, self).__init__(atomgroup.universe.trajectory,
                                                 **kwargs)
               self._parameter = parameter
               self._ag = atomgroup

           def _prepare(self):
               self.result = []

           def _single_frame(self):
               self.result.append(some_function(self._ag, self._parameter))

           def _conclude(self):
               self.result = np.asarray(self.result) / np.sum(self.result)

    Afterwards the new analysis can be run like this.

    .. code-block:: python
       na = NewAnalysis(u.select_atoms('name CA'), 35).run()
       print(na.result)

    """
    def __init__(self, quiet=True):
        """
        Parameters
        ----------
        quiet : bool, optional
            Turn off verbosity
        """
        self._quiet = quiet

    def _single_frame(self):
        """*REQUIRED*
        Calculate data from a single frame of trajectory. Store result of
        calculation in an appropriate variable.
        """
        raise NotImplementedError("Only implemented in child classes")

    def _prepare(self):
        """*OPTIONAL*
           Set things up before the analysis loop begins. Here you should
           create/reset variables where analysis results are stored
        """
        pass

    def _conclude(self):
        """*OPTIONAL*
        Called at the end of the run() method to finish everything up. Finalise
        the results you've gathered, eg do a normalization or averaging.
        """
        pass

    def run(self):
        """Perform the calculation"""
        # left out for brevity of example
jbarnoud commented 8 years ago

The main problem is that we have 3 types of users, each of them having different documentation needs:

So far we try to adress the 3 types of users at the same time. This translate by the doc being mostly written for the hacky user since he needs everything to be documented. Some parts of the doc try to split the levels of documentation, but the split is not natural and often quite clumsy. This is usually when you have the doc in __init__.py.

Al of this to say that I agree which @orbeckst ideal word, even if it means a lot of work. In the mean time, having the doc in __init__.py allows to have an overview of the different elements. When we have base.py, core.py, and one file per file format, __init__.py is the only place in the code where you can connect all the pieces together. An other place is in the sphinx pages in MDAnalysis/package/doc/sphinx/source/documentation_pages, but in my experience nobody look at them.

On a more general standpoint, the documentation needs love. We are careful that everything gets a docstring, but we pay very little attention to how they are updated, and we hide the big picture deep in the module references.

kain88-de commented 8 years ago

I'm here only talking about users 2 and 3. As a BaseClass API is only valuable for them. Documentation for user 1 should always be in the docstrings of the actual implementation of a class like analysis.pca.PCA, analysis.contacts.Contacts.

I like my example because it directly gives a short list of important functions and a copy pastable example that can be used as a starting point for advanced and hacky users. All information about the BaseClass API is in one place that is easier to see and remember in a PR and when changing it myself. If we need to document other parts we can add a NOTE to the docs.

About the trajectory Reader example. I think it is a bad example because the API and class system is not very nice right now and unnecessarily complex (Basically everything inherits from Reader anyway). But even though all the base classes that tie things together are in one file and we can create docs for them naming the individual components that then link to class specific docs like scipy.optimize, which I think is what @orbeckst has in mind.

Also for a normal user I would prefer it help(mda.coordinates) gives some general examples how all Readers in MDAnalysis can be used by a normal user. Docs about implementing your own Reader should only be linked to. But this is a more general discussion as I anticipated here.

richardjgowers commented 8 years ago

I think @jbarnoud is right about different users, and I would think commands like help(this) and this? in ipython (are the same?) should cater for group 1 only.

One option I like is having two sets of docs, one for group 1 which is a minimal how to use things, and then a separate set for hacking. These docs for example have a button top right that switches between the two

http://plumed.github.io/doc-v2.0/user-doc/html/index.html

I'm not sure how difficult it would be for each module to have a light/heavy version of each page...

richardjgowers commented 8 years ago

So to clarify, in a perfect world with a perfect doc system, it'd be nice if I could do...

"""
Some docs for users

This is how you use my module

@start for devs
Some more details on the workings

This is how you hack my module
@end for devs
"""

And sphinx/whatever would magically make the for devs sections expandable sections that could be skipped.

https://pythonhosted.org/cloud_sptheme/cloud_theme_test.html#toggleable-section

Maybe this is usable..

kain88-de commented 8 years ago

OK I think we are messing things up here. Any ...Base class docs are not interesting for a user. They are purely for development. In my question I'm concerned with the base class docs only. Normally a base class is unusable and will throw exceptions when one tries to use it. This means all docs for a base class can be in the base class and according to your wishes shouldn't be in the __init__.py or module level docs string.

More comprehensive example of doc separation

This example shows how to have short module docstrings that (hopefully) do a correct rst link to the implementation of classes and show the user docs first.

analysis/__init__.py

"""
This is a module containing various predefined analysis classes. 

# List of Analysis Modules

:mod: `~MDAnalysis.analysis.align`
   description

:mod: `~MDAnalysis.analysis.contacts`
   description

# Developing your own analysis
See `analysis.base.AnalysisBase` (should be a iink)
"""
__all__ = ['align', 'contacts']

analysis/base.py

"""
This module contains all helper classes and functions to define your own
analysis.
"""

class AnalysisBase(object):
    """Base class for defining multi frame analysis, it is designed as a
    template for creating multiframe analysis. This class will automatically
    take care of setting up the trajectory reader for iterating and offers to
    show a progress meter.

    * Functions required to implement *
    - ``_single_frame``

    * Functions optional to implement *
    - ``_prepare``
    - ``_conclude``

    Here is a minimal example how to subclass AnalysisBase

    .. code-block:: python
       class NewAnalysis(AnalysisBase):
           def __init__(self, atomgroup, parameter, **kwargs):
               super(NewAnalysis, self).__init__(atomgroup.universe.trajectory,
                                                 **kwargs)
               self._parameter = parameter
               self._ag = atomgroup

           def _prepare(self):
               self.result = []

           def _single_frame(self):
               self.result.append(some_function(self._ag, self._parameter))

           def _conclude(self):
               self.result = np.asarray(self.result) / np.sum(self.result)

    Afterwards the new analysis can be run like this.

    .. code-block:: python
       na = NewAnalysis(u.select_atoms('name CA'), 35).run()
       print(na.result)

    """
    def __init__(self, quiet=True):
        """
        Parameters
        ----------
        quiet : bool, optional
            Turn off verbosity
        """
        self._quiet = quiet

    def _single_frame(self):
        """*REQUIRED*
        Calculate data from a single frame of trajectory. Store result of
        calculation in an appropriate variable.
        """
        raise NotImplementedError("Only implemented in child classes")

    def _prepare(self):
        """*OPTIONAL*
           Set things up before the analysis loop begins. Here you should
           create/reset variables where analysis results are stored
        """
        pass

    def _conclude(self):
        """*OPTIONAL*
        Called at the end of the run() method to finish everything up. Finalise
        the results you've gathered, eg do a normalization or averaging.
        """
        pass

analysis/contacts.py

"""
This module contains contact analysis class.

- :mod:`Contacts`
"""

class Contacts(AnalysisBase):
    """
    here we should show some simple usage of the class and attributes and method lists
    """
    def __init__(self, universe, **kwargs):
        """
        Docs for initialization of Contacts
        """
        super(Contacts, self).__init__(universe.trajectory, **kwargs)

How this will look in ipython

orbeckst commented 8 years ago

Good discussion about the wider issues, but as @kain88-de pointed out very constructively, let's come back to the issue at hand.

Overall, @kain88-de 's example https://github.com/MDAnalysis/mdanalysis/issues/919#issuecomment-236558733 looks like a good path forward for the developer docs (for simplicity I'll just lump groups 2 and 3 and core devs all into one group... apologies). Given that developers are the best to write developer docs and given that we want to make it easy on the developers to write docs, we should go with the "developer docs stay close to the code". @kain88-de 's proposal looks like a sane way forward for doing that. So 👍 from me.

For the user docs someone will have to sit down and write readable, almost "how-to" style pages. They will ultimately link into the API docs but they have to be written for users. I find that the reST sphinx docs are the best medium for those kind of docs. (Notebooks are nice for posting and for outlining how-tos but I haven't seen a good way to integrate them into docs, e.g. making hyperlinks to the API docs work or CSS styling in line with the rest of the docs.) But the user docs are a separate issue.

Just one thing: If you clean up docs to move towards the developer documentation style and there are still "user-style" docs at the top of the module, please do not just delete them but instead copy them to the appropriate sphinx reST file. Even if some of the docs look crappy, in most cases any docs is better than no docs, and writing docs is always work... don't destroy that work light-heartedly.

kain88-de commented 8 years ago

If you clean up docs to move towards the developer documentation style and there are still "user-style" docs at the top of the module

Sorry if it didn't come out right. I want to move those docs into the class doc string. That's where they are best and where IPythons ? operator works best. Judging from myself I always do some.module.function? in IPython to find out how a module is supposed to be used. They can first live in the module and we gradually move them around. This would be the next milestone for a 1.0 release after python 3 compatibility.

orbeckst commented 8 years ago

No, I understood you but thanks for the clarification.

I was thinking along the lines of the current align or hole docs, which have more of a tutorial style flavor and don't fit nicely within one class. These are the kind of docs that I would want to move out of the code and into sphinx reST.

(Just as an aside: even though Notebooks are not my favorite for docs, one can build rather nice Cookbooks with them, as evidenced, for example by the scikit-bio CookBook; see also MDAnalysis/MDAnalysisCookbook#1).)

orbeckst commented 8 years ago

This is esoteric and can be skipped by everyone:

Regarding @richardjgowers 's https://github.com/MDAnalysis/mdanalysis/issues/919#issuecomment-236546932 that help(func) and func? are the same: Almost but not quite. The difference is only important when you dynamically change __doc__ strings. help reads code whereas ipython's ? appears to evaluate. See Becksteinlab/GromacsWrapper#74 for more...

richardjgowers commented 8 years ago

Ok yeah, I like @kain88-de 's outline above. I think it's very true that some docs are not interesting for most users, so as long as we have Group 1's docs first and then dev docs below

orbeckst commented 8 years ago

I think we can remove the question tag and consider @kain88-de 's proposal https://github.com/MDAnalysis/mdanalysis/issues/919#issuecomment-236458622 as accepted, i.e.:

Document Base class APIs in the class doc string.

  1. The class doc-string needs to contain a list of methods that can be overwritten by inheritance from the base class. Distinguish and document methods as required or optional.
  2. The class doc-string should contain a minimal example for how to derive this class. (This demonstrates best practices, documents ideas and intentions behind the specific choices in the API, helps to promote a unified code base and is useful for developers as a concise summary of the API).

(Current API docs in __init__.py should be migrated as much as possible.)

orbeckst commented 8 years ago

@kain88-de – add a note to the style guide and then close this issue.

kain88-de commented 8 years ago

https://github.com/MDAnalysis/mdanalysis/wiki/Style-Guide#writing-docs-for-abstract-base-classes

updated the style guide

kain88-de commented 8 years ago

I updated the first post and we keep this issue open to track progress in conversion of the docs.

orbeckst commented 8 years ago

@kain88-de I am assigning you as at least one person to keep track of this issue. Feel free to draft other folks.

I am also updating the title.