apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[Doc][Python] The use of IPython directive or doctest code blocks in the python user guide #28859

Status: Open (asfimport opened this issue 3 years ago)

asfimport commented 3 years ago

From https://github.com/apache/arrow/pull/10266#discussion_r630837422

We are currently using the IPython directive in many places in the Python docs, so that something written as


.. ipython:: python

  x = 1
  x + 2

is converted during the doc build to (by running the code):


.. code-block:: ipython

  In [1]: x = 1

  In [2]: x + 2
  Out[2]: 3

Running all the code during the doc build can be costly, and the more docs we add, the slower building the docs becomes.

We could convert all of those to plain code-block directives, but ideally we would still check the code examples for correctness, where applicable. For this, we could use the doctest format instead of the IPython directive, and verify the docs using pytest's doctest support.

This can be run separately as tests and doesn't need to be part of the doc build (at least when you only change wording or rst syntax and want to check the resulting HTML, you don't need to run the doctests).
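As a rough illustration of the doctest route, the running example can be written with `>>>` prompts and checked entirely outside the Sphinx build. The sketch below uses only the standard library's `doctest` module; the page text and the `check_rst_examples` helper are hypothetical. With pytest, pointing it at the docs source directory with `--doctest-glob="*.rst"` collects `.rst` files in a similar way.

```python
import doctest

# The running example from above, rewritten (hypothetically) in doctest format.
# The ">>>" prompts and the expected output lines are what doctest checks;
# the surrounding rst markup is ignored by the parser.
RST_PAGE = """
Adding numbers
--------------

.. code-block:: python

   >>> x = 1
   >>> x + 2
   3
"""


def check_rst_examples(text: str, name: str = "example.rst") -> doctest.TestResults:
    """Run every ``>>>`` example found in ``text`` and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs={}, name=name, filename=name, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)  # failures are reported on stdout
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    print(check_rst_examples(RST_PAGE))
```

Because this runs independently of HTML generation, a wording-only change to the page can be rendered without executing anything, and the check can live in the regular test suite.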

But maintaining examples as doctests certainly adds some extra cost as well (e.g. when outputs change slightly).

Another option would be to add an option to the IPython directive that skips executing the code examples. I think this should be fairly easy to add to the directive itself; the remaining question is how to pass the setting through from the build command invocation.

cc @pitrou @amol-

Reporter: Joris Van den Bossche / @jorisvandenbossche

Related issues:

Note: This issue was originally created as ARROW-13159. Please see the migration documentation for further details.

asfimport commented 3 years ago

Weston Pace / @westonpace: Have we measured the time these add?  It seems it would depend on the script.  I tried with code-block vs ipython in dataset.rst and didn't see any noticeable difference.

asfimport commented 3 years ago

Alessandro Molina / @amol-: There has been a similar discussion in the Apache Arrow Cookbook initiative too. In general, I tend to prefer the doctest directive over the ipython one for a few reasons:

  1. It decouples "building the docs" from "testing the docs", which are two clearly separate concerns with separate commands. When building, doctest directives are no different from a standard code-block: they introduce no slowdown and don't need the code to actually work. This allows a faster development cycle, where the author can write the docs, immediately see the formatting/output, and separately focus on verifying the code blocks.
  2. While ipython does support output verification through the @doctest decorator, it seems a bit of an afterthought. The doctest directive was designed from the beginning with output verification in mind, and thus has better handling for "wildcard output", blank lines, etc.
  3. The doctest directive is more explicit about output, variable scope/life cycle, fixtures, etc.
  4. Doctest snippets, being plain code blocks once built (without interleaved output), can easily be copied and pasted as-is.

Also, the ipython directive can get noisy when the output doesn't matter (or require extra work to suppress it), although that might be subjective: in some cases it's actually helpful to see what happens step by step.
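To make the "wildcard output" and blank-line points concrete, here is a small, hypothetical `.. doctest::` block checked with the standard library's `doctest` runner. The `+ELLIPSIS` flag lets `...` stand in for long or version-dependent output, and `<BLANKLINE>` marks an intentionally empty output line; both are standard doctest features. The snippet content and the `run_snippet` helper are illustrative only.

```python
import doctest

# Hypothetical contents of a ``.. doctest::`` block (sphinx.ext.doctest).
# "+ELLIPSIS" lets "..." stand in for output that is long or may change between
# versions, and "<BLANKLINE>" marks an intentionally empty output line.
SNIPPET = """
.. doctest::

   >>> list(range(20))  # doctest: +ELLIPSIS
   [0, 1, 2, ..., 19]
   >>> print("header\\n\\nbody")
   header
   <BLANKLINE>
   body
"""


def run_snippet(text: str) -> doctest.TestResults:
    """Run the ``>>>`` examples in ``text`` with the stdlib doctest runner."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs={}, name="snippet.rst",
                              filename="snippet.rst", lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    print(run_snippet(SNIPPET))
```

Removing the `# doctest: +ELLIPSIS` comment makes the first example fail, since the literal `...` no longer matches the full list.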

asfimport commented 3 years ago

Alessandro Molina / @amol-: FYI, this seems to be the set of files where the ipython directive is used:


source/python/data.rst
source/python/dataset.rst
source/python/ipc.rst
source/python/memory.rst
source/python/pandas.rst
source/python/parquet.rst
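A list like this can go stale as pages are added or converted. A minimal sketch for regenerating it, assuming the docs live under a `source/` directory and that a plain text search for `.. ipython::` is good enough (it would also match the string inside literal blocks); `files_using_ipython` is a hypothetical helper:

```python
from pathlib import Path


def files_using_ipython(root: str) -> list[str]:
    """Return .rst files under ``root`` (relative, POSIX-style) that use the ipython directive."""
    base = Path(root)
    hits = []
    for path in sorted(base.rglob("*.rst")):
        text = path.read_text(encoding="utf-8")
        if ".. ipython::" in text:
            hits.append(path.relative_to(base).as_posix())
    return hits


if __name__ == "__main__":
    for name in files_using_ipython("source"):  # assumed docs source directory
        print(name)
```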