Solutions for a complete testing framework

The Issue

Mkdocs-Macros needs a testing framework. This is necessary because, according to GitHub, over 3500 projects depend on it, some of which are large or themselves have many dependent projects.

History

[!IMPORTANT] The project itself was originally based on a simple, big idea :bulb: borrowed from the world of wikis: using a templating engine to vastly expand the possibilities of the Markdown language. Any documentation tool needs a templating system.

Jinja2 made this easy, and the initial version of the plugin, back in 2018, was simple. Most of the later complications derived from "real-world" considerations:

- The various places from which the placeholders (variables, macros and filters) should come, and
- How to integrate the plugin within the build workflow of Mkdocs, controlling which pieces of the page to render or not to render, as well as logging, etc.

The question of how to test the final results arose immediately. I solved it with the main tool that Mkdocs provides for that purpose: running mkdocs serve and watching the results in a browser. It is a quick and effective way to test anything one wishes.

[!CAUTION] However, this way of testing has a key limitation: it is not systematic. As long as there is one developer who has complete command over a simple plugin, it can work. As soon as the code becomes complex, or other developers submit PRs, the risk of breaking something becomes too great. And with so many dependent projects, a push that breaks the code introduces risks into the lives of other people.

What is needed

Hence Mkdocs-Macros needs a testing framework, with a view to Continuous Integration on GitHub.

Ideas

It is easier said than done.

In #241, I summarized the discussions prompted by @timvink, inspired by his experience on mkdocstrings. It all started from the discussion on how to make Mkdocs coexist with other plugins; we agreed we needed a hook (#237); this was done... and then the question arose of how to test the result ❓.

His contribution was essential, because it framed the problem. He also kindly submitted a PR (#239) based on pytest, which contained a good start.

I realized, however, that I would have to take a step back and think this problem through.

Why is it difficult?

[!IMPORTANT] The problem is that a plugin (Mkdocs-Macros) relies, by definition, on the underlying piece of software (Mkdocs) in order to run. So, you have to rely on the debug/testing tools provided by the software itself.

The tool that Mkdocs provides for systematic testing is mkdocs build. It has a log, and can be made to halt in case of warnings (--strict), which is suitable for most applications. However, it is a binary test: the build either worked or it didn't. It does not have the granularity (page by page) needed to automatically test the things that I had been testing manually, by launching mkdocs serve and checking each page for myself.

Examples are:

- Does each resulting page contain the expected result?
- Was the info in the YAML config file correctly interpreted?
- Does the Jinja2 context actually contain the expected variable (key, value)?
- Was the page rendered/not rendered?

Of course, there is the log (especially with the --debug option), but I realized that I needed a framework for that.

[!Caution] Also, programmatically checking the resulting HTML page opens a rabbit hole: after Markdown extensions have been rendered, and headers, footers, JavaScript scripts, etc. have been added, the code has been altered beyond recognition. And before anything else, we would need to locate the HTML file that corresponds to the original Markdown page we want to test.

Why I didn't use Mkdocs

One way to solve this issue might have been to use the Mkdocs framework itself.

Aside from the fact that it would have required an intimate knowledge of the intricacies of this framework, which I don't have, I realized that using Mkdocs to test itself would risk creating assertions that are tautologies or that beg the question (accidentally formulated in such a way that they can't give a False answer, because both sides are basically the same thing expressed in two different ways).

Solution

Here is an initial description.

Principle

The best approach was to make a completely distinct test framework.

flowchart LR
    subgraph "MkDocs"
        Mkdocs["MkDocs (Core)"]
        Mkdocs_Macros[MkDocs-Macros]
    end
    subgraph "Source"
        Markdown[Markdown pages]
        Config["Config file (YAML)"]
    end
    Config --> Mkdocs
    Markdown -->Mkdocs --> |rendering|HTML[HTML pages]

    Test(Test Framework)
    Markdown --> |source|Test
    Mkdocs --> |log|Test
    Mkdocs_Macros --> |"target (rendered Jinja2)"|Test
    Config --> |config|Test
    style Test fill:#FF0000,stroke:#000000,stroke-width:2px,color:#FFFFFF

The Test Framework executes mkdocs build --debug (and, if required, --strict) and then compares the following five inputs:

- Source
  1. Source: the original Markdown files
  2. Config: the YAML configuration file (`config.yaml`)
- Target
  1. Return code: the success/failure of the build
  2. Log: the logs generated by MkDocs during the rendering process
  3. Target: the rendered Markdown files (generated by MkDocs-Macros, using Jinja2)
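Running the build while keeping its return code and log could look roughly like the following minimal sketch. It is not the code of test/fixture.py. The names `run_build` and `BuildResult` are hypothetical, and the caller chooses the exact flags. It assumes the MkDocs log arrives on stderr, which is where Python's logging writes by default.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class BuildResult:
    """Outcome of one build: the return code plus the raw log text."""
    returncode: int
    log: str


def run_build(cmd, cwd=None):
    """Run a build command and capture its return code and log output.

    `cmd` is the full command line chosen by the caller,
    e.g. ["mkdocs", "build", "--strict"].
    """
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    # Python's logging writes to stderr by default, so the MkDocs log is
    # normally found there; stdout is appended in case anything lands on it.
    return BuildResult(returncode=proc.returncode, log=proc.stderr + proc.stdout)
```

A call such as `run_build(["mkdocs", "build", "--strict"], cwd="myproject")` would then give both the pass/fail signal and the raw text to be parsed.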

Notes on the Log

The log is parsed into a list of log objects.

There are three types of log entries:

INFO    -  Building documentation...
INFO    -  [macros] - Rendering source page: literal.md
INFO    -  [macros] - Macros arguments
            {'module_name': 'main', 'modules': ['mkdocs-macros-test:mkdocs_macros_test'],
           'render_by_default': True, 'force_render_paths': '', 'include_dir': 'include',
           'include_yaml': [], 'j2_block_start_string': '', 'j2_block_end_string': '',
           'j2_variable_start_string': '', 'j2_variable_end_string': '', 'j2_comment_start_string':
           '', 'j2_comment_end_string': '', 'on_undefined': 'keep', 'on_error_fail': True, 'verbose':
           True}

Each properly formatted log entry has a severity ('INFO'), an optional source ('macros'), a title ('Macros arguments') and an optional payload (any text).

[!Note] Mkdocs-Macros uses the payload of DEBUG entries to convey the three complete dictionaries of variables, filters and macros generated at the time of on_config.
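If a payload is printed as a Python literal, as the `Macros arguments` entry in the log sample appears to be, the standard library's `ast.literal_eval` can turn it back into a dictionary without executing any code. This is a hedged sketch: `payload_to_dict` is a hypothetical helper. A payload containing non-literal values, such as the repr of a function, which a dictionary of macros might contain, would need another approach.

```python
import ast


def payload_to_dict(payload: str) -> dict:
    """Turn a payload printed as a Python dict literal back into a dict.

    Assumes a Python-literal payload; a JSON payload would use json.loads.
    """
    value = ast.literal_eval(payload.strip())
    if not isinstance(value, dict):
        raise ValueError("payload is not a dictionary")
    return value
```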

Target documents

The target documents are raw Markdown documents (after Jinja2 has been rendered), to which the original YAML header has been added. They are adequate for testing the result of Mkdocs-Macros, as produced by on_page_markdown().

The framework parses each file and provides:

- the Markdown (without the header)
- the metadata
- the content rendered into HTML
- the content rendered into plain text
- an advanced search method, useful for checking the content.
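The first steps of that parsing, splitting header from body plus a simple line search, can be sketched with the standard library alone. This is an approximation under stated assumptions:

- the front matter is delimited by `---` lines;
- `naive_metadata` handles flat `key: value` pairs only, where a real version would rather use PyYAML's `yaml.safe_load`;
- `search_lines` is a hypothetical stand-in for the framework's search method.

Rendering into HTML would need a Markdown library such as Python-Markdown (`markdown.markdown()`), which is left out here.

```python
import re
from typing import Dict, List, Tuple

# Assumed front matter: a leading '---' line, YAML lines, a closing '---' line
FRONT_MATTER = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*(?:\n(.*))?\Z", re.DOTALL)


def split_front_matter(text: str) -> Tuple[str, str]:
    """Return (yaml_header, markdown_body); the header is '' if absent."""
    match = FRONT_MATTER.match(text)
    if match:
        return match.group(1), match.group(2) or ""
    return "", text


def naive_metadata(header: str) -> Dict[str, str]:
    """Parse top-level 'key: value' lines only; values stay strings."""
    meta = {}
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and not line.startswith((" ", "\t")):
            meta[key.strip()] = value.strip()
    return meta


def search_lines(markdown: str, pattern: str) -> List[str]:
    """Case-insensitive regex search; returns the lines that match."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [line for line in markdown.splitlines() if regex.search(line)]
```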

First Results

A first version of the test framework (test/fixture.py) has been produced.

[!Caution] This is experimental

The test framework provides a single DocProject object, which contains all elements necessary to test:

- Each page (source and target), with Markdown, content, etc.
- The config file
- The success/failure of the build (return code)
- The log entries
- The placeholders (variables, macros and filters) in their state at on_config (each page is then completed by its own metadata)

Changing (cd) into the test directory and running pytest launches the existing tests on two test documentation projects:

- simple
- module

CC: @pawamoy, @tomchristie, @squidfunk, @razorback, @gesslar

Looks like you've put a lot of thought into this. Given the cc, I take it you're asking for some feedback?

> Also, programmatically checking the resulting HTML page opens a rabbit hole.

This is exactly what I do in revision-date-localized and table-reader, and I've never looked back. The only thing I had to add was scheduling of the unit tests (weekly) so I am aware of any integrations breaking because of changes upstream (mkdocs) or downstream (plugins I want to be compatible with).

TBH, in my work I always push for keeping things as simple as possible (e.g. "What's the simplest possible thing that might work, and why didn't you try that first?"). The fun thing about open source is that it's a hobby: we can go and enjoy building awesome things. And it looks like you built something that works well for this plugin. I would be a bit concerned about maintainability, though. Do you have tests for the test framework 😅?

Thanks for your thoughtful remarks. I am indeed asking for feedback (provided, of course, that others wish to give it).

I am with you on doing the simplest thing; and indeed, that was my main concern: I wanted to write the simplest testing code possible. Also, I looked around to see what solutions existed, especially yours.

The one difference from most other plugins is that Mkdocs-Macros can, by nature, do many different things in many different ways (thanks to Jinja2). There is potentially a lot to test, with many little details that could go wrong. I started from how I wanted to formulate the test assertions, and worked my way back to build the framework so as to support those assertions.

The reason why I didn't want to test the final HTML is that I don't know what other plugins could do to the code, not to mention the Markdown extensions and the themes (perhaps that choice was overly strict). But it simplified the framework for me: once I produce the "rendered" Markdown files (essentially just before returning the result of on_page_markdown()), I only have to compare file sets with strictly identical structure and names.
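Because the rendered files mirror the source tree, pairing them can be as simple as matching relative paths. Here is a minimal sketch. The name `pair_pages` is hypothetical and `*.md` is an assumed file pattern.

```python
from pathlib import Path
from typing import Dict, Tuple


def pair_pages(source_dir: Path, target_dir: Path,
               pattern: str = "*.md") -> Dict[str, Tuple[Path, Path]]:
    """Match each source page with the target page at the same relative path.

    Raises ValueError if the two trees do not contain the same file names.
    """
    sources = {p.relative_to(source_dir).as_posix(): p for p in source_dir.rglob(pattern)}
    targets = {p.relative_to(target_dir).as_posix(): p for p in target_dir.rglob(pattern)}
    if sources.keys() != targets.keys():
        # Names present in only one of the two trees
        mismatched = sorted(sources.keys() ^ targets.keys())
        raise ValueError(f"file sets differ: {mismatched}")
    return {name: (sources[name], targets[name]) for name in sorted(sources)}
```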

Hence, spending some time at the start has been an investment that should pay off in the long run.

Maintainability is one of my concerns. The framework had to be as robust as possible, and to be based on concepts or mechanisms that are unlikely to change much in the long run. I am not overly concerned about reading the Markdown files in the docs directory, or the rendered ones in their own directory. Reading the config.yaml is a piece of cake.

The one part that could possibly break in the future is the log, since I made up the idea of the payload. Fortunately, it hasn't broken so far. So, unless a problem comes up, I might continue to use that approach.

Finally, yes, there is a whole series of test cases for the test framework itself. 🙂

In mkdocstrings-python I'm aiming for end-to-end tests (or golden tests?), where I render HTML for a single object (recursively) and compare the output to snapshots. It doesn't test compatibility with other plugins (such as mkdocs-macros), but I suppose it wouldn't be hard to extend the test cases to include some tests that also make use of mkdocs-macros (for example) to assert compatibility.
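A golden (snapshot) test can be reduced to a small helper. On the first run it writes the output to a reference file, and on later runs it compares against that file. This is a generic sketch, not how mkdocstrings-python implements its snapshots. `assert_matches_snapshot` and its `update` switch are hypothetical.

```python
import difflib
from pathlib import Path


def assert_matches_snapshot(output: str, snapshot: Path, update: bool = False) -> None:
    """Compare `output` with the golden file `snapshot`.

    If the snapshot does not exist yet, or `update` is True, the file is
    (re)written instead, so that it can be reviewed and committed.
    """
    if update or not snapshot.exists():
        snapshot.parent.mkdir(parents=True, exist_ok=True)
        snapshot.write_text(output, encoding="utf-8")
        return
    expected = snapshot.read_text(encoding="utf-8")
    if output != expected:
        # Show a unified diff so that the failure is easy to read
        diff = "".join(difflib.unified_diff(
            expected.splitlines(keepends=True),
            output.splitlines(keepends=True),
            fromfile="snapshot", tofile="output",
        ))
        raise AssertionError(f"output differs from {snapshot}:\n{diff}")
```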

But your testing framework looks nice. Give me a bit of time to comment on it more.

Thanks for your feedback; it encourages me to sift through my ideas, recognize weaknesses, and find new avenues.

@pawamoy I like the term golden test, for a test that compares the output of a (long and complex) process with an expected output. 🙏

End-to-end is the expected approach, though how we define it in each case might vary.

- Mkdocs takes Markdown files as input and renders them into HTML files; obviously, a golden test has to use HTML.
- Mkdocs-Macros takes enriched Markdown files (with Jinja2) as input and converts them into Markdown.

One can conceive the output of Mkdocs-Macros as an input for Mkdocs.

Indeed, the files of raw Markdown generated by Mkdocs-Macros in debug mode (enriched with YAML front matter) should be a drop-in replacement for the original Markdown pages.

Here is a diagram:

flowchart TD

    subgraph First
        Source[Markdown + Jinja2 pages] --> MkDocs(Mkdocs + Macros)
        MkDocs --> Targeta[HTML]
        MkDocs --> Targetb["Markdown pages<br>(debug)"]
    end

    Targetb --> Source2
    subgraph Second
        Source2[Markdown pages] --> MkDocs2(Mkdocs)
        MkDocs2 --> Target2[HTML]
    end

From that perspective, what I am really trying to test is whether the plugin renders the page correctly with Jinja2 when it should, and leaves it unrendered when it should not. I am not testing MkDocs itself. Hence "end-to-end" could be interpreted as "from Markdown + Jinja2 to Markdown".

From the perspective of MkDocs, that would be a unit test, I guess. 🤔

This is one way to look at things, of course. 🙂

This should illustrate how a test can be done with the framework:

    from test.fixture import DocProject

    # Build the test documentation project once, when the module is loaded
    PROJECT = DocProject('myproject')
    PROJECT.build(strict=False)


    def test_build():
        # the build did not fail (return code 0)
        assert not PROJECT.build_result.returncode


    def test_index_page():
        # ----------------
        # Page 'index'
        # ----------------
        page = PROJECT.get_page('index')
        assert page.is_rendered

        variable_name = 'greeting'
        # it is defined in the config file (extra)
        assert variable_name in PROJECT.config.extra
        # the plugin knows the variable, and its value appears in the rendered page
        assert variable_name in PROJECT.variables
        assert PROJECT.variables[variable_name] in page.markdown

@pawamoy The adage seems right: “Inside every large program, there is a small program trying to get out.” – C.A.R. Hoare

I realized that the framework could perhaps be generalized for testing any doc site, or any plugin. So I made a general version (DocProject), and the features I needed specifically for Mkdocs-Macros were put into a subclass (MacrosDocProject). So far, it works.
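A toy example can illustrate the split between a generic class and a plugin-specific subclass. The class names carry a `Sketch` suffix on purpose: they are hypothetical and do not reproduce the real DocProject/MacrosDocProject API. They only show generic log handling in the base class and a Macros-only helper in the subclass, fed with log lines like the ones in the log sample.

```python
from typing import List


class DocProjectSketch:
    """Generic part: handles log lines, knows nothing about any plugin."""

    def __init__(self, log_text: str):
        self.log_lines: List[str] = log_text.splitlines()

    def entries_from(self, source: str) -> List[str]:
        """Return the log lines tagged with a given source, e.g. 'macros'."""
        tag = f"[{source}]"
        return [line for line in self.log_lines if tag in line]


class MacrosDocProjectSketch(DocProjectSketch):
    """Plugin-specific part: helpers that only make sense for Mkdocs-Macros."""

    def rendered_pages(self) -> List[str]:
        """Page names taken from 'Rendering source page:' log lines."""
        prefix = "Rendering source page:"
        return [line.split(prefix, 1)[1].strip()
                for line in self.entries_from("macros") if prefix in line]
```

With this split, support for another plugin would only need another small subclass.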

I wonder whether I could extract the core of the test framework and turn it into a Python package for general use. I see two use cases:

- Testing MkDocs plugins with minimal effort.
- Large projects that include an MkDocs project, where better checks are needed than merely "it builds or not" (for example, testing that pages contain correct and consistent information).
