Adding graphics files to the docs directory

robfalck commented 4 years ago

I'm attempting to use mkdocs in conjunction with my module's test suite to ensure that all of my documented examples function correctly.

Certain unittest.TestCase methods in my test suite generate plot files that I would like to embed in my documentation.

My intended procedure for building documentation is:

1. Run my unit tests.
2. Assuming they all pass, build my documentation.
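
This two-step procedure could be scripted roughly as follows. It is only a sketch: the function name `build_docs_if_tests_pass` and its injectable `run` parameter are hypothetical, and it assumes the tests can be found by `python -m unittest discover` and that `mkdocs` is on the PATH.

```python
import subprocess
import sys


def build_docs_if_tests_pass(run=subprocess.run):
    """Run the test suite, then build the docs only if the tests passed.

    `run` is injectable so the sequence can be exercised without actually
    invoking the tools (a hypothetical convenience, not part of mkdocs).
    """
    # Step 1: run the unit tests (assumes unittest discovery finds them).
    tests = run([sys.executable, "-m", "unittest", "discover"])
    if tests.returncode != 0:
        return False  # tests failed: skip the docs build

    # Step 2: build the documentation.
    docs = run(["mkdocs", "build"])
    return docs.returncode == 0
```

Calling `build_docs_if_tests_pass()` from the project root would then run both steps in order and report whether the docs were built.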

My macros are below. They function as expected, copying the plot file into the docs structure and returning the appropriate markdown.

The problem is, it seems that the macro is running too late and mkdocs does not see the image files, resulting in the following error during mkdocs build or mkdocs serve:

INFO    -  Cleaning site directory 
WARNING -  Documentation file 'getting_started/brachistochrone.md' contains a link to 'getting_started/figures/TestClass.test_method_1.png' which is not found in the documentation files. 
INFO    -  Documentation built in 2.60 seconds

Is there a proper way to have mkdocs-macros place an image file into the docs directory? Running mkdocs build twice will work around this issue, but it seems like there should be a better way to accomplish this.

The macro:

    @env.macro
    def embed_test_plot(reference, index=1, alt_text=''):
        test_case, test_method = reference.split('.')[-2:]
        testcase_obj = get_object_from_reference('.'.join(reference.split('.')[:-1]))
        test_dir = Path(inspect.getfile(testcase_obj)).parent
        plot_file = test_dir.joinpath('_output').joinpath(f'{test_case}.{test_method}_{index}.png')

        dir_path = get_parent_dir(env)
        dest_path = dir_path.joinpath('figures')
        shutil.copy(plot_file, dest_path)

        return f'![{alt_text}](figures/{test_case}.{test_method}_{index}.png)'

And the get_object_from_reference function:

def get_object_from_reference(reference):
    split = reference.split('.')
    right = []
    module = None
    while split:
        try:
            module = importlib.import_module('.'.join(split))
            break
        except ModuleNotFoundError:
            right.append(split.pop())
    if module:
        for entry in reversed(right):
            module = getattr(module, entry)
    return module

github-actions[bot] commented 4 years ago

Welcome to this project, and thank you for your first issue!

fralau commented 4 years ago

@robfalck, thanks for your stimulating question. It seems we have a bootstrap (chicken-and-egg) issue here, or something like it.

Let me have a look at this until I get the full picture.

fralau commented 4 years ago

Perhaps I have an intuition of what is going on...

What does getting_started/brachistochrone.md contain? Just a static link to 'getting_started/figures/TestClass.test_method_1.png'?

I guess a macro would likely have no guarantee of being synchronously executed for any page, unless you called it explicitly from that markdown page...

Unless you are already calling the macro from getting_started/brachistochrone.md?

If that's already the case, perhaps you could try to generate the link to 'getting_started/figures/TestClass.test_method_1.png' through a jinja2 construct (string)? That would likely force the synchronization?

Let me know if that makes sense.

robfalck commented 4 years ago

Can you explain the last paragraph about forcing the synchronization?

Here's a similar, simpler markdown file and related macro that exhibit the same behavior:

/docs
    /figures
        <initially empty>
    /scripts
         myplot.py
    test_page.md

Where test_page.md is

# Embedded plot test

{{ embed_plot_from_script('scripts/myplot.py',
alt_text='sine wave'
) }}

The macro which puts the plot generated by myplot.py into the figures directory is:

    @env.macro
    def embed_plot_from_script(script_path, figname=None, alt_text=''):
        import matplotlib.pyplot as plt

        plt.switch_backend('Agg')
        d = dict(locals(), **globals())

        dir_path = get_parent_dir(env)
        path_to_script = dir_path.joinpath(script_path)

        if figname is None:
            figname = '.'.join(path_to_script.name.split('.')[:-1])

        exec(open(path_to_script).read(), d, d)

        output_path = dir_path.joinpath(f'figures/{figname}.png')
        plt.savefig(output_path)
        return f'![{alt_text}](figures/{figname}.png)'

And the plot-generating file (myplot.py) is:

import numpy as np
import matplotlib.pyplot as plt
plt.switch_backend('Agg')
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)
plt.plot(x, y)
plt.show()

fralau commented 4 years ago

OK, my understanding is that you are generating a png file on the fly which you are storing on the disk, for which you are generating a link in the page, correct?

I'm a little puzzled, because I don't see why your minimal example wouldn't work. Does that simplified case really produce the anomaly you described?

Clearly, the macro embed_plot_from_script() should work: the png file will be created and THEN the function will return the link, which will be inserted in the page. You could easily check that this Python code works OK, outside of mkdocs (and I guess you did?).

But if another markdown page in your project contained a static (pure markdown) link to that png file, and that page was rendered before test_page.md (because that other markdown page is listed before test_page.md in the mkdocs.yaml file), then yes, of course, there could be a problem (that's what I called a 'synch' problem; but that would be really a sequential problem).

Are we making any progress?

robfalck commented 4 years ago

OK, we're on the same page. I'm only referencing this image via the macro, with no static markdown. But somehow mkdocs doesn't think the image is there. I even tried sleeping after the savefig call, in case the file hadn't finished being saved by the time the png inclusion in the markdown was processed, with the same result.

It's as if the file structure is being set before the markdown is processed (the macro is adding the png file to the directory too late in the process). I'll think about it some more but I'm a bit perplexed right now.
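
One way to picture this hypothesis: if a build tool lists the docs directory once, before any page is rendered, then a file written during rendering never makes it into that list. The sketch below simulates this with the standard library only. It is not MkDocs code, and `snapshot_files` and `render_page` are hypothetical stand-ins.

```python
import tempfile
from pathlib import Path


def snapshot_files(docs_dir):
    """Return the set of relative file paths present right now."""
    return {p.relative_to(docs_dir).as_posix()
            for p in docs_dir.rglob("*") if p.is_file()}


def render_page(docs_dir):
    """Stand-in for a macro: write an image, then return a link to it."""
    figures = docs_dir / "figures"
    figures.mkdir(exist_ok=True)
    (figures / "myplot.png").write_bytes(b"placeholder, not a real PNG")
    return "figures/myplot.png"


with tempfile.TemporaryDirectory() as tmp:
    docs_dir = Path(tmp)
    (docs_dir / "test_page.md").write_text("# Embedded plot test\n")

    known_files = snapshot_files(docs_dir)   # taken before rendering
    link = render_page(docs_dir)             # the file is created here

    link_known_first_pass = link in known_files
    link_known_second_pass = link in snapshot_files(docs_dir)
```

Under this assumption the first pass does not know about the linked file while a second pass does, which would be consistent with running mkdocs build twice working around the problem.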

fralau commented 4 years ago

The only explanation that I can come up with right now is that plt.savefig(output_path) is somehow treated asynchronously? 🤔

I must admit that I merely "use" the mkdocs plugin system, and I don't know exactly what could happen there with the on_page_markdown() event (which is when the jinja2 code in the markdown is rendered).

Here is the description:

The page_markdown event is called after the page's markdown is loaded from file and can be used to alter the Markdown source text. The meta-data has been stripped off and is available as page.meta at this point.

Perhaps a member of the mkdocs team might see what is happening 🤔.

BTW, I noticed that you worked out the docs directory indirectly in the Python module. In principle there is no need to do that: there is an env.conf['docs_dir'] value that gives your docs directory (env.conf is basically the project info described on that page). Perhaps you could try using that attribute, for good measure?

[Note that I am partially to blame for that, because that was perhaps not too obvious in the documentation.]
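
A minimal sketch of that suggestion, assuming `env.conf` behaves like a dictionary with a `docs_dir` entry as described above. The helper `figures_dir` and the macro name `figures_path` are hypothetical, for illustration only.

```python
from pathlib import Path


def figures_dir(conf):
    """Return the figures directory under the configured docs directory."""
    return Path(conf["docs_dir"]) / "figures"


def define_env(env):
    """Hook called by mkdocs-macros to register macros."""

    @env.macro
    def figures_path():
        # Hypothetical macro: reports where generated figures would go.
        return figures_dir(env.conf).as_posix()
```

Note that this resolves the path from the top of the docs directory, so a page in a subdirectory would still need its link adjusted relative to its own location.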

Finally, one thing you could do to avoid that kind of issue altogether, would perhaps be to avoid creating a temporary file and simply inject the image into the HTML? I realize that using savefig() is the way generally recommended to do that, but maybe there is another way?

robfalck commented 4 years ago

I thought about the asynchronous issue, but issuing a one-second sleep after that did not help. I suspect directly encoding the graphic into html will be the way to resolve this. I'll get back to you when I have a chance to try it.

fralau commented 4 years ago

This is "curiouser and curiouser" as Alice used to say.

I would be interested in learning what you find. If you find an elegant explanation + solution, I would be quite willing to document it on the website.

If all else fails (and you wish), you could prepare a git repo that demonstrates the case with a minimal mkdocs project, and I will have a look at it (a zip file will do too...).

robfalck commented 4 years ago

Embedding the image directly in the HTML is an acceptable workaround for now. I'll post a functional repository that demonstrates the issue with adding a new image file to the docs directory, in case you wish to keep working on it.

The macro for running a script and embedding the resulting PNG in the documentation is below:

    @env.macro
    def embed_plot_from_script(script_path, alt_text='', width=640, height=480):
        import base64
        import io

        import matplotlib.pyplot as plt

        plt.switch_backend('Agg')
        d = dict(locals(), **globals())

        dir_path = get_parent_dir(env)
        path_to_script = dir_path.joinpath(script_path)

        exec(open(path_to_script).read(), d, d)

        buf = io.BytesIO()
        plt.tight_layout()
        plt.savefig(buf, format="png")
        data = base64.b64encode(buf.getbuffer()).decode("ascii")
        return f"<img alt='{alt_text}' width='{width}' height='{height}' src='data:image/png;base64,{data}'/>"
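
The same embedding technique can be tried without matplotlib. The sketch below uses only the standard library. The helper name `png_img_tag` is hypothetical, and escaping the alt text with `html.escape` is an addition that is not in the macro above (it guards against a quote in `alt_text` breaking the attribute).

```python
import base64
import html


def png_img_tag(png_bytes, alt_text="", width=640, height=480):
    """Build an <img> tag whose src is a base64 data URI of the PNG bytes."""
    data = base64.b64encode(png_bytes).decode("ascii")
    alt = html.escape(alt_text, quote=True)  # escapes &, <, >, " and '
    return (f"<img alt='{alt}' width='{width}' height='{height}' "
            f"src='data:image/png;base64,{data}'/>")
```

Passing `buf.getvalue()` from an `io.BytesIO` buffer filled by `plt.savefig(buf, format="png")` should give the same kind of tag as the macro returns.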

fralau commented 4 years ago

Thanks a lot! I am happy that you found a workable solution.

Indirectly, this is showing that there is indeed a problem with "disk synchronization" (for lack of a better term): some unspecified issue is preventing macros from writing something onto disk and reading it back.

Your fix of inserting the raw diagram into the HTML is actually not absurd. I would even go as far as saying it is elegant, since it avoids using a temporary file (inserting base64 code into an HTML page is perhaps uncommon, but I can't see anything wrong with it).

I note the following for future reference, or in case it could interest you: another plugin I maintain (mkdocs-mermaid2) does the same trick with mermaid diagrams, though at a later stage: the code that produces the plot is actually inserted into the HTML. The rendering is then executed in the browser, on the fly, by a javascript function.

[For reference, it is as if you had inserted myplot.py directly into the HTML file and then added a javascript function to interpret it. Of course, inserting a Python / matplotlib interpreter in a javascript library is not an exciting prospect...]

This was the solution originally adopted for mermaid, because that is how it is usually implemented. That makes the processing in mkdocs simpler, but the tradeoff is the javascript interpretation in the browser (and there is a delay in displaying the page).

By contrast, "pre-digested" png code will render faster. These are two different approaches, each with advantages and disadvantages, that achieve the same result.

fralau commented 4 years ago

I added the tag "help wanted", in case passers-by with a good knowledge of mkdocs could help us solve this mystery of the "disk-synchronization bug".

fralau / mkdocs-macros-plugin

Adding graphics files to the docs directory #37