gammapy / gammapy

A Python package for gamma-ray astronomy
https://gammapy.org
BSD 3-Clause "New" or "Revised" License

Links between notebooks and Sphinx docs #2175

Closed: cdeil closed this issue 2 years ago.

cdeil commented 5 years ago

@Bultako - I'm opening this issue to continue the conversation from #2165 concerning how to link between notebooks and Sphinx docs.

It's clear that using raw URLs in the Markdown cells to link to the Sphinx docs, with no link checker, as we do now, is error-prone and not a good solution. In RST files we have the great Sphinx directives to create links. Links can go in any direction: many things can be linked to directly, and one can also put labels anywhere one wants to link to.

We're not the first to notice that issue of course, see e.g. https://github.com/spatialaudio/nbsphinx/issues/89

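The ideal solution would be to:

- for HTML, generate relative URLs, so that e.g. if I build the docs locally, I can browse them locally, without hitting gammapy.org
- for Jupyter notebooks, generate absolute URLs and put them in the MD cells, so that users can read, browse and use the links when working with the notebooks. This would have to link to a given version, but I don't think that can be avoided, except if we shipped HTML docs to user machines (like Rust does for every install); then again relative links could be used, e.g. gammapy docs could open up the local docs. But I'm not really proposing that, I would go with versioned absolute links.
- ensure that warnings for broken links (e.g. to API docs when class names change in Gammapy) are emitted on docs build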

With notebooks and MD there is no "standard" way to create links to Sphinx docs, i.e. no :ref:. Maybe it's possible to introduce an MD syntax that is as similar to RST as possible and then process it from Sphinx, with only a little custom docs-processing code in Gammapy?

A bigger change to the notebook tooling would be to maintain the tutorials in RST files instead of IPYNB files, and then to generate IPYNB and HTML from those. See e.g. https://github.com/QuantEcon/sphinxcontrib-jupyter . The idea of maintaining notebook content in text files instead of IPYNB / JSON files has been around for a long time; e.g. https://github.com/mwouts/jupytext is another project that advocates and provides some tooling for that.

I'm labeling this as "question" and v0.14 for now, it's not urgent.

@Bultako - Maybe you can think about it and, in the coming weeks, make a recommendation on how to improve the links?

Bultako commented 5 years ago

> The ideal solution would be to: for HTML, generate relative URLs, so that e.g. if I build the docs locally, I can browse them locally, without hitting gammapy.org

This is already working. HTML versions of notebooks in the docs have relative links to the other (API/RST) docs and to other notebooks, so that they stay within the same version of the docs.

> for Jupyter notebooks, generate absolute URLs and put them in the MD cells, so that users can read, browse and use the links when working with the notebooks. This would have to link to a given version, but I don't think that can be avoided, except if we shipped HTML docs to user machines (like Rust does for every install); then again relative links could be used, e.g. gammapy docs could open up the local docs. But I'm not really proposing that, I would go with versioned absolute links.

This can easily be done by modifying the regexp here.

Links can be written in notebook cells as absolute links (we already write them in this format in the notebook cells) pointing to https://docs.gammapy.org/xxx/ . The /xxx/ in the URL can then be replaced, in the notebooks generated in /docs/build/html/_static/notebooks, by the release number declared at docs build time.
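A minimal sketch of that substitution, assuming every notebook link has the form https://docs.gammapy.org/<version>/... with some version segment (dev, 0.11, xxx, ...). The function name pin_docs_version is hypothetical; this is not the regexp actually used in Gammapy.

```python
import re

# The version segment right after the docs host, e.g. "dev", "0.11" or the placeholder "xxx".
DOCS_URL = re.compile(r"(https://docs\.gammapy\.org/)[^/\s)]+/")


def pin_docs_version(text: str, release: str) -> str:
    """Replace the version segment of every docs.gammapy.org URL in text with release."""
    # A function replacement avoids backreference surprises such as "\10" when release starts with a digit.
    return DOCS_URL.sub(lambda m: m.group(1) + release + "/", text)


md = (
    "See [maps](https://docs.gammapy.org/dev/maps/index.html) and "
    "[first steps](https://docs.gammapy.org/xxx/notebooks/first_steps.html)."
)
print(pin_docs_version(md, "0.14"))
```

Because it rewrites whatever sits in the version slot, authors can paste a URL copied from any published version and the build pins it to the release being built.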

Users will download the notebooks from https://docs.gammapy.org/xxx/_static/notebooks (the ones with the correct versioned absolute links), either from the box displayed at the top of the HTML version of each tutorial (as is already the case) or with gammapy download tutorials, which would then have to fetch the notebooks from those versioned URLs instead of from GitHub.

> warnings for broken links (e.g. to API docs when classnames change in Gammapy) are emitted on docs build

I hope this will not slow down the already too-long docs build too much. I can try a few things.

cdeil commented 5 years ago

> Links can be written in cell notebooks as absolute links (we already put them in this format in the notebook cells) pointing to https://docs.gammapy.org/xxx/ The /xxx/ in the url can be replaced in the notebooks that are generated in /docs/build/html/_static/notebooks by the number of the release declared at the moment of doc building.

But having to type https://docs.gammapy.org in the MD is annoying, and having to type a version xxx is error-prone. E.g. if you look at https://docs.gammapy.org/0.11/notebooks/first_steps.html#Maps you see that the first link is pointing to https://docs.gammapy.org/dev/maps where it should point to https://docs.gammapy.org/0.11/maps I think we have many such cases of x-linking to the dev docs version from stable docs (i.e. lots of links from tutorials will break as we evolve Gammapy). So the idea should be to avoid having to give a version number in the MD, and then to have the docs build always generate the link to the correct version.
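Cases like the 0.11 first_steps link to /dev/ could be caught automatically. A sketch of a hypothetical helper (not Gammapy code), assuming the nbformat v4 JSON layout where each cell has a cell_type and a source that is either a string or a list of lines:

```python
import json
import re

# Captures the version segment of a docs.gammapy.org URL, stopping at whitespace or ")".
VERSIONED_URL = re.compile(r"https://docs\.gammapy\.org/([^/\s)]+)/[^\s)]*")


def cell_text(cell: dict) -> str:
    # In .ipynb JSON, "source" is either a single string or a list of lines.
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def mismatched_links(nb: dict, expected: str) -> list[str]:
    """Return docs.gammapy.org URLs in Markdown cells that point to a version other than expected."""
    bad = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # only Markdown cells contain the links we care about
        for m in VERSIONED_URL.finditer(cell_text(cell)):
            if m.group(1) != expected:
                bad.append(m.group(0))
    return bad


nb = json.loads(
    '{"cells": [{"cell_type": "markdown", '
    '"source": ["See [maps](https://docs.gammapy.org/dev/maps)"]}]}'
)
print(mismatched_links(nb, "0.11"))
```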

> I hope this will not slow too much the already too-long doc building process. I can make some tries.

Note that Sphinx has a built-in linkcheck builder that one can run via make linkcheck:

https://github.com/gammapy/gammapy/blob/08cd4d20b105bfe5d0a099b5e7dcf7e89ad59ea4/docs/Makefile#L36

So this is not run for every docs build. It's something the release manager runs before making a release, i.e. an extra command that is part of our checklist to prepare a release. The slowness only concerns checks of external HTTP links, which need a ~10 s timeout to be reliable. If most links are changed from raw URLs to relative local references, they can be checked on every build, because checking local references is very fast. This is really needed anyway, because when working on code and docs the new version usually isn't on docs.gammapy.org yet, so any solution that, like the current one, has lots of URLs pointing to docs.gammapy.org will always be painful and error-prone.
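To illustrate why local references are cheap to check: verifying them is just a file-existence test on the built HTML tree, with no network access. A rough sketch using only the standard library (the helper names are hypothetical, and unlike Sphinx's own checks it only verifies that the target file exists, not the #anchor):

```python
import tempfile
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def broken_local_links(html_file: Path) -> list[str]:
    """Return relative hrefs in html_file whose target file does not exist (fragments are not checked)."""
    parser = HrefCollector()
    parser.feed(html_file.read_text(encoding="utf-8"))
    broken = []
    for href in parser.hrefs:
        parts = urlsplit(href)
        if parts.scheme or parts.netloc or not parts.path:
            continue  # external URL or same-page "#anchor": skipped here
        if not (html_file.parent / parts.path).exists():
            broken.append(href)
    return broken


# Demo on a throwaway build tree (file names are made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "api").mkdir()
    (root / "api" / "gammapy.maps.Geom.html").write_text("<html></html>", encoding="utf-8")
    page = root / "index.html"
    page.write_text(
        '<a href="api/gammapy.maps.Geom.html#gammapy.maps.Geom">ok</a>'
        '<a href="api/gammapy.maps.Missing.html">broken</a>'
        '<a href="https://gammapy.org">external</a>'
        '<a href="#top">same page</a>',
        encoding="utf-8",
    )
    demo_result = broken_local_links(page)

print(demo_result)  # ['api/gammapy.maps.Missing.html']
```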

Bultako commented 5 years ago

> But having to type https://docs.gammapy.org in the MD is annoying, and having to type a version xxx is error-prone.

You don't have to type the version, just copy/paste the URL from another browser tab where you have the Gammapy web docs open. Whatever /xxx/ is in the URL you have copied will be replaced by the right version during the docs build. Moreover, all the notebooks already have absolute links, so we will not have to modify them.

> if you look at https://docs.gammapy.org/0.11/notebooks/first_steps.html#Maps you see that the first link is pointing to https://docs.gammapy.org/dev/maps where it should point to https://docs.gammapy.org/0.11/maps

That behavior will change.

> So the idea should be to avoid having to give a version number in the MD, and then to have the docs build always generate the link to the correct version.

Yes. That's what I wanted to say.

Bultako commented 5 years ago

For finding broken links in notebooks I would suggest using external tools made specifically for this kind of task. These are the broken links that I found using brök:

https://gist.github.com/Bultako/cab91675d77695d5d3ee69102e064df3

Bultako commented 5 years ago

From RST files, we have two alternative ways to write links to tutorials as relative links: https://docs.gammapy.org/dev/development/howto.html#link-to-a-notebook-from-the-docs

From .ipynb files, we have a way to write links to doc pages and tutorials as absolute links pointing to the right docs version. We paste the already published absolute URL into the MD cell of the notebook (https://docs.gammapy.org/xxx/ ...), and some magic transforms this URL to point to the right version of the docs during the docs build, so that both the notebooks downloaded as .ipynb files and the HTML tutorials have links pointing to the right docs version.

For finding broken links in notebooks we need an external tool like brök. See comment above.

@cdeil You may close this issue if you're fine with this or move it to milestone v0.15 if not.

cdeil commented 5 years ago

In #2463 I propose that we move more docs to notebooks, and that we create more x-references, i.e. 100s of links from notebooks to sections or anchors in the RST files, or to specific classes, functions and methods in the API docs.

I think this is not ideal (very long, and one has to remember to put exactly https://docs.gammapy.org/dev/api/ in front):

[gammapy.data.EventList](https://docs.gammapy.org/dev/api/gammapy.data.EventList.html)

Ideally we would be able to use RST with the normal ways to create references, i.e. be able to write e.g.

`gammapy.data.EventList`

I gather this isn't possible. Or would it be possible using raw RST cells, and would that be a good idea?

Looks like there's another alternative to do it, a built-in solution in nbsphinx: https://nbsphinx.readthedocs.io/en/0.3.5/markdown-cells.html#Links-to-Domain-Objects

We'll spend a lot of time in the coming month to create nicely x-linked documentation, in both directions, and we also want to link to sections within RST pages, and to methods, not just classes.

At the moment https://docs.gammapy.org/dev/development/howto.html#link-to-a-notebook-from-the-docs only explains how to link in one direction, and it mentions two ways to do it, without saying when which one should be used. Are they equivalent, should we pick one way?

There are 100s of cases where people didn't create links to the API docs from the notebooks, probably because they were cumbersome to type. Just to give one random example, in https://docs.gammapy.org/dev/notebooks/hess.html#Observation-selection I think we should link to the Analysis class, and also to the various methods mentioned there, like get_datasets, so that readers can click and look up the full API docs if they want.

@Bultako - Could you please have another look at this tricky issue, and send a PR where you update the dev HOWTO with a recommendation and examples of how to do it both ways (including links to sections and class methods)?

That would help to have a good reference if we try to "crowd source" the documentation writing in the coming months.

cdeil commented 5 years ago

@Bultako - Another example I found is in https://docs.gammapy.org/0.14/notebooks/first_steps.html

As you can see in the screenshot below, we have misformatting and broken links when people tried to link to Gammapy API docs. Maybe if https://nbsphinx.readthedocs.io/en/0.3.5/markdown-cells.html#Links-to-Domain-Objects works that's the best we can get. Does it generate a warning or error when someone mistypes or some class is renamed, so that we don't have to manually check links in all tutorials from time to time?

Another issue: if you click "Exercises" there, it takes you to https://docs.gammapy.org/0.14/notebooks/first_steps.html#Exercises, which is a different section further up in the notebook that also has the heading "Exercises". Is there a way to fix this in nbsphinx? Or should we just manually avoid duplicated headings?

[Screenshot 2019-10-17 at 11:20:04]

Bultako commented 5 years ago

> Another issue: if you click "Exercises" there, it takes you to https://docs.gammapy.org/0.14/notebooks/first_steps.html#Exercises which is a different section that also has the heading "Exercises" further up in the notebook. Is there a way to fix this in nbsphinx? Or should we just manually avoid having duplicated headings?

This happens for any ordinary HTML page with duplicated anchors. I guess we should fix it manually and include it among the good practices for docs writing.
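If we treat this as a good practice, it could also be checked automatically. A sketch of a hypothetical checker that reports headings producing the same anchor in a notebook's Markdown cells. The anchor rule (spaces replaced by "-") is an assumption based on URLs like #Observation-selection and #Exercises seen in the rendered tutorials, not nbsphinx's exact algorithm.

```python
import re
from collections import Counter

# ATX Markdown headings such as "## Exercises" (one per line).
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$", re.MULTILINE)


def anchor(title: str) -> str:
    # Assumption: mimics the anchors seen in the rendered tutorials,
    # e.g. "Observation selection" -> "Observation-selection".
    return title.strip().replace(" ", "-")


def duplicated_anchors(nb: dict) -> list[str]:
    """Return anchors produced by more than one Markdown heading in a notebook."""
    counts: Counter = Counter()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # "# ..." in code cells are Python comments, not headings
        src = cell.get("source", "")
        text = "".join(src) if isinstance(src, list) else src
        counts.update(anchor(title) for title in HEADING.findall(text))
    return sorted(a for a, n in counts.items() if n > 1)


nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# First steps\n", "## Exercises\n"]},
        {"cell_type": "code", "source": "# Exercises\n"},
        {"cell_type": "markdown", "source": "## Maps\n\n## Exercises"},
    ]
}
print(duplicated_anchors(nb))  # ['Exercises']
```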

cdeil commented 5 years ago

A big improvement was #2480 by @Bultako.

However, this comes at the cost of another 100 lines of custom notebook/docs processing code in Gammapy (gammapy/utils/tutorials_links.py) that parses and processes MD cells with Python str.replace and regular expressions. And it's only a partial solution, offering part of the RST linking functionality we really want. We still don't have RST :ref: or labels, or intersphinx, to be able to write e.g. astropy.table.Table or numpy.ndarray and have that generate a link. We have to manually put the full URL to their docs every time (and we do, often pointing to inconsistent and outdated versions, especially for scipy).

Basically, what I think would work best for us is if we authored all content, including notebooks, in RST, or in a simple, version-control-friendly format like the light format, but using RST instead of MD. Then we'd get the full power of RST and have the same markup for RST pages and notebooks, so authors / devs would only have to learn one way to do markup (especially x-links). Ideally there would then be a tool that generates IPYNB files as well as HTML docs from that source. We would still have some script that executes and processes all the notebooks like we do now, but we wouldn't have to write and maintain MD parsing and processing code.

I did not find a tool that does this yet, so I don't have an immediate solution here to propose. Still, I'd prefer if we kept this issue open for now, and kept looking for a better solution in the coming weeks and months.


Concerning the duplicated-anchor problem for "Exercises": in RST, one can put labels, and those will be used to create the anchor link. Otherwise Sphinx generates URLs like https://docs.gammapy.org/0.14/changelog.html#id6, which isn't great, because when more content is added at the top of the page, those links shift and point elsewhere (which is very confusing if someone links to the Gammapy docs from an external site). So this is something that needs to be improved for both RST- and IPYNB-authored pages.

Bultako commented 5 years ago

@cdeil I guess you should have a look at Jupinx for the rst->ipynb approach you suggest.

cdeil commented 5 years ago

First of all: @Bultako - do you agree that it would be better to keep our notebooks in the git repo either as RST or as a simple text format with RST instead of MD cells (e.g. commented, like in the light format), rather than as IPYNB/JSON? Or do you think our current setup is just fine and you wouldn't spend time on changing something?

If we make a change, yes, https://github.com/QuantEcon/sphinxcontrib-jupyter looks like a good option. I don't see a description of their interactive authoring / testing workflow, i.e. how they load up the notebook in JupyterLab, edit it, and then save it back. Jupytext has a solution for that (see https://github.com/mwouts/jupytext/blob/master/README.md#jupytext-commands-in-jupyterlab) and as far as I can tell is getting adopted more widely. It could even be a combination of the two tools. I mean: converting between RST with .. code-block:: python and the light format should be ~ 10 lines of simple Python code in either direction, no?
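To get a feel for the "~10 lines" estimate, here is a deliberately naive one-way sketch (RST to a light-format-style script, with the RST prose kept as "# " comments). It assumes code blocks are introduced by exactly ".. code-block:: python", are indented by three spaces, and have no directive options; rst_to_light is a hypothetical name, and real tools like Jupytext handle many more cases.

```python
def rst_to_light(rst: str) -> str:
    """Turn RST prose into '# ' comment lines and '.. code-block:: python' bodies into code."""
    out: list[str] = []

    def emit(line: str) -> None:
        if line == "" and (not out or out[-1] == ""):
            return  # collapse runs of blank lines (and drop leading ones)
        out.append(line)

    in_code = False
    for line in rst.splitlines():
        if line.strip() == ".. code-block:: python":
            in_code = True
            continue
        if in_code:
            if not line.strip():
                emit("")
                continue
            if line.startswith("   "):
                emit(line[3:])  # strip the 3-space directive indentation
                continue
            in_code = False  # first unindented line closes the code block
        emit("# " + line if line.strip() else "")
    return "\n".join(out) + "\n"


print(rst_to_light("Load events\n\n.. code-block:: python\n\n   x = 1\n\nDone."))
```

The reverse direction (light format back to RST) would be a similar loop, which is roughly what the comment above suggests.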

Bultako commented 5 years ago

@cdeil I still do not have a clear answer to this. It's true our current setup does not solve all the intersphinx issues, but I don't know if that's a major or minor problem. I'm a bit skeptical about authoring notebooks in RST: as you pointed out, this mainly concerns the authoring / testing workflow, but also every issue that will appear that we do not see right now, e.g. how do we provide HTML versions of the notebooks with the output results? At the moment we do it with nbsphinx. Jupinx says it can do it, but I guess it does not integrate well with our RTD theme, and Sphinx will certainly build the HTML pages from the RST files, but with empty output cells.

Moreover, contributing to the docs processing tasks in Gammapy takes a lot of time for testing and is not really pleasant :) I'd be OK with spending some time on this if we identify a solution that we are fairly sure would work.

mgeier commented 4 years ago

I just skimmed this discussion, but are y'all aware of https://nbsphinx.readthedocs.io/en/0.5.0/markdown-cells.html#Links-to-Domain-Objects?

You should be able to use something like this:

[some text](api/gammapy.maps.Geom.rst#gammapy.maps.Geom)

This will raise a warning (or error?) if the link is incorrect.

adonath commented 4 years ago

@Bultako have you by chance checked the proposal by @mgeier?

Bultako commented 4 years ago

@adonath

In #2480 we implemented a syntax for writing links in notebook MD cells to our auto-generated API docs (classes and methods). This syntax is similar to the one we use in the RST files. During the docs build we translate this syntax into the one understood by nbsphinx (the one proposed by @mgeier).
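A rough sketch of what such a translation step can look like, for illustration only: this is not the code in gammapy/utils/tutorials_links.py. It assumes one API page per class under docs/api/ (named like gammapy.maps.Geom.rst, as in mgeier's example) and notebooks one directory below the docs root, hence the hypothetical "../api" default.

```python
import re

# Matches `~gammapy.maps.Geom.to_cube` or `gammapy.data.EventList` in Markdown text.
ROLE = re.compile(r"`(~?)(gammapy(?:\.\w+)+)`")


def api_page(target: str) -> str:
    """Guess the API page for a dotted name (hypothetical heuristic).

    Assumes one page per class: the page is the dotted path up to the last
    CamelCase component, e.g. gammapy.maps.Geom.to_cube -> gammapy.maps.Geom.
    """
    parts = target.split(".")
    for i in range(len(parts) - 1, -1, -1):
        if parts[i][:1].isupper():
            return ".".join(parts[: i + 1])
    return target  # e.g. a module-level function documented on its own page


def to_nbsphinx_links(md: str, api_dir: str = "../api") -> str:
    """Rewrite RST-like references into nbsphinx-style links to .rst sources."""

    def repl(m: re.Match) -> str:
        tilde, target = m.groups()
        # As in RST, a leading "~" shows only the last component of the name.
        text = target.rsplit(".", 1)[-1] if tilde else target
        return f"[`{text}`]({api_dir}/{api_page(target)}.rst#{target})"

    return ROLE.sub(repl, md)


print(to_nbsphinx_links("Use `~gammapy.maps.Geom.to_cube` here."))
```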

This discussion then drifted towards completely changing how our docs build process deals with notebooks, and I'm still skeptical about that.

adonath commented 4 years ago

Thanks for the explanation @Bultako. Just one more question: why can't we use the "nbsphinx" syntax directly? Is it because of handling links to different Gammapy versions?

Bultako commented 4 years ago

@adonath

Yes, we can use the nbsphinx syntax directly in MD cells of notebooks, and we can also use the syntax exposed in https://github.com/gammapy/gammapy/pull/2480.

In the case we use the nbsphinx syntax:

Yes, it's kind of complex :( https://github.com/gammapy/gammapy/blob/4987620e98242280541c087bc6a79853041adc3d/gammapy/utils/tutorials_links.py#L28

mgeier commented 4 years ago

I think the problem is that you seem to be doing some custom pre-processing to the notebooks and moving them to the source directory in the process, is that right?

If you would put the notebooks into the source directory in the first place, you would be able to use nbsphinx's features for handling links. This way, the links will always point to the correct version.

I think you could drop most of your custom notebook-mangling code, because most of it is implemented in nbsphinx anyway (e.g. adding a box to the top of each notebook, see https://nbsphinx.readthedocs.io/en/0.5.1/prolog-and-epilog.html).
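For reference, the top-of-page box is configured through the nbsphinx_prolog option in docs/conf.py, a Jinja template rendered as reStructuredText with the Sphinx build environment available as env. A minimal sketch; the note text and the placeholder release value are hypothetical, not Gammapy's actual configuration:

```python
# docs/conf.py (fragment, sketch)
extensions = ["nbsphinx"]
release = "0.14"  # placeholder; normally derived from the package version

# Jinja template rendered as reStructuredText at the top of every notebook page;
# nbsphinx exposes the Sphinx build environment as `env`.
nbsphinx_prolog = r"""
.. note::

    This page was generated from ``{{ env.docname }}.ipynb`` for Gammapy {{ env.config.release }}.
"""
```

Since the version comes from the build configuration, a box like this always matches the docs version being built, without any notebook pre-processing.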

If there are missing features, please let me know!

mgeier commented 4 years ago

BTW, there is a "gallery" feature coming soon in nbsphinx: https://github.com/spatialaudio/nbsphinx/pull/392 (right after the next Sphinx-Gallery release).

Bultako commented 4 years ago

@mgeier

re: sphinx-gallery feature: That's really very good news!! We have recently been talking about exposing our tutorials the way sphinx-gallery does, and we have actually added a small script-based sphinx-gallery recently, which we could migrate to notebooks if that were possible. I think this is a big step forward for nbsphinx, thanks for all your work on this library.

re: links in notebooks: The main topic of this thread is linking to our API documentation generated by the Sphinx autodoc extension with a kind of cross-referencing syntax (see also the same issue in https://github.com/spatialaudio/nbsphinx/issues/89 and how we have patched it in https://github.com/gammapy/gammapy/pull/2480). We certainly need some custom pre-processing for this, and yes, some other minor things could possibly be done by nbsphinx now. I'll have a look at the latest features I've missed :)

mgeier commented 4 years ago

> We certainly need some custom pre-processing for this

Why?

Assuming that you have your notebooks in doc/notebooks/ (which are somehow moved there?) and assuming the file doc/api/gammapy.maps.Geom.rst (which is auto-generated?) exists, you can use a link like this in your notebooks:

[some text](../api/gammapy.maps.Geom.rst#gammapy.maps.Geom.to_cube)

When you click on it in JupyterLab, this will be a valid link to the RST file (assuming it has already been auto-generated; the part after # will be ignored). When you look at the notebook on GitHub or nbviewer, the link will of course not be valid, because the auto-generated file is not in the repo.

When you run nbsphinx, this link will be converted to a link to the HTML file api/gammapy.maps.Geom.html#gammapy.maps.Geom.to_cube (and the part after # will work correctly). Sphinx will also automatically check if the link (including the # part) is valid and raise an error otherwise.
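To make the conversion concrete, here is a toy illustration of the path rewrite only, not nbsphinx's implementation (nbsphinx and Sphinx also resolve the target and check that it exists). The function name and the set of source suffixes are assumptions.

```python
from posixpath import splitext
from urllib.parse import urlsplit, urlunsplit

# Source suffixes whose links should point to the rendered page instead.
SOURCE_SUFFIXES = {".rst", ".ipynb"}


def source_link_to_html(href: str) -> str:
    """Rewrite a relative link to a source file into a link to its built HTML page.

    The "#fragment" is kept as-is; unlike Sphinx/nbsphinx, this toy version does
    not check that the target page or anchor actually exists.
    """
    parts = urlsplit(href)
    if parts.scheme or parts.netloc:
        return href  # external URL: leave untouched
    root, ext = splitext(parts.path)
    if ext not in SOURCE_SUFFIXES:
        return href  # e.g. images or other static files
    return urlunsplit(("", "", root + ".html", parts.query, parts.fragment))


print(source_link_to_html("../api/gammapy.maps.Geom.rst#gammapy.maps.Geom.to_cube"))
```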

Would this not be satisfying for you?

Here's an example: https://sfs-python.readthedocs.io/en/0.5.0/examples/animations-pulsating-sphere.html (see the first three links).

Bultako commented 4 years ago

@mgeier

I understand your point, but the main thing here is that when writing the links in the MD cells we prefer to avoid this complex syntax.

[some text](../api/gammapy.maps.Geom.rst#gammapy.maps.Geom.to_cube)

and use the one below instead:

`~gammapy.maps.Geom.to_cube`

mgeier commented 4 years ago

OK, cool, that's of course up to you.

I just wanted to make sure that you know what kind of links do work in nbsphinx.

adonath commented 2 years ago

This is solved by changing to the sphinx gallery format. API links are rendered using normal RST syntax.