Reorganize website source material to simplify updating

speth commented 11 months ago

Abstract

In conjunction with the website backend updates envisioned by Cantera/cantera-website#209, Cantera/cantera-website#210, and Cantera/cantera-website#211, I think it's time to reconsider the split of what content belongs as part of the cantera-website repository, and what belongs in the main cantera repository alongside the source code.

Motivation

Having spent some time on the website updates for Cantera 3.0 (see Cantera/cantera-website#248), I'm coming to the conclusion that a large portion of what is currently in the website repo actually belongs with the source. Given the norm of having the unversioned website content be applicable to the stable Cantera release, many of these updates are things where we don't really have a way to stage changes for the new release as it's being developed, like documenting changes to the build dependencies, deleting references to the now-removed CTI format, and documenting new models. And even when updates can be made in parallel, the additional work of putting together a second PR for the website is often delayed or never happens.

I think this strong separation between code and its documentation is also one of the causes of a significant backlog of updates, where changes and new capabilities are not documented at all outside the somewhat obscure API docs. Just as a starting point, here are ones where issues have been created:

Cantera/cantera-website#169
Cantera/cantera-website#172
Cantera/cantera-website#216
Cantera/cantera-website#244
Cantera/cantera-website#247
Cantera/enhancements/issues/142
Cantera/enhancements#146
Cantera/enhancements/issues/6 (the granddaddy of them all)

Description

Specific sections currently in the cantera-website repository that I think belong in the main code repository are:

Installation instructions
Programming tutorials
Input file tutorials
"Science" documentation

What this leaves for the cantera-website repository is pretty light:

Home page
Community
Blog (mostly unused)
structure to enable navigating all version-specific documentation

Using this organization, I think we should consider ourselves free to make documentation updates to the current stable branch and always use that branch to generate the active version of the website, even if we don't anticipate a future bugfix release.

I would say that my biggest question in all of this is how to go about implementing it, alongside @bryanwweber's suggested/planned changes to use a stack consisting of Sphinx / MyST / Pydata theme.

bryanwweber commented 11 months ago

Thanks for writing this up, Ray! I don't have super detailed thoughts at the moment. I think this largely makes sense and the "two pull requests" model didn't work as well as I'd hoped 😔 Just two initial thoughts:

have a way to stage changes for the new release as it's being developed

Just adding the link to the last issue where we discussed this. Other discussions have happened elsewhere, but that's the main one. https://github.com/Cantera/cantera-website/issues/88
The biggest benefit to having a separate repo is the ability to update the website off cycle of a Cantera release, especially things like the science, installation instructions, community, etc. (New science associated with features certainly should be updated per-release, but a lot of the science docs need improvement for existing features and that shouldn't wait for a release). Maybe the real benefit is just having the ability to build the website independently, even if it lives in the same repo?

speth commented 11 months ago

Thanks for the link to the previous related discussions. I think that thread is a good reminder of some of the ways that we've previously struggled with this.

2. The biggest benefit to having a separate repo is the ability to update the website off cycle of a Cantera release, especially things like the science, installation instructions, community, etc.

My suggestion was to leave the "community" content associated with the website repo -- I agree that it is decoupled from Cantera's release cycle.

I think the case of installation instructions fits pretty naturally as part of the main repo: updates related to changes in the development version, e.g. changes in build requirements or installation options, can be made only to the main branch. Changes that are about improving documentation or noting external changes (say, an easier way to install some dependency) can also be made to the main branch, and then cherry-picked onto the branch for the latest release which would then be used to build the website (or at least, the default / most visible parts of the website).

(New science associated with features certainly should be updated per-release, but a lot of the science docs need improvement for existing features and that shouldn't wait for a release)

The science docs have needed a significant update for several releases now, and I think the current organization is part of why we haven't at least managed to populate the "Science" section of the website with the relevant content that currently lives in the main repository, in the Doxygen docstrings for many of the C++ classes. I'm hoping that this restructuring will make it easier to update that content in smaller chunks so that it actually gets done. I think we should prioritize writing high-quality documentation for the current development version of Cantera. If we're also able to backport some of that content to the latest stable release, that's a nice bonus, but we shouldn't exert too much effort on that, since the development version will eventually be the stable release. If anything, I'm hoping this will reduce one of the points of friction for more frequent releases.

Maybe the real benefit is just having the ability to build the website independently, even if it lives in the same repo?

One other reason to keep the website as a separate repo is that it provides a clear way of managing and navigating the version-specific pages that are built from the main repo. This is something I remember being very tricky to manage back when the whole site was generated out of the main repo. By contrast, I think the current "documentation" landing page (https://cantera.org/documentation/index.html) handles this quite well.

bryanwweber commented 11 months ago

I definitely don't disagree with the direction of this suggestion, just trying to remember the context that led us to this point in the first place and make sure we're moving towards fixing the underlying problem 😊

navigating the version-specific pages that are built from the main repo

As it happens, the pydata-sphinx-theme has a built-in version switcher widget, similar to the one on readthedocs. https://pydata-sphinx-theme.readthedocs.io/en/stable/user_guide/version-dropdown.html I haven't looked into how it's implemented and how that crosses over with our usage. It looks like we'd maybe want to "version" all the pages somehow. Cory also suggested something related over here: https://github.com/Cantera/cantera-website/issues/229
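
For reference, a rough sketch of what the theme's version switcher expects, going by the linked user guide: a JSON file listing the available versions, plus a `switcher` entry in `html_theme_options`. The version list, the location of `switcher.json`, and the helper name `build_switcher_entries` are assumptions made for illustration; the URL pattern simply mirrors the existing `docs-X.Y/sphinx/html` layout.

```python
import json


def build_switcher_entries(versions, base_url, stable):
    """Build the list of entries for the theme's version-switcher JSON file."""
    entries = []
    for version in versions:
        entry = {
            "version": version,
            "url": f"{base_url}/docs-{version}/sphinx/html/index.html",
        }
        if version == stable:
            # Optional keys: a display name and a marker for the preferred version
            entry["name"] = f"{version} (stable)"
            entry["preferred"] = True
        entries.append(entry)
    return entries


# Hypothetical version list, for illustration only
entries = build_switcher_entries(
    ["3.0", "2.6", "2.5"], "https://cantera.org/documentation", stable="3.0"
)
switcher_json = json.dumps(entries, indent=2)

# Corresponding conf.py settings (the json_url location is an assumption)
html_theme = "pydata_sphinx_theme"
html_theme_options = {
    "switcher": {
        "json_url": "https://cantera.org/documentation/switcher.json",
        "version_match": "3.0",
    },
    "navbar_end": ["version-switcher"],
}
```

Since every built version reads the same JSON file, older versions of the docs would pick up new entries without being rebuilt, which seems relevant to how we'd "version" the pages.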

ischoegl commented 11 months ago

@speth and @bryanwweber ... I likewise appreciate the writeup.

From my side, my 2 cents are that it matters less where documents are located and more how they are written. The MyST markup envisioned in Cantera/cantera-website#211 would make documentation a lot easier, as it is more intuitive and has several extremely useful extensions (the Jupyter notebook integration/conversion is impressive).

Regarding tutorials, things get a little iffy: I would certainly agree with moving the C++ examples over to the main repository (the tutorial examples are just simpler versions of samples anyways). For Python, I believe the most appropriate tutorial format is Jupyter notebooks; at the same time, this is where MyST shines (at least if I recall correctly), so I guess these could be moved to the main repo after we have a way to render content on the website (presumably the same is true for Jupyter notebooks). For MATLAB, the most appropriate tutorials would be live scripts, which are not git friendly. We definitely should not use *.m tutorial files for the new MATLAB interface, so I don't see a good way of moving MATLAB tutorials without unnecessarily blowing up the repo size.

As an aside, I would also like to point out that the state of the doxygen documentation is likewise a sore point. There is a lot of useful information in there that is almost impossible to find. While we do a good job of adding documentation for individual functions, the big picture is not great (although it's not too difficult to improve the situation at least somewhat; e.g. Cantera/cantera#1534). For some of the more technical documentation purposes, I'm wondering whether it may make sense to improve documentation within the source code (e.g. #169), rather than writing up separate sections? One example illustrating the current situation is the documentation of AnyMap (see here): the documentation is excellent, but unless you know what you're looking for, it's impossible to locate. Doxygen is unfortunately set up in a way that helpful details are buried far down on the page, but that can be changed using XML. Doxygen also supports markdown at this point.

In summary, I believe that:

resolving Cantera/cantera-website#211 should have priority, but I'm not sure how close (or far) we are from that point?
an adequate split between doxygen documentation and other types of write-ups needs to be formalized
#115 probably deserves an honorable mention here (this is mostly about motivation and visual consistency; I quite like the proposed alternative, but I'm also not opposed to "breathe" as the linking portion does need to be resolved somehow).

bryanwweber commented 11 months ago

Thanks @ischoegl. On the topic of Doxygen, since we're switching to Sphinx, there are many extensions that support direct integration of Doxygen XML output into Sphinx sites. Among many options are Breathe and Exhale. I've not evaluated the options to see what might work best for us.

As a note about Jupyter and Matlab Live notebooks which have inscrutable git histories, I wonder if git submodules in the main repo make sense for those? Since we do submodule checkouts on CI anyways, the examples would be there for running without polluting the main git history.

ischoegl commented 11 months ago

On the topic of Doxygen, since we're switching to Sphinx, there are many extensions that support direct integration of Doxygen XML output into Sphinx sites. Among many options are Breathe and Exhale. I've not evaluated the options to see what might work best for us.

It would be interesting to see what works how. Getting a better feel for the Sphinx/doxygen integration would go a long way in terms of making decisions on where different parts of the documentation would 'live', especially if we want to house as much as possible in the main repo.

As a note about Jupyter and Matlab Live notebooks which have inscrutable git histories, I wonder if git submodules in the main repo make sense for those? Since we do submodule checkouts on CI anyways, the examples would be there for running without polluting the main git history.

Based on my understanding, MyST will make Jupyter notebooks almost unnecessary, so the repo pollution issue is moot? At the same time, git submodules are still interesting.

PS: exhale is - at least to me - on the lower end of what I'd like to see in terms of number of contributors (it's essentially a single developer). I feel a little safer about breathe.

ischoegl commented 11 months ago

One issue that remains to be resolved is how to link to documents that are imported by Cantera/cantera-website from content generated by CI on the main repo. One example that is important (although currently well hidden) is https://cantera.org/documentation/docs-2.6/sphinx/html/yaml/index.html (I am pasting the full link deliberately). It is relevant to the doxygen documentation of the C++ API, where it is currently referenced as "../../../../sphinx/html/yaml/index.html"; having those relative links is definitely not ideal.
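
To illustrate why such relative links are fragile, here is a small sketch using Python's `urllib.parse.urljoin`. The two Doxygen page locations are hypothetical (assumed for illustration, with the first chosen to sit four directory levels below the `docs-2.6` root); only the link target and the relative reference are taken from the comment above.

```python
from urllib.parse import urljoin

# Relative reference and intended target, as quoted above
RELATIVE_LINK = "../../../../sphinx/html/yaml/index.html"
TARGET = "https://cantera.org/documentation/docs-2.6/sphinx/html/yaml/index.html"

# Hypothetical page locations within the Doxygen output
nested_page = "https://cantera.org/documentation/docs-2.6/doxygen/html/d1/d2/example.html"
flat_page = "https://cantera.org/documentation/docs-2.6/doxygen/html/example.html"

# The link only resolves correctly from a page at exactly the expected depth
resolved_nested = urljoin(nested_page, RELATIVE_LINK)
resolved_flat = urljoin(flat_page, RELATIVE_LINK)

print(resolved_nested == TARGET)  # True
print(resolved_flat)  # https://cantera.org/sphinx/html/yaml/index.html
```

The same relative reference points somewhere else entirely as soon as the referencing page moves up or down a directory level, which is the maintenance concern here.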

speth commented 11 months ago

Well, that's just a link within version-specific API documentation, where both the source and target are already generated in the main repo. I think the only realistic alternative for links from Doxygen to Sphinx would be to use breathe. I'd say this is independent of what I'm suggesting here, and covered by #115.

ischoegl commented 11 months ago

@speth / @bryanwweber After working with the doxygen interface, I honestly believe that moving some of the reST documents (example: YAML Input File Reference) into doxygen markdown pages is much simpler than the opposite direction (trying to integrate everything using breathe). There is less of a need to cross-link Sphinx/doxygen and the documentation is consistent (no relative/broken links).

Here's an example (using Cantera/cantera#1546 plus some pandoc-converted/edited markdown files):

I only did part of this to get a feel for the outcome; see this branch https://github.com/ischoegl/cantera/tree/move-yaml-api-docs ... the conversion isn't too difficult, but this is certainly something that warrants some feedback before investing more work.

bryanwweber commented 11 months ago

Thanks for this work @ischoegl. I actually believe any work towards moving files around or changing formats is premature until the website stack is further settled. I don't think things are in a state where we can evaluate how integration should be done yet. 🤷‍♂️

ischoegl commented 11 months ago

Thanks for the comment, @bryanwweber.

I actually believe any work towards moving files around or changing formats is premature until the website stack is further settled. I don't think things are in a state where we can evaluate how integration should be done yet. 🤷‍♂️

This is a fair point. At the same time, I strongly believe that from a maintenance aspect, keeping two distinct websites, i.e. one user-facing for tutorials, examples, Python/MATLAB documentation, etc. (Sphinx) plus one developer-centric (doxygen) may be the best of both worlds. Doxygen is extremely good at what it is made for, and what you see above can be achieved with basic elements of recent doxygen versions (the css theme just makes it look more up-to-date). Essentially, all of this works "out of the box".

Regarding content: this was merely an attempt to see how markdown files integrate into doxygen. Overall, it works beautifully (with minor caveats, as some reST formatting abilities are missing - mainly indents).

So what my trials boil down to is the following proposition:

Keep separate websites: user-facing (Sphinx) and developer (doxygen)
Document core capabilities in C++ docstrings (as is still the norm with older parts of Cantera), and make these detailed "Science" docs easier to find
Move documentation that most users don't need but is currently hard to find to doxygen (main example: YAML Input File Reference)
Cross-link where appropriate and remove redundant documentation (high level science explanations stay in Sphinx, while detailed documentation remains in doxygen)

This proposal probably deviates a little from current thinking, but I honestly believe that it simplifies the work flow and is overall much easier to maintain: documentation is written together with the code in the C++ header files, which takes care of most concerns voiced at the top of this issue report. Everything can be checked using scons doxygen.

speth commented 11 months ago

The division between what's handled by Sphinx and what's handled by Doxygen within the docs that are built from the contents of the main repo is not really what I was trying to address with this proposal, and again, I think that's already what #115 is about. What I'm suggesting here is to migrate the installation instructions, tutorials, and science documentation, all of which are version specific, into the main repo. Some of this may go into the Sphinx docs, and other parts into the Doxygen docs. I would agree that there is some value in keeping the detailed "science" docs as close to the implementation as possible, in the form of class docstrings on the various models.

ischoegl commented 11 months ago

Thanks for the comments, @speth.

I would agree that there is some value in keeping the detailed "science" docs as close to the implementation as possible, in the form of class docstrings on the various models.

I believe this by itself addresses a lot of the issues you linked at the top. It is ultimately irrelevant whether this is rendered by doxygen or Sphinx (i.e. #115) - presumably, either of the two approaches should be able to generate detailed documentation.

bryanwweber commented 11 months ago

I would agree that there is some value in keeping the detailed "science" docs as close to the implementation as possible

I don't think I agree with this. I think there's a really important case of arranging the content in more of a narrative format to tie concepts together for teaching. I don't think class docstrings facilitate that mode. I'm also loath to duplicate content.

ischoegl commented 11 months ago

I don't think I agree with this. I think there's a really important case of arranging the content in more of a narrative format to tie concepts together for teaching. I don't think class docstrings facilitate that mode.

While I see where you're going with this, I believe it's easy to get lost in the details if the narrative is too comprehensive (apart from my reservations about a convoluted work flow). My estimate is that 90% (or more) of the user base are only interested in a quick overview, and as long as we provide links to details (some of which are extremely lengthy, see this example), we should be good?

I'm also loath to duplicate content.

Agreed. At the moment, most of the Science details are still in doxygen docstrings. Changing this is a bear.

PS: as an aside, putting the detailed docstring description on top of doxygen pages as proposed in Cantera/cantera#1546 makes things a lot more intuitive

speth commented 11 months ago

I would agree that there is some value in keeping the detailed "science" docs as close to the implementation as possible, in the form of class docstrings on the various models.

I believe this by itself addresses a lot of the issues you linked at the top.

While this would address a lot of the (capital-I) Issues I linked to, it's only a small part of the version-specific updates in question. You can get a better feel for the scope of such changes in https://github.com/Cantera/cantera-website/pull/248/files.

It is ultimately irrelevant whether this is rendered by doxygen or Sphinx (i.e. #115) - presumably, either of the two approaches should be able to generate detailed documentation.

Agreed.

I would agree that there is some value in keeping the detailed "science" docs as close to the implementation as possible

I don't think I agree with this. I think there's a really important case of arranging the content in more of a narrative format to tie concepts together for teaching. I don't think class docstrings facilitate that mode. I'm also loath to duplicate content.

I said some value, not that this was definitively the best option. To elaborate, one thing that is a bit of a struggle is to even know what all the things that have to be updated are when making an implementation change, when those changes are spread across many files (or worse, as now, repos). At least if the basic equations a model implements are right there with the corresponding class, it's obvious what someone should do when they've implemented a new class / method. We can try to remember all of these bits and pieces at PR time, but anything that makes this more obvious before that is useful. I agree that you can write better narrative documentation if you're not tied to the structure of the implementation, but it's also a lot harder to get anyone to actually do that. I'd say that the current state of #6 is proof of that.

ischoegl commented 11 months ago

I agree that you can write better narrative documentation if you're not tied to the structure of the implementation, but it's also a lot harder to get anyone to actually do that.

I couldn't agree more. Once a PR is approved, all the incentives for further work are gone; requiring decent docstrings is comparatively easy.

Regarding version-specific stuff in Cantera/cantera-website#248, I appreciate the work! I'm definitely supportive of off-loading as much as we can to the main repo.

bryanwweber commented 11 months ago

Agree with both of you guys 😄

ischoegl commented 11 months ago

Fwiw, I took a deep dive into doxygen markdown, see Cantera/cantera#1548. The PR would consolidate the YAML Format Reference in the main repository (there are about 1k lines to be deleted from cantera-website, with quite a few redundancies removed and the browsing experience overall - I believe - considerably improved).

Moving the YAML format reference to the Developer API (or "advanced"?) documentation is imho consistent, as most users don't need to know how to assemble YAML input from scratch (also, after the removal of CTI, input is no longer handled by Python). As MyST is just another flavor of markdown, I believe this effort to be portable within the context of #115 (obviously, with limitations).

speth commented 11 months ago

So, if we're agreed that some of the content that is currently in the website repo should be moved into the main repo, the question is then whether any given piece of it should be part of the Sphinx docs in the main repo, which is currently used for the Python, Matlab, and YAML API documentation, or in Doxygen, which is currently used for the C++ API documentation.

While @ischoegl has already forged ahead on moving some content into Doxygen (Cantera/cantera#1548), my thinking had been to rely more heavily on Sphinx, due to some of the features it provides (with the appropriate extensions). For starters:

The version switcher dropdown that comes with the Pydata Sphinx theme
The ability to generate a consolidated references list, possibly using either sphinxcontrib-bibtex or the JupyterBook citation functionality. I guess Doxygen also has a \cite function, but we've never used it.
The ability to link into Doxygen documentation using sphinxcontrib-doxylink, while Doxygen has no clean way to link into Sphinx (as far as I know). I don't think we can (or should) try to use Doxygen to build the Python API docs, which means we will always need to have some links into Sphinx. The more content that we have that lives in Sphinx, the more of this that can be done using native cross references, and less using raw relative HTML links.

I don't necessarily think we should go so far as to use Sphinx to generate the HTML for the C++ API documentation, but I think it is a better tool for most of the rest of our documentation needs.
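
As a rough sketch of what the last two items could look like in `conf.py`, assuming both extensions are installed and that Doxygen is configured to write a tag file (`GENERATE_TAGFILE`). The role name `ct`, the tag file path, the relative path to the Doxygen HTML, and the bibliography file name are all placeholders chosen for illustration, not settings from the current build.

```python
# conf.py fragment (sketch): link into Doxygen and collect references

extensions = [
    "sphinxcontrib.doxylink",  # roles that link to Doxygen-generated pages
    "sphinxcontrib.bibtex",    # citations and a consolidated reference list
]

# Maps a role name to (Doxygen tag file, location of the Doxygen HTML output).
# Both paths are hypothetical placeholders.
doxylink = {
    "ct": ("doxygen/Cantera.tag", "../../doxygen/html/"),
}

# BibTeX database(s) used to resolve citations (hypothetical file name)
bibtex_bibfiles = ["references.bib"]
```

With something like this in place, an rST page could refer to a C++ class as :ct:`AnyMap` instead of hard-coding a relative HTML link, and cite a reference with the :cite: role.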

bryanwweber commented 11 months ago

the JupyterBook citation functionality.

AFAIK this is just sphinxcontrib-bibtex under the hood.

ischoegl commented 11 months ago

Thanks for the continued discussion. From my perspective, I want to clarify that moving everything into Doxygen was never a goal; both Sphinx and Doxygen are fantastic tools (the latter surprisingly so, as the default looks like the 90's want their GUI back). They both have strengths and weaknesses. I am not impressed by the output of breathe, and I was not able to get Doxygen to process Cython documentation. From that perspective, I agree that sphinxcontrib-doxylink will be the way to go.

I don't necessarily think we should go so far as to use Sphinx to generate the HTML for the C++ API documentation, but I think it is a better tool for most of the rest of our documentation needs.

We are in agreement on the first part; on the second part I believe Doxygen deserves more credit, as we were not using it effectively.

As mentioned in my comment above, my suggestion would be to:

Keep separate websites: user-facing (Sphinx) and developer (doxygen)

Document core capabilities in C++ docstrings (as is still the norm with older parts of Cantera), and make these detailed "Science" docs easier to find

Move documentation that most users don't need but is currently hard to find to doxygen (main example: YAML Input File Reference)

Cross-link where appropriate and remove redundant documentation (high level science explanations stay in Sphinx, while detailed documentation remains in doxygen)

I believe that points (1) and (3) are critical:

My estimate would be that 90-95% of users are interested in using Cantera from the Python or MATLAB front ends, and are mostly interested in examples that can help them to solve standard research problems (or class assignments), which only require YAML files that are either available or can be converted via ck2yaml. They should use available installers (conda, pip, etc.). Obviously they need quality documentation of the interfaces, and that's where Sphinx shines.
There is likely only a small number of people who are interested in how things work "under the hood" (and who will compile Cantera from source). What I realized when looking over the existing doxygen documentation is that the detailed documentation is quite good, but attempts to organize content were quite disjointed, leaving the impression that there wasn't much there. Doxygen is significantly better than it appears to be, and relatively large improvements can be achieved with relatively little effort.
We need to work on ways to allow users to move from the 90% to the 10%. I believe any efforts to improve the user experience for the documentation of actual Cantera source code should be helpful. I don't think that it matters much if Sphinx/Doxygen generated content may have a slightly different look.
Now about the YAML format reference. There are probably slightly more people needing this information than those interested in C++ source code, but I also believe that the coupling between YAML format and C++ source is much stronger than YAML and user-facing Python/MATLAB interfaces, meaning that the need to link between the former is greater than for the latter. Currently we don't really provide those links, which is unhelpful when trying to find actual implementations. This is ultimately the reason why I am suggesting to move the YAML reference to Doxygen.

Given the limited manpower of the Cantera project, I am hoping to create simple workflows with reduced maintenance overhead (which happens to be an objective of this issue report).

Regarding the other points raised: some Doxygen-run projects do have switchers (example: OpenCV), and the \cite command looks simple enough. For a lot of the documentation, it doesn't matter whether it's rendered by Doxygen or Sphinx. I would simply propose to move anything that is considered 'advanced' (and doesn’t involve Python or MATLAB) to Doxygen.

ischoegl commented 11 months ago

Fwiw, creating doxylink cross-links from Sphinx to sections defined in Doxygen markdown works perfectly; just tested this in Cantera/cantera#1548

ischoegl commented 10 months ago

With work on #179 being completed, I believe that doxygen now presents a viable option to receive some content from the website.

ischoegl commented 9 months ago

@speth and @bryanwweber From my perspective, I think this is probably one of the highest priority issues for 3.1.

If we manage to move major parts of the website to the main repo and put things under CI, my personal hope is that the hurdle for releases will be lessened considerably (at least this was the premise of the description at the top). One question I have for @bryanwweber is about MyST - presumably, we can enable this on the main repo before the transition of the main website is done?

speth commented 9 months ago

Yes, completing this is my main goal for 3.1.0. Partly because I think any partial implementation would make that next release complicated, but also because I think it will provide a pathway to resolving many of the documentation deficiencies that have been noted.

Enabling MyST usage is trivial -- I've already tried it very briefly. All you need is to add the myst_parser extension and then set source_suffix = ['.rst', '.md'] in conf.py. I think we'll end up with a mix of rST and MyST content for some time. For one, because autodoc currently parses and generates rST only (see https://myst-parser.readthedocs.io/en/v0.17.1/sphinx/use.html#use-sphinx-ext-autodoc-in-markdown-files). If that ever changes, we could update, but it would require reformatting all the Python docstrings at a minimum. In any case, we can start using it for new content and content moved over from cantera-website, which is where I think it will be most useful anyway.
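
For concreteness, a minimal sketch of the two changes described above. The `sphinx.ext.autodoc` entry stands in for whatever extensions the existing `conf.py` already lists and is an assumption here.

```python
# conf.py fragment (sketch): accept both rST and MyST Markdown sources

extensions = [
    "sphinx.ext.autodoc",  # stand-in for the existing extensions; still emits rST
    "myst_parser",         # adds Markdown (MyST) parsing
]

# Treat both .rst and .md files as documentation sources
source_suffix = [".rst", ".md"]
```

Existing `.rst` pages and autodoc output are unaffected by this; only files ending in `.md` go through the MyST parser.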

I'm currently working on modifying sphinx-gallery to support rendering examples in languages other than Python so we can replace the homegrown example gallery generator we use in Nikola. You can see the current progress on this in sphinx-gallery/sphinx-gallery#1192.
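
As a point of reference, this is roughly what a stock sphinx-gallery setup looks like for Python examples only. The directory names are hypothetical, and the multi-language support discussed in the linked PR is not reflected here.

```python
# conf.py fragment (sketch): basic sphinx-gallery configuration

extensions = ["sphinx_gallery.gen_gallery"]

sphinx_gallery_conf = {
    # Where the example scripts live, relative to conf.py (hypothetical path)
    "examples_dirs": ["../samples/python"],
    # Where the generated gallery pages are written (hypothetical path)
    "gallery_dirs": ["examples/python"],
}
```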

speth commented 9 months ago

Also, I've been thinking about how to organize the content of the website, and looking at how this is done in a number of other peer projects, such as Matplotlib, Pandas, SunPy, and others. My current draft outline is:

Install
User Guide
- Tutorials (introductory)
- Task/goal-oriented guides
  - Jupyterbook style
  - perhaps take some of our Jupyter notebook examples and previous workshop material as a starting point
  - Not quite sure how to organize this w.r.t. different interface languages
  - "converting input files" guides clearly fit into this category
Examples
- Examples for all languages
- Input file examples?
- filterable by language, topic, and Cantera features used
- Starting point is sphinx-gallery, with some further extensions
Reference
- API documentation for each language interface
- Input file documentation; I suspect most of the "defining phases" content goes here? Maybe some of it should be converted to a
- Science (how this is going to be organized is a good question)
- Changelogs
Develop
- Instructions for compiling from source
- Explanations of how the code works, e.g. how the Cython module and C++ are connected, how ReactionRate / ReactionData / MultiRate works, etc.
- Explanations of what has to be done to introduce new models
- Lots of new content required here, but this is clearly needed if we're going to be able to help new contributors get started.
Community
- Code of Conduct
- Acknowledging Cantera
- Getting help (bug reporting, groups)
- Governance (Steering Committee / NumFOCUS)
- Donations

The top-level headings would be the ones appearing in the site header. Compared to the current site, this means combining "Science" and "Documentation" into "Reference", replacing "Tutorials" with "User Guide", adding "Develop", and dropping "Blog".

I'm very interested in any feedback on this layout, and getting to some consensus before we start implementing anything significant.

ischoegl commented 9 months ago

@speth Thank you for elaborating. I like the structure you suggested, and am fully on board with making this as painless to maintain as possible.

In terms of execution, my preferred approach is to be as pragmatic as possible (as little customization as possible). The two tools we have at our disposal are Sphinx/pyData and doxygen/awesome-css, where the styling is thankfully relatively similar. My deep dive into doxygen left me with the impression that it's very good at resolving some of the details, while Sphinx is definitely the better approach for 'big-picture' documentation.

Here are some of my thoughts:

User Guide/Tutorials: For Python, a lot of this would be best implemented with MyST documents; I believe it's possible to execute code in the compilation step. Ideally, we should move the entire content of cantera-jupyter here (or split things between this section and examples). I think it's ok to be somewhat Python-centric, as it's the main interface, and the new MATLAB interface will be fairly close.
Examples: :+1: on your suggestion; for C++ (and perhaps Fortran), there is the alternative to move examples to development and simply redirect
Reference: :+1: for API (this is obvious) and input file documentation is in decent shape (although the tutorial needs to be merged and cross-references to C++ implementation inserted). For the Science section, I have the somewhat radical preference of moving everything to C++ docstrings, and let doxygen handle this. The main reason for this is that it will be easiest to maintain long-term, and remove redundancies.
Develop: my preference would be to just redirect to doxygen, and have this be the interface for power-users (this includes compilation from source)
Community: :+1: on this also
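
On executing code during the documentation build: one way this can be done is with the MyST-NB Sphinx extension, which runs the code cells of MyST documents when the site is built. The following `conf.py` fragment is a minimal sketch under that assumption. MyST-NB is not named in this thread, and the specific values are placeholders rather than Cantera's actual configuration.

```python
# conf.py (sketch): these settings are assumptions for illustration,
# not Cantera's actual Sphinx configuration.

# MyST-NB parses MyST Markdown documents and notebooks and can run
# their code cells at build time.
extensions = [
    "myst_nb",
]

# "auto" executes only documents that are missing outputs;
# "force" re-runs everything and "off" disables execution.
nb_execution_mode = "auto"

# Maximum time, in seconds, that a single cell is allowed to run.
nb_execution_timeout = 120
```

With "auto", pages that already carry stored outputs are not re-executed, which keeps rebuilds short at the cost of possibly stale results.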

bryanwweber commented 9 months ago

@speth Re

> Starting point is sphinx-gallery, with some further extensions

I had been working on https://github.com/bryanwweber/sphinx-gooey to support this need because I didn't find what I wanted in sphinx-gallery, especially support for other languages, which you're now fixing. Certainly there's a big advantage in using a pre-existing project instead of building and supporting our own. I wonder if you have any thoughts about maintaining code similar to that from our Nikola site, that is, sphinx-gooey?

This has been the biggest blocker to me moving on with the website style changes and switching to MyST more broadly, so I'm super interested to get it resolved.

ischoegl commented 9 months ago

> Certainly there's a big advantage in using a pre-existing project instead of building and supporting our own. […] This has been the biggest blocker to me moving on with the website style changes and switching to MyST more broadly, so I'm super interested to get it resolved.

My own 2 cents in this context are that the extra effort of creating/maintaining infrastructure code requires continuous input. My impression is that @bryanwweber genuinely likes this work and I’m confident that any solution will look amazing; but I am also aware that there are bandwidth issues. As mentioned above, my personal approach is informed by pragmatism. I don’t think it’s necessary to have a 100% visually consistent solution if we can piggyback on other projects. Of course it’d be great if we can afford the manpower for a custom solution. Whether having the bandwidth is a realistic assumption is the real question here.

bryanwweber commented 9 months ago

> 100% visually consistent solution

In this case, I should clarify that I meant that I didn't like the source file layout and output structure that was enforced by sphinx gallery, because I didn't think it'd match well with how the website is laid out (again, both source and output). However, if the structure we're envisioning for the website is changing anyways, it may make sense to reshape that to fit the expectation of sphinx gallery.

ischoegl commented 9 months ago

> However, if the structure we're envisioning for the website is changing anyways, it may make sense to reshape that to fit the expectation of sphinx gallery.

@speth mentioned that his suggested layout is informed by other projects. From that perspective, I think that things can be made consistent. The main concern is how to deal with MATLAB examples.

speth commented 9 months ago

I think @ischoegl's idea about having some components of the documentation besides just the C++ API docs handled by Doxygen is an interesting one. It poses some integration challenges, though, in that mixing internal and external links in the navigation structure (e.g. in the menu bar across the top of the page) is a bit tricky in both Sphinx and Doxygen. One way to resolve that challenge is to use doxysphinx to integrate the Doxygen-generated HTML content into the Sphinx navigational structure.

I've started trying this out, with a work-in-progress branch here: https://github.com/speth/cantera/tree/doxysphinx-trial, and a copy of the resulting docs here: https://cantera.org/~speth/doxysphinx-trial/reference.html (follow the link into the C++ documentation). I'd say this worked pretty well without too many modifications. The one bit of a hack I did introduce was to add some toctree directives to the pages generated by Doxysphinx for Doxygen "modules" so those would be added to the left navbar structure. Otherwise, the toctrees it generates are pretty minimal and only cover the generic elements of the Doxygen output.
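
As a rough illustration of the toctree hack described above, the sketch below appends a hidden `toctree` directive to the text of a generated reStructuredText page. The function name, page title, and entry names are hypothetical; this is not the code used on the trial branch.

```python
def with_hidden_toctree(rst_text: str, entries: list[str]) -> str:
    """Return rst_text with a hidden toctree listing `entries` appended."""
    lines = [".. toctree::", "   :hidden:", ""]
    # Each entry is a document name, indented to sit inside the directive.
    lines += [f"   {entry}" for entry in entries]
    return rst_text.rstrip("\n") + "\n\n" + "\n".join(lines) + "\n"


# Hypothetical generated page and child documents.
page = "Thermodynamic Properties\n========================\n"
print(with_hidden_toctree(page, ["group__thermoprops", "group__phases"]))
```

Marking the toctree as hidden adds the entries to the sidebar navigation without rendering a list in the page body.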

My observations on this approach so far:

Plusses
- Provides consistent navigation through all site components, without having to use Doxygen to provide links back to the Sphinx site.
- Gives us an integrated version switcher for the C++ API docs.
- Should make it easier to deal with having content split between Doxygen and Sphinx based mainly on how we want it organized, not dictated by the tool used to process it.
Minuses / issues
- Search results aren't great (see for example https://cantera.org/~speth/doxysphinx-trial/search.html?q=IdealGasPhase). A lot of the results are into various index pages, and the "preview" text for the most relevant links is mostly unreadable. I think there are ways to improve this, but it will require either more post-processing or additions to doxysphinx.
- doxysphinx requires switching away from the CREATE_SUBDIRS option to Doxygen. That would be fine, except that we opted into that option to deal with clashes between a couple of files in the flat output layout (e.g. kinetics.h and kinetics/Kinetics.h), and I don't quite know how to resolve that.
- Extra processing steps and a significantly longer build (and rebuild) time for the docs. Doxysphinx itself is quite fast, but this results in an order of magnitude more pages for Sphinx to go through than it does currently, which takes a while.
- There are some CSS issues that we will have to deal with. For example, dark mode doesn't work well on the Doxygen pages.
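
To gauge how many files are affected by the flat-output clash mentioned in the list above, a small script can group headers by file name. This sketch assumes the clash comes from names that differ only by case once directories are dropped, which the thread does not state explicitly; the header paths are illustrative.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def flat_name_clashes(paths):
    """Group paths whose file names differ only by case."""
    groups = defaultdict(list)
    for p in paths:
        # Drop the directory part and compare names case-insensitively.
        groups[PurePosixPath(p).name.lower()].append(p)
    return {name: found for name, found in groups.items() if len(found) > 1}


# Illustrative header list (assumed paths).
headers = [
    "include/cantera/kinetics.h",
    "include/cantera/kinetics/Kinetics.h",
    "include/cantera/thermo/ThermoPhase.h",
]
print(flat_name_clashes(headers))
```

An empty result would suggest that CREATE_SUBDIRS could be dropped without renaming anything, at least under the assumption above.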

My next step is to spend a bit of time on the alternate approach, where we would just be linking between Doxygen- and Sphinx-generated content so we can compare them before committing to one approach or the other.

speth commented 9 months ago

And here's the second option, which keeps the Doxygen and Sphinx HTML generation more separate, but with some improvements to navigating between them: https://cantera.org/~speth/doc-linking-trial/reference.html. This is built off of the branch https://github.com/speth/cantera/tree/doc-linking-trial.

The main change here is to replace the top-of-page navigation links in Doxygen with ones that match Sphinx. Navigation within the Doxygen docs is handled by the navigation area on the left (unless your browser window is too narrow; we may want to find a fix for this, but hopefully most people aren't trying to read C++ API docs on their phones...).

On the Sphinx side, I added a set of cards to the root "Reference" page instead of using a (visible) TOC. It is unfortunately impossible to add relative links to external pages to the Sphinx TOC system (i.e., to get something to appear in the left navigation area; this is despite significant demand; see https://github.com/sphinx-doc/sphinx/issues/701).

Observations:

Plusses:
- This is only a minor change compared to what we're doing now, and doesn't introduce any new dependencies or steps that need to be integrated with SCons, and doesn't increase incremental build times.
- With only a handful of CSS rules, the styling consistency between the PyData Sphinx theme and Doxygen Awesome is pretty good.
Minuses:
- No links to Doxygen content in any Sphinx TOC.
- While the Sphinx search results page isn't perfect, it's at least a full text index, unlike Doxygen. For example, search for "partial molar entropies" and the latter turns up no results, despite this phrase occurring on many Doxygen pages.
- The implementation of the Doxygen top navbar is a bit of a hack, and the styling really only works as long as all the Doxygen content fits under a single heading (e.g. "Reference") since highlighting this section as "active" is just hard-coded.

While I'm quite impressed by what Doxysphinx manages to do, my inclination is to opt for this simpler approach, since I think it still works well while avoiding introduction of extra moving parts in the documentation machinery that could end up requiring significant effort to maintain in the future.

speth commented 9 months ago

I've made a few further updates to this second version (https://cantera.org/~speth/doc-linking-trial/reference.html), to flatten the layout (along the lines suggested in Cantera/cantera-website#229), to provide stubs for the other main sections, and to try an example of adding some of the "science" documentation to the reference section using the MyST format.

Based on this test, I recognized that there is a significant advantage to putting the science documentation in MyST or other markdown files rather than keeping it in C++ docstrings. Namely, that you get all the editor capabilities that are available for standalone markdown files, like syntax highlighting and (in VS Code at least) live preview.

With the flatter file layout, it's also not too difficult to make a relative link from the C++ documentation to a specific page in Sphinx, for example [`lattice` Phase Model](../reference/science/phasethermo/lattice.html).
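
For reference, a relative link like this can be computed from the locations of the two pages under the site root. In the sketch below, the `cxx/` directory and the Doxygen file name are assumptions about the output layout; only the Sphinx page path comes from the example above.

```python
import posixpath


def relative_link(from_page: str, to_page: str) -> str:
    """Relative URL from one page to another; both are paths from the site root."""
    return posixpath.relpath(to_page, start=posixpath.dirname(from_page))


# Assumed location of a Doxygen-generated page, linking to the Sphinx page.
print(relative_link(
    "cxx/classCantera_1_1LatticePhase.html",
    "reference/science/phasethermo/lattice.html",
))
```

With every Doxygen page in a single directory one level below the root, every such link starts with the same `../` prefix, which is what makes writing them by hand manageable.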

Unless there are any strong opinions to the contrary, I'm going to start migrating some of the existing content into this new structure.

ischoegl commented 9 months ago

@speth ... thank you for some impressive work on this. I am in general :+1: with the proposed direction, and the prototype website you linked to looks great.

Apart from being interested in @bryanwweber's input, the one concern I have is how to review. I am a little hesitant to merge Cantera/cantera#1621 until the upstream sphinx-gallery changes are merged (I am cautiously optimistic, but the review is starting to take a long time); I'm less concerned about sphinx-tags (the maintainer definitely isn't responsive at the moment, but it's also a significantly smaller effort). At the same time, I don't want a backlog to build up, so I'd be amenable to proceeding as long as we have consensus.

Aside: I haven't looked into MyST recently, but my recollections from whenever I spent some time with it are extremely positive. It definitely is the way to go.

Going back to how to proceed, I'd suggest the following:

Decide/merge on examples (I am in favor with the caveat stated above)
Decide/merge C++ reference updates (I am in agreement with @speth's assessment about doxysphinx: impressive, but may require maintenance overhead)
Work on Science transition (I am likewise in favor, although I do have a friendly amendment to keep things simple, and use doxylink to link to existing C++ implementations/docstrings so users have a better idea of how things are implemented)

PS: Regarding the example [`lattice` Phase Model](../reference/science/phasethermo/lattice.html), I am not convinced that it needs to live in MyST, but it's a fine example to illustrate at the moment. For implementation details like these, I'm not sure that it is worth the churn - it's a massive effort for comparably little benefit, as we could just link to existing docstrings.

Ultimately, we just need to provide/cross-link information on what models are available, how they are implemented, and how things are defined in the YAML input format. I am more concerned about creating this linkage (see #182) and less about moving existing things around (beyond moving source material from the website as proposed in this PR).

speth commented 9 months ago

To facilitate review of this work, I opened https://github.com/speth/cantera/pull/6, which shows the delta between Cantera/cantera#1621 and my doc-linking-trial branch.

You're correct that the current migration of the LatticePhase model description into MyST was mostly for illustration. I picked it somewhat at random as a page that had quite a bit of math, and I wanted to see whether MyST was an improvement (I do like the dollarmath extension and being able to write just $Y_k$ instead of @f$ Y_k @f$). And I agree that continuing on this particular aspect would be a significant expenditure of effort on a task that is not required for completing this content reorganization. So, in the cases where there is existing documentation in a C++ header file, just linking to that makes sense to me, at least for the time being.
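
For pages that do get migrated, the change of inline math delimiters is mechanical. The following is a minimal sketch of that one step, assuming only inline `@f$ ... @f$` (or `\f$ ... \f$`) math needs handling; the function name is hypothetical, and display math and other Doxygen commands are out of scope.

```python
import re

# Matches Doxygen inline math written as @f$ ... @f$ or \f$ ... \f$.
INLINE_MATH = re.compile(r"[@\\]f\$\s*(.+?)\s*[@\\]f\$")


def doxygen_to_dollarmath(text: str) -> str:
    """Rewrite Doxygen inline math delimiters as $...$."""
    # A function replacement keeps LaTeX backslashes from being
    # interpreted as regex escape sequences.
    return INLINE_MATH.sub(lambda m: f"${m.group(1)}$", text)


print(doxygen_to_dollarmath(r"mass fraction @f$ Y_k @f$ of species k"))
```

On the Sphinx side, the `$...$` syntax is available once `"dollarmath"` is listed in `myst_enable_extensions` in `conf.py`.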

On the other hand, moving all the content from cantera-website into this site is on the critical path, and for that, I do want to integrate that as MyST pages rendered by Sphinx, rather than converting say https://cantera.org/science/reactors/controlreactor.html into the docstring for the Reactor class.

Beyond the "science" docs, there are also the installation and compilation instructions, plus the pages that are currently lumped under Tutorials that need to be migrated from cantera-website. In part what I'm looking for is a way to keep working on this and not lose momentum.

ischoegl commented 9 months ago

Perfect - thanks!

If @bryanwweber is ok with it, I'd be :+1: with merging Cantera/cantera#1621 so we don't have to do this the roundabout way of PRs against forks. It is not ideal to pull from a fork for CI, but I'm fairly optimistic about sphinx-gallery, and I get the idea of not wanting to lose momentum. If there isn't any movement on sphinx-tags, I'd suggest creating a (hopefully temporary) Cantera/sphinx-tags fork. Regarding my review, I'd say let's just settle the remaining differences and merge.

I'm aware of the tutorials and other instructions. Based on my trial with the YAML tutorial, I believe they may be easier to transfer than the Science section.

bryanwweber commented 9 months ago

I'm feeling very much out of the loop, so I want to take myself out of the critical path here. Whatever you guys think is best, let me know how I can help!

Cantera / enhancements

Reorganize website source material to simplify updating #178

115 probably deserves an honorary mention here (this is mostly about motivation and visual consistency; I quite like the proposed alternative, but I'm also not opposed to "breathe" as the linking portion does need to be resolved somehow).