Dynamic configuration / environment variables / etc with book builds

executablebooks / jupyter-book

Create beautiful, publication-quality books and documents from computational content.

http://jupyterbook.org

BSD 3-Clause "New" or "Revised" License

Dynamic configuration / environment variables / etc with book builds #1673

Open choldgraf opened 2 years ago

choldgraf commented 2 years ago

Describe the problem/need and solution

Currently we use a static configuration file (_config.yml) for all of the book's configuration. However, there are some cases where you want to dynamically choose configuration at build time. For example, "set a configuration value based on an environment variable."

This isn't currently possible with static configuration, but it is possible in Sphinx. We could find some way to allow a user to dynamically update their configuration (or run arbitrary Python code) at build time.

Guide for implementation

Current build process

Here's where we invoke Sphinx:

https://github.com/executablebooks/jupyter-book/blob/aedee257645ee41906c4d64f66f71b7f0dc7acfa/jupyter_book/cli/main.py#L307-L321

In that case, we explicitly set noconfig=True, which means that Sphinx does not expect any conf.py file to exist.

We then generate a dictionary of Sphinx config, and pass it to the Sphinx build command as "overrides":

https://github.com/executablebooks/jupyter-book/blob/aedee257645ee41906c4d64f66f71b7f0dc7acfa/jupyter_book/sphinx.py#L114-L129
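As a rough illustration of the two steps above, here is a minimal sketch of a Sphinx build that reads no conf.py and receives every setting as an override. The `Sphinx(...)` constructor with `confdir=None` and `confoverrides` is Sphinx's own application API; the helper names `sphinx_kwargs` and `run_build`, the directory layout, and the example override are assumptions for illustration, not Jupyter Book's actual code.

```python
from pathlib import Path


def sphinx_kwargs(srcdir, outdir, overrides, builder="html"):
    """Collect arguments for a Sphinx build that reads no conf.py."""
    outdir = Path(outdir)
    return {
        "srcdir": str(srcdir),
        "confdir": None,  # None tells Sphinx not to look for a conf.py
        "outdir": str(outdir / builder),
        "doctreedir": str(outdir / ".doctrees"),
        "buildername": builder,
        "confoverrides": dict(overrides),  # every setting arrives this way
    }


def run_build(srcdir, outdir, overrides):
    """Run the build; this part requires Sphinx to be installed."""
    from sphinx.application import Sphinx  # imported lazily on purpose

    app = Sphinx(**sphinx_kwargs(srcdir, outdir, overrides))
    app.build()
    return app.statuscode


kwargs = sphinx_kwargs("book", "book/_build", {"html_title": "My book"})
print(kwargs["confdir"], kwargs["confoverrides"])
```

Because the overrides are a plain dictionary built before Sphinx starts, there is no point in this flow where user-written Python gets a chance to run, which is the gap this issue is about.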

We also already have the ability to generate a conf.py file from a _config.yml file:

https://github.com/executablebooks/jupyter-book/blob/aedee257645ee41906c4d64f66f71b7f0dc7acfa/jupyter_book/cli/main.py#L458

Three ideas for implementation

There are a few ways we could add this functionality:

1. Support conf.py. We could allow users to add a conf.py (maybe we'd call it _config.py?) that we'd point to during the Sphinx build. This would behave no differently from how Sphinx currently handles it.
2. Generate a conf.py at build time, and add an extraConfig block. Instead of using configuration overrides, we could generate a temporary conf.py file via the function above. We could then support a configuration block containing arbitrary Python code to be run, which could override or set configuration values by being appended to the end of the conf.py file. This is similar to how JupyterHub uses extraConfig.
3. Pre-process _config.yml with jinja. We could also add a pre-processing step before we parse the _config.yml file. This would let users do something like ansible-style variable injection.

Suggestion

After some discussion below, it seems like path 3 above has the most support for adding this functionality. Especially if we followed patterns that were already common in other frameworks, it would be a way to provide some dynamic configuration without supporting the total flexibility of a conf.py file.

Tasks and updates

No response

choldgraf commented 2 years ago

cc @fperez who is hitting a hard-blocker on his course w/ the lack of this feature, maybe he has thoughts?

fperez commented 2 years ago

Thanks a lot @choldgraf for opening this one! Indeed, for me the use case is that when using JBook on a hosted hub, I need to modify the sphinx build to use a different base URL so the Jupyter URL proxy can work. With pure sphinx that's easy enough - I just add a block fenced by if "JUPYTERHUB_SERVICE_PREFIX" in os.environ: and it all works fine, both when running on the hub and later with things like automated github action builds.

The problem is, I can't put that if statement in _config.yml. Sphinx is designed around imperative solutions, so the declarative model of jbook (which has advantages, this is not criticism :) makes it hard to find a workaround in this case.

I don't know the internals enough, but I tend to lean towards solution 1 above... It might be better to perhaps make it a bit more explicit, by requiring that users declare a config file key in their sphinx section, which, if present, would then make jbook ignore all other keys and defer to that file. Something like:

sphinx:
  config_file : conf.py  # if this key is there, all others are ignored
  key : value...

That would make it easier for users to explicitly know that they are overriding the "simple mode" of key/value pairs in the sphinx section, and to toggle between the two by commenting out that key if they want to ignore that.
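A minimal sketch of how that toggle could be interpreted, assuming the sphinx section has already been parsed into a dictionary. The `config_file` key is the proposal above, not an existing Jupyter Book option, and the function name `resolve_sphinx_config` is hypothetical.

```python
def resolve_sphinx_config(sphinx_section):
    """Return ("conf_py", path) or ("overrides", dict) for a sphinx: section."""
    section = dict(sphinx_section or {})
    config_file = section.pop("config_file", None)
    if config_file:
        # Explicit opt-out of "simple mode": every other key is ignored.
        return ("conf_py", config_file)
    return ("overrides", section)


print(resolve_sphinx_config({"config_file": "conf.py", "key": "value"}))
print(resolve_sphinx_config({"key": "value"}))
```

Commenting out the `config_file` line in the YAML would switch the result back to the key/value mode without touching anything else.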

I have no idea if this is otherwise complicated given jbook's internals, I don't know how the information flows to sphinx. But from a user perspective, this seems like an easy-to-reason-about option.

chrisjsewell commented 2 years ago

An option that I would propose, specifically for different configurations for different build environments etc, is an extends keyword, similar to https://www.typescriptlang.org/tsconfig#extends

Then you could have a "base" _config.yml and additional _config.env1.yml configs for use as jupyter-book build --config _config.env1.yml
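A minimal sketch of what resolving such an `extends` chain might involve: load the child config, then recursively merge it over its parent. The `extends` key is the proposal above rather than an existing option, the in-memory `configs` mapping stands in for parsed YAML files, and the function names and example values are hypothetical. There is no cycle detection here.

```python
def deep_merge(base, override):
    """Return a new dict where `override` wins and nested dicts are merged."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_extends(name, configs):
    """Resolve a config by name, following its hypothetical `extends` key."""
    config = dict(configs[name])
    parent = config.pop("extends", None)
    if parent is None:
        return config
    return deep_merge(resolve_extends(parent, configs), config)


# Stand-ins for the parsed contents of two YAML files.
configs = {
    "_config.yml": {
        "title": "My book",
        "sphinx": {"config": {"html_baseurl": ""}},
    },
    "_config.env1.yml": {
        "extends": "_config.yml",
        "sphinx": {"config": {"html_baseurl": "/user/me/proxy/"}},
    },
}
resolved = resolve_extends("_config.env1.yml", configs)
print(resolved)
```

This keeps everything declarative: the environment-specific file only lists the values that differ, and the choice between environments is made on the command line rather than inside the config.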

choldgraf commented 2 years ago

I see your points, though I don't think it is so black and white. In my opinion, many users would benefit from having a safety valve to add extra functionality on top of the basic configuration Jupyter Book provides. Especially since so much of the broader Sphinx extension ecosystem depends on configuration via Python (e.g. sphinxcontrib-bibtex as reported in #1090 and I believe @psychemedia ran into a challenge like this around defining a custom admonition).

For example, if we:

Made it technically possible to include a conf.py in the build (e.g. by --config-python path/to/conf.py)
Documented this in an advanced section, with the warnings that this is a power-user feature with no promises for stability and functionality.

Then we minimize our risk of maintenance complexity and confusion, while helping users that would like a bit more flexibility than a fully declarative config, without booting them out of the Jupyter Book toolchain entirely. Over time, if there are common patterns people follow with conf.py, then those are good cases where we build declarative configuration around it.

On the other hand, I worry that if we follow a strict purity test that everything must be declarative, we must shoulder all of that complexity ourselves, which creates extra maintenance burden for the maintainers (and we are already stretched far too thin, IMO).

My hope is that saying "here's an extension point to have the full flexibility of Sphinx, but without leaving Jupyter Book, and you use this at your own risk" could be a nice balance.

What is the main downside that you're worried about? Is it maintenance complexity? Dilution of the "MyST brand"?

(I do like the nested config idea though)

chrisjsewell commented 2 years ago

Then we minimize our risk of maintenance complexity and confusion

Honestly, I feel exactly the opposite; introducing multiple types of configuration is precisely going to lead to confusion and complexity and also makes it incredibly difficult for third-parties to "reason" about a jupyter-book. For example, Readthedocs already has to have hacks, to deal with conf.py, and it is somewhat similar to Python moving from setup.py to pyproject.toml

As a third-party (with curvenote) I would be interested to hear what @rowanc1 thinks?

rowanc1 commented 2 years ago

From a Curvenote perspective, I certainly like the declarative pieces -- mostly because we aren't running sphinx, but are reproducing much of the behaviour and can read from/understand config files very easily. I don't think that should have too much sway here though, other than thinking about future migrations / new versions of JupyterBook that (maybe) don't use sphinx to get better control over rendering, interactivity and performance.

My user perspective: wonder if there is a middle ground of adding something like env readers: ansible has some ideas on syntax.

baseUrl: "{{ lookup('env', 'JUPYTERHUB_SERVICE_PREFIX') }}"

This seems to keep the surface area small, but might be enough flexibility for the use cases above?!
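To give a feel for how small that surface area could be, here is a minimal sketch that resolves only the single lookup form shown above, applied to an already-parsed config. It uses a plain regular expression as a stand-in for a real Jinja pass; the function name `resolve_env_lookups` is hypothetical, and treating unset variables as an empty string is an assumption.

```python
import os
import re

# Matches only the one form shown above: {{ lookup('env', 'NAME') }}
_ENV_LOOKUP = re.compile(
    r"\{\{\s*lookup\(\s*'env'\s*,\s*'([A-Za-z_][A-Za-z0-9_]*)'\s*\)\s*\}\}"
)


def resolve_env_lookups(value, environ=None):
    """Recursively replace env lookups in strings inside dicts and lists."""
    environ = os.environ if environ is None else environ
    if isinstance(value, str):
        # Assumption: an unset variable resolves to an empty string.
        return _ENV_LOOKUP.sub(lambda m: environ.get(m.group(1), ""), value)
    if isinstance(value, dict):
        return {k: resolve_env_lookups(v, environ) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env_lookups(v, environ) for v in value]
    return value


config = {"baseUrl": "{{ lookup('env', 'JUPYTERHUB_SERVICE_PREFIX') }}"}
resolved = resolve_env_lookups(config, {"JUPYTERHUB_SERVICE_PREFIX": "/user/me/"})
print(resolved)
```

Note that this only substitutes values; it has no conditionals, so "set this key only when running on a hub" would still need a default or a second mechanism.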

Some of my other ecosystem experience: react-scripts has an "eject" function, which is similar to what @chrisjsewell described. Push the button, and you are on your own and have all of the power of python/sphinx. :) I think there are similar tensions in that ecosystem of having a happy path, and if you want very advanced things you are on your own. That being said, env variable lookups probably aren't "advanced"?!

chrisjsewell commented 2 years ago

Cheers @rowanc1

ansible has some ideas on syntax.

an ansible type declaration (which essentially uses jinja) indeed could be a possibility (I have quite a bit of experience with ansible 😅 https://github.com/marvel-nccr/quantum-mobile)

psychemedia commented 2 years ago

I don't recall any particular issues adding a simple custom admonition for Jupyter Book, it was just a very simple py package/extension. The issues I have are more to do with being able to provide a consistent user experience in terms of notebook content style across Jupyter Book and JupyterLab notebook/RetroLab UIs. I'm all for opening up ways of allowing have-a-go end-user developers to tune things, but not at the risk of complicating the new-user experience. When I look in a repo I want to crib from, the more config and settings files there are, the less likely I am to be able to make sense of them or know where to look to see how a particular effect was achieved.

I think I'm probably more with @chrisjsewell on this in terms of seeing Jupyter Book as a distribution optimised for use with MyST source documents and a set of extensions that work together to render a particular flavour of interactive book using a relatively streamlined setup procedure.

Providing tools that can "export to a sphinx config" seems sensible. That could be used to bootstrap someone's own Sphinx build environment, would support innovation in that more general context, and that innovation might then feed back feature suggestions to Jupyter Book based on proven ideas in the wider context.

fperez commented 2 years ago

A few notes based on the above feedback:

On declarative vs imperative systems: I'm also a big fan of declarative systems, but by construction they have a tough challenge with dynamic information, lacking as they do conditional logic. In practice this can range from a mild to a serious issue, depending on the use cases. For things like build systems (that have massive amounts of dynamic decisions to make) typically some escape hatch, two-stage process, or specialized syntax becomes necessary. A good example to consider from the Python world is how PEP 508 has to define a fairly complex grammar so that expressions that contain conditional logic can be used, such as:

requests [security,tests] >= 2.8.1, == 2.8.* ; python_version < "2.7"

So it's not enough to say "this runs against all we are doing with declarative systems"; actual solutions to the problems users are trying to solve need to be offered by the system. Doing a fully declarative system like what PEP 508 did is quite tricky (it's technically delicate, and you need to be really sure that you nailed your problem space with your grammar). Hatches that escape out to more imperative options in limited scenarios are often used for this reason: they are basically a way to tell the user "you've reached the end of the simpler system, use this more open-ended tool but at your own risk."

Use cases: as for "jupyter book is for simple use cases, beyond that use something else", I really hope the project team will consider a different tack in terms of listening to user requests. In other parts of Jupyter we have always tried to "make the easy things easy and the hard ones possible", as John Hunter used to say of matplotlib. The problem with a tool that tells users that once they hit a limitation they should simply go elsewhere, is that it's basically telling them not to invest in the tool: you never know when you'll find a limit. My reason for needing this isn't a particularly esoteric or advanced situation, I simply want to be able to build a jbook site both in a hosted JupyterHub and in a github pages deployment powered by GH Actions. That's a pretty vanilla use case: edit/test interactively in JHub, push to github when ready and let Actions/Pages take care of the publishing. But doing so requires URL rewriting, hence the above request.

Now, obviously every tool does have limitations, and sometimes it's really not possible to meet a certain need and users should look elsewhere, that's fair. But in this case there was barely any consideration of the scenario, need and possible solutions before a strong "if you don't like it, don't use it" message was provided.

A general comment: I hope the project team will see user requests as opportunities to explore a problem space and improve the tool. Responding immediately to a user need with "very very strong opposition" only achieves driving users away, and creating a climate in the project that is hostile to discussion, exploration and creative thinking.

psychemedia commented 2 years ago

@fperez Re: ease of use and accommodating users, I am all for that, particularly in terms of supporting relatively straightforward end-user co-option and customisation. I am just wary that complexity can take many forms: in end-user experience, in config / default settings, in the codebase, in the development environment.

There is always a risk that trying to simplify end-user experience through more complex config and code pushes users further away from engaging as anything but end users, and reduces opportunities for simplistic end-user development. (That's not necessarily the case here, just an observation I've made from watching a lot of code based tool projects evolve over many years.)

One of the concerns I have about Jupyter Book is the drift in terms of end-user experience between the ever-richer look and feel of content in a Jupyter HTML book and the experience in a notebook authoring environment. It is getting to the point where you need to have two windows open - one for "raw" notebook editing, and one for a rich preview of HTML output, which is similar to workflows in RStudio (though I think rich editing is one of the things the RStudio folk have been trying to improve in Quarto).

The notebook UI is drifting further from supporting a toggle between rendered and source views, and is increasingly just a source view (this is partly why I am interested in moving style directives into cell tags). This limits reach, particularly for authors who still refuse to leave MS Word and the limited opportunities that it provides for rich, interactive authoring.

I think the VS Code experience is still focussing on developers rather than authors, and it will be interesting to see how far Curvenote goes in supporting a customisable editing environment that can provide a WYSIWYG environment that matches ever richer Jupyter Book outputs.

Having tools that support easy publishing, for example through Github Pages, is key. But it would be a tragedy if you had to be a sysadmin to be able to config Jupyter Book, rather than being able to just run it off-the-shelf, and you didn't stand a chance of being able to contribute your own end-user extensions to it without being a professional developer.

SimplyOm commented 1 year ago

I found this discussion and read through it as I was not sure why noconfig=True has been hard-coded as such. While I understand the concern about having multiple config files leading to confusion, I don't see how disabling it completely is very useful.

For my use case, I want to override some of the docutils configs via docutils.conf (as described here). I am not even sure if pre-processing config.yml with jinja can ever achieve that.

I appreciate the discussion here for the most strategic way forward, but until then, why not allow an optional parameter in config.yml to look for any additional config (for example, docutils.conf)? So, we would have multiple levels of safety check:

For someone who doesn't care about it, they wouldn't find that optional parameter.
Even if someone turns the optional parameter on, leading to noconfig=False, it would be a no-op unless a conf.py or docutils.conf actually exists in the folder -- in which case, they would mostly know what they are doing.

Open for thoughts on this.

agoose77 commented 1 year ago

As an aside - any user can have programmatic access to the Sphinx configuration by creating a local Sphinx extension.

This is done with two steps:

Create the extension e.g. my_extension.py

Load it with the sphinx.local_extensions setting:

sphinx:
  recursive_update: true
  local_extensions:
    my_extension: .

I use this mechanism to extend what JB does without entirely dropping the declarative configuration.

I haven't tested this w.r.t modifying a JB configuration value, but I think it should work. IIRC, the extension will see the Sphinx-friendly configuration values, rather than the JB yaml.

fperez commented 1 year ago

Unfortunately I haven't had time to pitch in further on this discussion in more depth. A few quick points:

while I agree with @psychemedia that we really need to improve the JLab experience (and efforts on the JS/TS front for MyST & friends will help a lot here), I still believe that the project could approach this kind of input/concern differently. My needs aren't hypothetical, and I presented pretty reasonable approaches to solve them. Evidently others have similar concerns too. I find it quite unfortunate to see this type of response in a Jupyter-connected project, TBH.
I do have, at least for this specific issue, an OK workaround. Here it is in the hope it helps others.

This Makefile shows how to get a build that works inside of a hub just as easily as it can work on a local installation (I just use make html-hub or make html as needed), and which also works with a standard, un-modified Book Github Action.

I hope this helps others with a similar need, given the stance above. I had to teach this workaround to all my students, but they got it working and for now it's a usable solution.

agoose77 commented 1 year ago

I can see frustrations with having two configuration files. Maybe we could follow the lead of JupyterHub, and have a configuration yaml that includes a conf.py block, e.g.

extraConfig: |
    import sys
    ...

That way we keep all configuration in a single file. Power users, after all, don't have to use _config.yml, but having an imperative block within a declarative configuration helps to scope the non-declarative changes to a subset of the configuration, and leans in to the JB side of things instead of Sphinx, which is both an implementation detail and a fundamental part of how JB works (at the same time :confused: )
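A minimal sketch of how such a block could be turned into a conf.py at build time: write the declarative values first, then append the user's code so it runs last and can override them. The function name `render_conf_py` and the example values are hypothetical, and `repr` is only adequate for the plain strings, numbers, booleans, lists and dicts that YAML produces.

```python
def render_conf_py(config, extra_config=""):
    """Render Sphinx settings as conf.py source, then append user code."""
    lines = ["# Generated file: do not edit by hand."]
    for name, value in sorted(config.items()):
        lines.append(f"{name} = {value!r}")
    if extra_config.strip():
        lines.append("")
        lines.append("# --- extraConfig (user-supplied, runs last) ---")
        lines.append(extra_config.rstrip())
    return "\n".join(lines) + "\n"


# Stand-in for the extraConfig block read from the YAML file.
extra = '''import os
if "JUPYTERHUB_SERVICE_PREFIX" in os.environ:
    html_baseurl = os.environ["JUPYTERHUB_SERVICE_PREFIX"]
'''
source = render_conf_py({"project": "My book", "html_baseurl": ""}, extra)
print(source)
```

Since the block is executed as arbitrary Python, third-party tools that only read the YAML cannot know the final values, which is the trade-off raised earlier in the thread.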

psychemedia commented 1 year ago

@fperez The jupyterlab-myst extension is an interesting approach in respect of rich previewing, though some Sphinx extensions may be hard to replicate unless there is a sphinx engine built into JupyterLab and an easy way of adding sphinx config/metadata into a notebook. Whilst that might seem overkill for JupyterLab-as-a-datasci IDE, it's not hard to imagine it in a JupyterLab-rich-authoring edition for technical writers and interactive educational text designers and users. I don't see Jupyter as a datasci thing: I see it as a general set of components and protocols, even if the current focus (eg from the core devs' perspective) is in the datasci / research computing user area.

fperez commented 1 year ago

Agreed @psychemedia - the jupyterlab-myst extension is already great progress in this direction, big kudos to @agoose77 for it :) And I agree fully with the idea that Jupyter isn't just a data science tool! While its use for data science is very important, to me a key element of the project is helping people tell the stories that come out of these data. Computing is obviously at the center of it all (we're not a word-processing project :), but in many contexts, what matters is the content that comes out of the process of computing/data analysis...

This is an example of students doing data analysis work to ultimately share a complete story about air quality in the Bay Area. The repo is executable on binder and it has a JBook-powered website that has the main "narrative" with their conclusions, supported by the detailed analysis notebooks and code. We want many more such experiences to be very smooth, in particular with

minimal gaps between the live, interactive experience of using the system and the rendered result (in this sense, jupyterlab-myst is a big step forward).
a very smooth workflow between using the live system and sharing a version of that for others (live website, binder, etc). There, I know that @rowanc1 has already been making progress with some of the faster/smoother JS tools his team has.

ebolyen commented 1 week ago

For anyone who runs across this issue while trying to get the linkcode Sphinx extension to work: @agoose77's recommendation for programmatic access works well, as we can assign the linkcode_resolve function to app.config just before passing through to the normal linkcode setup.

Example here (one probably doesn't want to copy our linkcode_resolve function btw, it does some janky things)