Closed jlstevens closed 4 years ago
I love the idea, as well as Jim's suggestion.
We could consider also replacing "anaconda-project-" with "" to simplify the user-written template code.
I am against replacing those options but I just realized there is not much cost involved for supporting both. I can check if something like --port
hasn't been supplied at the command line, in which case I can supply port
as an alias to the template from --anaconda-project-port
. If --port
is explicitly supplied, that would take precedence.
As a concrete example, this would mean the template value of {{port}}
would come from --port
if you ran anaconda project run command --anaconda-project-port 8888 --port 9999
(i.e it would be 9999 and you can still use {{anaconda-project-port}}
to get 8888). You could still use {{port}}
in your template if you dropped --port 9999
bit, in which case the port value would be 8888, coming from anaconda-project-port
. Does that make sense? Maybe there is a downside to this idea that I am not seeing?
I love the idea, as well as Jim's suggestion.
Great! I'll go ahead and fix the tests then (probably down to declaring jinja2
properly as a new dependency).
That approach seems fine to me, though I'd be fine with the simplest approach of simply eliminating that string when reading the arguments, such that --anaconda-project-port 6066 --port 6067
would be undefined and not recommended (precisely the same as --port 6066 --port 6067
would also be undefined). Anyway, however you want to handle that is fine.
Regarding your discussion points,
In this commit unrecognized extra arguments are also made available for templating. Right now the parsing approach is extremely simplistic: once the HTTP options are removed, the remaining arguments are expected to come in
--flag value
pairs (i.e. these becomekey:value
pairs for templating). These arbitrary flags then become the corresponding identifiers available forjinja2
templating. Unknown switches (e.g--foo
by itself) are currently not supported.
I think this is fine as a long-term solution; key:value pairs can express anything people need to control externally, and can be used to generate any type of command-line arguments a target command needs.
In future, we might want to supply the template information in alternative formats, such as a JSON dictionary.
Sure; the important thing is that the templates in the .yml file won't need to change, regardless.
The parsing of the unknown argument flags is currently very simple and does not try to guard against various types of user error. Only flags using a
--
prefix are supported. Having support only for the HTTP option switches is somewhat annoying.
I really don't mind not supporting switches; key:value is so easy to explain and understand that it seems fully sufficient, to me. I'm happy treating anything that doesn't fit into the accepted format as a clear error.
- Templating is only used when
unix
commands are used andsupports_http_options
is False. I believe that templating could remove the need for thebokeh
andnotebook
commands and remove the need forsupports_http_options
(if the template doesn't use the HTTP arguments, it doesn't matter if they are always supplied or not).
As you know, I favor porting anaconda-project's functionality into conda itself, and at that point only the templating support would be needed; the other args transformers could be left here in this project at that point.
It might be nice to show an example here of how the current notebook support could be implemented using template; I can see that it handles the Bokeh support, but am not sure what different types of magic are currently done for notebook support.
The tests are now passing. For now, I am not supporting extra arguments for templating such as --anaconda-project-foo
as I wanted to try to simplify the code (and get the tests passing) before adding new code. I still think having to have the --anaconda-project
prefix is rather awkward but it is probably worth supporting more than just the HTTP options to demonstrate the generality of the approach.
@AlbertDeFusco Looking at the Appveyor history I see you are trying to debug Appveyor too! Your last commit has the same conda.CondaMultiError: file could not be opened successfully
error that I'm seeing here. Any leads/ideas on what is going wrong here?
I don't know what's wrong with Appveyor. It could be the version of Conda
I have found a different way to install conda (using standalone conda which should also be a bit faster than using miniconda) that fixes testing on Appveyor.
All the tests should now go green. I'm not quite finished with this PR though! I need to add more unit tests of the templating in general, including tests that check that everything is working properly on Windows (I had to effectively disable the current unit test on Appveyor).
As of 43c2124, I've added all environment variables available for templating. This means that the example in this section of the docs can use jinja2 instead of bash/windows syntax. In particular:
commands:
default:
unix: "python ${PROJECT_DIR}/analyze.py"
windows: "python %PROJECT_DIR%\analyze.py"
Can now be:
commands:
default:
unix: "python {{PROJECT_DIR}}/analyze.py"
windows: "python {{PROJECT_DIR}}\analyze.py"
This avoids people from having to know about one way to inject environment variables and another way to inject template arguments from the command line.
The --anaconda-project-host
option has revealed a general problem, namely the issue with representing flags that can (optionally) be supplied multiple times.
As a concrete example, if you supply multiple --anaconda-project-host
values to anaconda-project
, these values need to be mapped to multiple --allow-websocket-origin
commands when running bokeh server.
There are several ways we can approach this, all of which have pros and cons:
Always represent options that can have multiple values as arrays e.g host
would be an array and you can use host[0]
in jinja2
to index the first item in the array etc. Pros: Consistency e.g a jinja2
loop can be used expand as many --allow-websocket-origin
flags as needed. Cons: If the common case is to have only one host
, then having to always specify host[0]
is annoying and means users need to learn concepts they may not be relevant to their common needs.
Switch to an array when multiple values are supplied e.g host
is just a string if a single value is specified, otherwise it is an array of strings. Pros: Makes the common case easy and means people only need to know that this variable may be an array version if they encounter the case where multiple hosts are specified Cons: Lacks consistency as the type of host
can change from a string to an array. A single jinja2
loop won't be able to be general and handle all cases (i.e one or more).
Have a singular and plural form of the variable where the singular form is the last element of the array and the plural form is the array e.g host
is always one host and hosts
is the list of hosts. Pros: Once you have chosen host
or hosts
the type stays consistent. Most people only need to know about host
and not hosts
. Cons: Expands the namespace of variables used for templating.
In addition to all this, there is the question of whether you could use environment variables to achieve this instead. Bokeh already supports an BOKEH_ALLOW_WS_ORIGIN
environment variable which can accept a comma separated list of hosts. The question then is whether we could turn environment variables into arrays for jinja2 templating in a general and consistent way. I don't think this is possible as there is no universal separate or convention for specifying environment variables that are arrays that I know of.
For the specific case of anaconda-project-host
, I have opted for number 3 (i.e to offer hosts
as an array as well as just host
). I think this is the best approach for this specific use case but I am not sure it would be a good thing to generalize (i.e to pluralize all variable names to offer arrays, just in case multiple values are specified).
I've now updated the documentation to describe templating. @mcg1969 @jbednar I suppose we will only want to trigger readthedocs to update the website when the next version of anaconda-project
is released? If it is possible to do a dev build of the docs first, that might also be a good idea.
I would also like to do a pass over the entire docs to fix all the references to the bokeh
command which now only works with very old versions of Bokeh and replace them with templated equivalents. Unfortunately, there are a lot of such examples so I think such a pass over the docs should be done in a separate PR.
Here is what can replace thebokeh
command using templating (using a unix
command instead):
bokeh {% for host in hosts %} --allow-websocket-origin {{host}} {% endfor %}\
{{'--port %s' % port if port is defined}}\
{{'--prefix %s' % url_prefix if url_prefix is defined}}\
{{'--use-xheaders' if use_xheaders is defined}}\
{{'--show' if no_browser is not defined}} <file-or-dir>
This should work with modern versions of Bokeh while the built in bokeh
command does not.
While this does look a little uglier, I think that overall this is fairly legible and entirely explicit. More importantly, it is easier for users to update their template in their anaconda-project.yml
files if Bokeh changes than having to update anaconda-project
itself.
Looks great to me. Verbose, but very explicit and shows exactly how you'd support any other type of server that you want to use.
@jbednar Here is most of what is needed for a template equivalent of a notebook command:
notebook {{'--port %s' % port if port is defined }}\
{{'--ip %s' % address if address is defined}}\
{{'--NotebookApp.trust_xheaders=True' if use_xheaders is defined }}\
{{'--no-browser' if no_browser is defined }}\
{{'--NotebookApp.base_url=%s' % url_prefix if url_prefix is defined}}\
--NotebookApp.default_url=/notebooks/example.ipynb
This doesn't do two things:
Unfortunately, a crazy level of ugly escaping is going to be required for the latter. I believe this should work:
notebook {{'--port %s' % port if port is defined }}\
{{'--ip %s' % address if address is defined}}\
{{'--NotebookApp.trust_xheaders=True' if use_xheaders is defined }}\
{{'--no-browser' if no_browser is defined }}\
{{'--NotebookApp.base_url=%s' % url_prefix if url_prefix is defined}}\
{{'--NotebookApp.tornado_settings={\\\\\\"headers\\\\\\"\':\'{\\\\\\"Content-Security-Policy\\\\\\"\':\'\\\\\\"frame-ancestors\\\\ self\\\\ %s\\\\\\"}}' % iframe_hosts if iframe_hosts is defined }}\
--NotebookApp.default_url=/notebooks/example.ipynb
Right now it looks like templating can get us where we need to be for processes like flask, Voila and Bokeh but additional work will be needed to get a clean templating approach for notebooks that support iframe hosts. Any suggestions?
Some ideas of my own:
url_quote
is what is currently used in the code and can be made available as a jinja2 filter. For instance, something like:
--NotebookApp.default_url=/notebooks/{{"example with spaces.ipynb" | url_quote}}
I could add a special jinja2 variable to hide all that quoted gunk which could be something like:
{{'--NotebookApp.tornado_settings=%s' % nb_iframe_hosts if nb_iframe_hosts is defined }}
Even though this would be a notebook specific variable and could change in the future (e.g if notebook supports something more sane), the earlier example shows that this is simply a shortcut to an equivalent that can be achieved with templating alone. The biggest reason not to do this is that it requires a special name to be added to the templating namespace.
@mcg1969 I think this PR is ready for a review pass when you get a chance to look at it.
As a sanity check, I can confirm that the templating (including use of environment variables) works on Windows. However, jinja2
needed manual installation which is now listed as an explicit dependency in #251.
The primary goal of this PR is to highlight some pain points in the current design of
anaconda-project
and to propose one approach that makes anaconda-project more flexible for users and easier to maintain for maintainers (with an example of solving a concrete problem).The problem
Currently anaconda-project supports three types of command:
unix
,notebook
andbokeh
. Theunix
command is the simplest of these, allowing any command to be run based on the string in theanaconda-project.yml
. If any extra arguments are supplied toanaconda-project run <command>
, these are simply appended to the command.The
bokeh
andnotebook
commands are aimed to make two particular use cases easier. These commands run servers that need extra information (e.g allowed ports) in order to work correctly across environments. Ifsupports_http_options
isTrue
, these commands convert additional arguments (namely-anaconda-project-host'
,--anaconda-project-address
,--anaconda-project-port
,--anaconda-project-url-prefix
,--anaconda-project-no-browser
,--anaconda-project-iframe-hosts
and--anaconda-project-use-xheaders
) into the appropriate format needed to run the bokeh or notebook server respectively.Unfortunately, this approach of explicitly supporting different types of tool in
anaconda-project
itself is not maintainable long term. For instance, thebokeh
command only works for fairly old versions of Bokeh (around 0.6.7?) as the command-line interface for specifying these options has since changed. Even if the_BokehArgsTransformer
class were updated accordingly to work with recent versions of Bokeh, then older versions of bokeh would no longer be supported as a result. For general support across versions, each version of each command line interface for each supported tool (i.enotebook
andbokeh
) would need to have its own unique command name for use inanaconda-project.yml
.Right now, there are several ways you can hope to tackle this problem:
Add a command for each command-line interface. This seems quite unwieldy and each time a command-line interface changes (or a new type of command is added), users need to wait for a new
anaconda-project
release to be available and installed in all contexts where it needs to be used. It is also a burden for the maintainers ofanaconda-project
to support all the various tools.Assume that the tools themselves will support the arguments supplied by
anaconda-project
. This is how support for older versions of Bokeh operated but in general it is unlikely that external projects will decide to support the necessary flags. As a concrete example, running a voila server requires the extra HTTP options, but instead of--anaconda-project-port
, the required argument is--port
. It is unlikely thatvoila
will ever support--anaconda-project-port
, directly.Use a wrapper script or a command-line trick to munge the arguments into the required format. The appeal of this approach is that users can use this workaround at any time without waiting for any software to be released. Writing a wrapper script is annoying, especially as it means including an extra file in the project to supply the extra level of indirection needed.
An example of using a special
unix
command can be seen in this project running voila server. To get this working, I had to develop a rather ugly and fragileawk
command:This command is fragile because it depends on the order of the
--anaconda-project-X
arguments and using awk like this is not user friendly. That said, this option is currently the best one as it doesn't require any changes invoila
oranaconda-project
and works whether the extra arguments are supplied or not (i.eanaconda project run dashboard
works fine locally).Proposed solution
This PR proposes that the
unix
command support a templating mechanism allowing users to control how their commands are invoked in a flexible way. Initially I considered simple Python format strings for templating but then decided thatjinja2
support would make the templating approach a lot more powerful, primarily because jinja2 supports default values. Templating would help avoid problems 1 and 2 above by formalizing a more user friendly way to take the approach in option 3.jinja2
is an obvious choice as it is probably the most common templating language used in Python (it is already used in parts of theconda
ecosystem; e.g you can usejinja2
templating inconda-build
). Using an existing templating language that is well tested and documented seems to be a much more maintainable approach than inventing a new DSL.The simplest equivalent to the
awk
command above injinja2
would be:This is not quite as good as the
awk
command in one respect: if you don't supply any arguments (e.ganaconda run dashboard
), the--port
and--Volia.ip
flags will be missing their values. This can be addressed usingjinja2
defaults. You could specify something a little more sophisticated to handle this problem:This is still a lot nicer than the
awk
command and explicit defaults are quite nice for reproducibility (e.g. this allows you to know which default settings are actually being used locally from theanaconda-project.yml
).Approach
This PR supports the
jinja2
examples above but there are a few things that need to be considered carefully:92d34a6: In this commit, the existing parsing of the HTTP options (
HTTP_SPECS
) is used so that only these specific arguments can be templated in. Templating currently only occurs forunix
commands whensupports_http_options
is False.Note that when templating is used, the information is no longer passed to the commands in the form of flags (it is up to the user to use the information made available in the template if and as they wish).
A small amount of logic is needed to turn the flags into valid Python identifiers for
jinja2
(removing the--
prefix and turning dashes into underscores). The HTTP option switches are supported with--anaconda-project-no-browser
and--anaconda-project-iframe-hosts
getting assigned the value ofTrue
if present (i.e. treated as a boolean flags that can only be True).d769ead: In this commit unrecognized extra arguments are also made available for templating. Right now the parsing approach is extremely simplistic: once the HTTP options are removed, the remaining arguments are expected to come in
--flag value
pairs (i.e. these becomekey:value
pairs for templating). These arbitrary flags then become the corresponding identifiers available forjinja2
templating. Unknown switches (e.g--foo
by itself) are currently not supported.Other things you can do with
jinja2
templatingSubstitution and default values have already been demonstrated above. Two other use cases I have thought about are:
Switches: You can transform the HTTP switches into other switches with something like
{% if anaconda_project_use_xheaders %} --xheaders {% endif %}
Inverted switches: In Bokeh, instead of the
--no-browser
switch, you have the inverse concept , namely--show
. You can automatically add--show
if the--no-browser
switch is not present with something like{% if anaconda_project_no_browser is not defined %} --show {% endif %}
Future ideas and other considerations
Obviously, if we take this approach,
jinja2
will become a dependency ofanaconda-project
. The PR hasn't yet done anything to reflect this sojinja2
will need to be installed if you wish to test this PR.Some other thoughts:
In future, we might want to supply the template information in alternative formats, such as a JSON dictionary.
The parsing of the unknown argument flags is currently very simple and does not try to guard against various types of user error. Only flags using a
--
prefix are supported. Having support only for the HTTP option switches is somewhat annoying.Templating is only used when
unix
commands are used andsupports_http_options
is False. I believe that templating could remove the need for thebokeh
andnotebook
commands and remove the need forsupports_http_options
(if the template doesn't use the HTTP arguments, it doesn't matter if they are always supplied or not).Outstanding tasks