Closed wolfv closed 11 months ago
I think that would be very useful, it will reduce the complexity to parse those yaml comments that are a pain. But I also think that if we could move to use toml files instead of yaml mixed with jinja2 would be way better but that is another history 😬
How would TOML files work? Wouldn't we still need some Jinja here and there?
The good thing about the pure YAML format is that we can translate it to either the next evolution of the YAML format or a TOML based format quite easily.
In fact, someone has already added TOML support (more specifically recipe embedding inside a pyproject.toml file in boa
- precisely because it is possible).
How would TOML files work? Wouldn't we still need some Jinja here and there?
The good thing about the pure YAML format is that we can translate it to either the next evolution of the YAML format or a TOML based format quite easily.
In fact, someone has already added TOML support (more specifically recipe embedding inside a pyproject.toml file in
boa
- precisely because it is possible).
regarding toml files I was thinking preprocess it and postprocess it if necessary something like poetry but on steroids, 😆 like
[requirements.host]
pytest = {version=">=5.0.0,<6.0.0", platform=["linux", "osx"], preprocess="func_to_preprocess", postprocess="func_to_postprocess"}
if it needs preprocessing it could call some predefined functions to do some magic that we might need
if we need variables to be used inside it can use
[env.variables]
my_var="abc"
[package]
name="${my_var}"
and it can inject it and expand when processing the recipe, if I am not mistaken there is a crate that does these kinda of env var expansion.
but as I mentioned, that is another whole history 😄
What you have proposed I think would improve and simplify a lot of parsing and when we need to pragmatically create the recipes. But I wish to get ride of all the jinja, because they are also a pain sometimes 😆 , however, I also understand that would be very difficult to remove them
One thing that also came to mind as I was reading the CEP was what the "migration strategy" would be for existing recipes. You mention this a little in the "short comings" section, but maybe it's worth it to have its own section.
Unfortunately, migration will be a PITA because it will most likely involve manually updating recipes (will it?), but maybe there's a tool we could write that could handle translating 90% of the of existing recipes (I'm thinking like a Python 2 to 3 migration tool).
Or, maybe migration isn't even a concern 🤷♂️. It should probably be mentioned somewhere in the document.
Thanks for the comments @travishathaway. Indeed, migration is a concern. In boa
we have a small & hacky tool that can convert simple recipes (boa convert meta.yaml > recipe.yaml
).
It works by analyzing a line, and if it finds a comment it moves the comment right after the -
and wraps it with a sel(...)
. It also parses the {% set ...
expressions and sticks them in a context dictionary.
But obviously this works only for relatively straightforward cases.
I'm curious, the title says "A new recipe format – part 1". Is there a part 2 somewhere or some written notes about a possible future part 2?
Hi @JeanChristopheMorinPerso indeed, there is the "full" proposal (of which this is a derivative) here: https://github.com/conda-incubator/ceps/pull/53
Also, all this work is based on 2+ years of work on the boa
"build tool" as well as the rattler-build build tool. Both are many times faster than the current conda-build and implement this recipe format :)
The current state of affairs is that we have 2 mini-languages + YAML (comments & Jinja) for the recipes. That makes it really hard to write a parser for it to statically extract information from the recipe or transform it in any way.
The way Jinja support is implemented in conda-build is also quite tricky: it does recursive evaluations of the recipe to find all the necessary values for the used Jinja variables. This contributes greatly to conda-build
being extraordinary slow for larger recipes (especially with multiple outputs).
I think the way Jinja is used in conda-build currently is not very deliberate. It's usage is completely unrestricted, but most recipes only use a tiny subset (string interpolation & one or two "set" statements). However, the possibilities of Jinja blur the line between: is this YAML file a script or is this YAML file a static description of the package? Right now it's both, somehow, and not good at either.
If we have decide on a somewhat more limited YAML format as input, it will be very straight-forward to generate the recipes using Python or any other language (we are actually doing this for ROS recipes in the RoboStack project where we've been using boa for 2+ years now). So that can be an "escape-hatch" for some recipes that are more annoying to write without full Jinja support. At the same time, the old recipe format will be supported indefinitely so it's also no problem to stay with what is there.
Another nice property of the new recipe format will be the effortless evolution of it. Since it's static YAML we can come up with version 2
and make a migrator that rewrites the recipe to the new spec - something that's super hard to do with the current format.
So to summarize the motivations:
Another motivating example: we have a version autotick bot in the boa-forge
collection of recipes and it requires very little code (~600 lines of Python). The part about reading & writing the recipe is trivial: https://github.com/mamba-org/boa-forge/blob/de9a92f755f21f736e380b39bba925c755b1e35f/determine_versions.py#L607-L648
It produces PRs such as: https://github.com/mamba-org/boa-forge/pull/131/files
Overall I think this is a great step. Having a more easily machine-readable recipe would open the doors for more tools to generate and parse recipes.
I do also prefer the aforementioned if
syntax over the sel(..)
syntax because:
sel(..)
as a key looks strange in yaml land. if
syntax we could later also extend that to make it more machine-readable as well!
if:
or:
- platform: win32 # not a minijinja expression
- platform: osx
This is what it could look like (snippet is actually created by @wolfv , I only changed it slightly). Note that the value for if
is also just a minijinja expression just without the {{ }}
(this is also similar to Github CI).
context:
name: xtensor
version: 0.24.6
sha256: f87259b51aabafdd1183947747edfff4cff75d55375334f2e81cee6dc68ef655
package:
name: "{{ name|lower }}"
version: "{{ version }}"
source:
fn: "{{ name }}-{{ version }}.tar.gz"
url: https://github.com/xtensor-stack/xtensor/archive/{{ version }}.tar.gz
sha256: "{{ sha256 }}"
build:
number: 0
skip:
- if: osx or win # note that the value is a minijinja expression
value: true
- if: win
# Defaults to: (because we can easily derive this from the context)
# value: true
requirements:
build:
- "{{ compiler('cxx') }}"
- cmake
- if: unix
value: make
host:
- xtl >=0.7,<0.8
run:
- xtl >=0.7,<0.8
run_constrained:
- xsimd >=8.0.3,<10
test:
commands:
- if: unix
value:
- test -d ${PREFIX}/include/xtensor
- test -f ${PREFIX}/include/xtensor/xarray.hpp
- test -f ${PREFIX}/share/cmake/xtensor/xtensorConfig.cmake
- test -f ${PREFIX}/share/cmake/xtensor/xtensorConfigVersion.cmake
- if: win
value:
- if not exist %LIBRARY_PREFIX%\include\xtensor\xarray.hpp (exit 1)
- if not exist %LIBRARY_PREFIX%\share\cmake\xtensor\xtensorConfig.cmake (exit 1)
- if not exist %LIBRARY_PREFIX%\share\cmake\xtensor\xtensorConfigVersion.cmake (exit 1)
I honestly don't care so much for either approach. I think the sel(...)
syntax is more terse which might be nice.
There is one feature that is hard to express with the if
syntax which is a dictionary with multiple keys, e.g.:
build:
number: 0
script:
sel(unix): |
cmake ${CMAKE_ARGS} -DBUILD_TESTS=OFF -DCMAKE_INSTALL_PREFIX=$PREFIX $SRC_DIR -DCMAKE_INSTALL_LIBDIR=lib
make install
sel(win): |
cmake -G "NMake Makefiles" -D BUILD_TESTS=OFF -D CMAKE_INSTALL_PREFIX=%LIBRARY_PREFIX% %SRC_DIR%
nmake
nmake install
However, there was also a bit of internal debate about this because we would need to define the behavior if more than one sel(...)
evaluates to true
. In that case, in the current implementations, only the first one wins.
I don't see why that would be hard. Instead of a dictionary, just use a list:
build:
number: 0
script:
- if: unix
value: |
cmake ${CMAKE_ARGS} -DBUILD_TESTS=OFF -DCMAKE_INSTALL_PREFIX=$PREFIX $SRC_DIR -DCMAKE_INSTALL_LIBDIR=lib
make install
- if: win
value: |
cmake -G "NMake Makefiles" -D BUILD_TESTS=OFF -D CMAKE_INSTALL_PREFIX=%LIBRARY_PREFIX% %SRC_DIR%
nmake
nmake install
However, I think we could do even better if the selector evaluation part is part of the model. The current implementation in Rust does two passes over the yaml. First, it evaluates the selectors which generates a new yaml file and then it parses it into a specific model format. While flattening the structure it can become unclear whether a list, a string, or an object was expected.
I think it would be better to make the selectors part of an explicit model, evaluate that model with the selectors, lowering it into another (evaluated) model. This will most likely also give the user much better error messages if two selectors accidentally result in two valid values. It also allows us to generate a JSON schema that can be used in IDE's to get autocompletion support.
The biggest fear of the if: ...
based syntax is that we now think that it's more "correct" or something like that, but then it might present a hurdle to adoption because in effect it's longer / more annoying to type :)
We had some more discussions today and came up with some changes:
"{{ jinja }}"
variables (which always coerces them to be strings in YAML) we would rather go the Github route and use ${{ jinja }}
(which plays well with YAML), or use another bracket, like ((
. However we're leaning towards the Github syntax.build:
number: ${{ 0 if win else 100 }}
and has the correct integer type etc.
We also looked at the challenge of the following example: suppose win needs an MD5 hash, and unix needs a SHA256 for a given source file (a bit contrieved, admittedly). With the if
or sel
syntax, the cleanest would be to repeat the block like:
source:
- if: win
url: https://github.com/killercup/cargo-edit/archive/refs/tags/v{{ version }}.tar.gz
sha256: 46670295e2323fc2f826750cdcfb2692fbdbea87122fe530a07c50c8dba1d3d7
- if: unix
url: https://github.com/killercup/cargo-edit/archive/refs/tags/v{{ version }}.tar.gz
md5: 0800fc577294c34e0b28ad2839435945
Although in this case it would be possible to solve with a Jinja if as well.
For lists, the syntax would look like:
- if: win
then:
- cmake
- ninja
else: # optional
- ...
The missing then
makes the syntax nicer for dictionaries, but also make it less homogenous.
Thanks Wolf! 🙏
Really appreciate you taking the time to write this up with nice examples. Think there are a lot of good improvements here. The biggest simply being in one markup language
Asked a few questions and made a few minor suggestions above. Will probably need to mull on this more and bring a few more examples into the discussion so we can think about what it would mean to move recipes into a format like the one proposed here
Another question that comes up for me frequently, which may be off-topic for this, but I'd like to at least put it out there. The multi-output format that we have today was a bolt on to an existing single output format. Though I wonder if we could do better by moving to some kind of model where the multi-output format is a first-class citizen. In any event this may be CEP in its own right, but I think this would be a valuable discussion to have and would be very beneficial for recipe maintainers. Something to think about 🙂
Hi @jakirkham thanks for your reply! Indeed, multi-outputs are one of the things I desperately want to fix and it's going to be the topic of another CEP. In that CEP, the idea will be to make the outputs completely independent (no weird inheritance of dependencies etc.) and have no-package outputs (that other outputs can depend on as a build cache).
This one here focuses on the YAML format for future recipes.
We did go back to the drawing board a little and came up with the following:
Allow the conditional if
syntax only in lists, such as dependency, script or or run exports. In all other places, users can use the "ternary" operator of MiniJinja. This will simplify the implementation quite a lot.
This is what an example recipe with proposed syntax would look like:
context:
name: xtensor
version: 0.24.6
sha256: f87259b51aabafdd1183947747edfff4cff75d55375334f2e81cee6dc68ef655
package:
name: ${{ name|lower }}
version: ${{ version }}
source:
fn: ${{ name }}-${{ version }}.tar.gz
url: https://github.com/xtensor-stack/xtensor/archive/${{ version }}.tar.gz
sha256: ${{ sha256 }}
build:
number: 0
skip:
# special case if -- raw "statements"
# note that the value is a minijinja expression
- osx or win
- win
# alternative syntax
# script: ${{ "build_cmd.sh" if unix else "build_cmd.bat" }}
script:
- if: unix
then: |
cmake ${CMAKE_ARGS} -DBUILD_TESTS=OFF -DCMAKE_INSTALL_PREFIX=$PREFIX $SRC_DIR -DCMAKE_INSTALL_LIBDIR=lib
make install
- if: win
then: |
cmake -G "NMake Makefiles" -D BUILD_TESTS=OFF -D CMAKE_INSTALL_PREFIX=%LIBRARY_PREFIX% %SRC_DIR%
nmake
nmake install
requirements:
build:
- ${{ compiler('cxx') }}
- cmake
- if: unix
then: make
host:
- xtl >=0.7,<0.8
run:
- xtl >=0.7,<0.8
run_constrained:
- xsimd >=8.0.3,<10
test:
commands:
- if: unix
then:
- test -d ${PREFIX}/include/xtensor
- test -f ${PREFIX}/include/xtensor/xarray.hpp
- test -f ${PREFIX}/share/cmake/xtensor/xtensorConfig.cmake
- test -f ${PREFIX}/share/cmake/xtensor/xtensorConfigVersion.cmake
- if: win
then:
- if not exist %LIBRARY_PREFIX%\include\xtensor\xarray.hpp (exit 1)
- if not exist %LIBRARY_PREFIX%\share\cmake\xtensor\xtensorConfig.cmake (exit 1)
- if not exist %LIBRARY_PREFIX%\share\cmake\xtensor\xtensorConfigVersion.cmake (exit 1)
Hi all, I'm a RAPIDS developer. I am still evaluating the impact of this proposal more generally and can post more later depending on how we see this impacting our packaging overall, but I see a few comments regarding the potential benefits of using "pure YAML" and I can add my two cents here and give a specific example as to why that particular change could be of use (also a :+1: to @wolfv's example of a generated meta.yaml from RoboStack). We've been doing something similar but focusing primarily on dependency management for now in order to single source our dependency lists between multiple Python packages and C++ libraries contained in a monorepo, each of which must product some combination of environment.yaml/meta.yaml/pyproject.toml files in addition to some other more bespoke use cases. We have support for many of the use cases we care about, but we haven't supported conda recipes due to the complexities of parsing a file that is not pure YAML in a context where we may not have all the information needed to evaluate the full Jinja context (i.e. if there are variables available during a conda build that we don't know outside that environment). The best approach we've considered so far is to simply generate many small YAML files containing dependency lists corresponding to each section and then loading them with a Jinja for loop, but that's a pretty rough solution. Having a pure YAML file would make producing the dependency lists dynamically much simpler as stated in the CEP:
However, since the new recipe format is "pure YAML" it is very easy to create and pre-process these files using a script, or even generating them with Python or any other scripting language. That means, many of the features that are currently done with Jinja could be done with a simple pre-processing step in the future.
Note that pulling dependencies from pyproject.toml into meta.yaml is generally insufficient because the two sets of dependencies generally will not match exactly due to various factors (pyproject's build-requires typically splits into conda's build + host, non-Python library dependencies are typically not available on pip, differing package names or metapackage structures between conda and PyPI, etc). You may reduce some apparent redundancy, but in practice you still have to think about both files, and you have now coupled them in a nontrivial way since logic in one file now depends on the contents of the other.
Unrelated, strong :+1: to improving multi-output support, but agreed that it's out of scope here and don't want to divert the discussion. Thanks for putting this together @wolfv!
@wolfv, can you link the previous discussions? I remember I commented some of my thoughts on an issue, but can't seem to find it.
Hmm, @isuruf this was the original "discussion document": https://hackmd.io/axI1tQdwQB2pTJKt5XdY5w
Maybe we had some more discussion in one of the conda-forge meetings or on the conda-build issue tracker?
I found some discussion here: https://github.com/conda-forge/conda-forge.github.io/issues/1612
And here are all the boa mentions in conda-forge meetings.
I've update the CEP with the new syntax as discussed. We'll be implementing the ListWithConditionals
in rattler-build
today and give it a try :)
This sounds very promising!
One thing I'm not entirely clear on is on is whether the Jinja2 functions for loading data from other files described here will continue to work. I'm guessing yes?
@sterlinm I think in most implementations they would not be supported initially. Especially load_setup_py
will definitely not be support in the Rust version (rattler-build
).
However, with the new format it is easier than ever to write a small script to create recipes from a script.
Also note that the proposal doesn't allow for something like {% set contents = load_pyproject_toml(...) %}
because that syntax doesn't parse as YAML.
We could support some form of this inside the context, but I am not sure on the specifics yet.
@wolfv Thanks, that all makes sense!
For scenarios where you want to build a conda recipe and pypi project from the same list of packages, I agree that being able to generate both from simple scripts sounds better than having lots of magic functions in the recipe at the expense of the other goals you've laid out.
Hi Marcel, big sorry for taking so long to get back on the points!
if
's would probably be forbidden. The goal was that we could make it more type-safe this way.cmp
to a cfg
function? That might be similar to how Rust / Cargo handles additional target dependent configuration like cfg(python = ">3.10")
could be the syntax that we use. In Rust you can use additional operators like cfg(all(python="3.10", numpy=">1.5"))
. Although I cannot promise that we can stick that into Jinja (minijinja)... The other way around (all(cfg(python="3.10"), cfg(numpy=">1.5"))
) might be easier to implement.I'll continue later!
OK, continuation:
The point you raise about YAML and YAML being complex: totally agree. I don't think anybody uses (and I also don't understand) anchors and multi-documents so I am happy to explicitly forbid them. However, I do think there are places where integers are required (build number). Floating point numbers might indeed not appear in a recipe. But I do think that we can (should) enforce versions to be quoted as strings. With Rust, it's quite easy to enforce with serde_yaml
, and our pydantic types also do it (https://github.com/prefix-dev/recipe-format it became even better with pydantic 2!).
As you alluded to, I am open to defer the final decision on the cmp
function (or cfg
or something else entirely). So I am considering to remove it from the spec for now in order to move past the initial spec.
One last point that could be worthwhile to discuss is wether we want to take more inspiration from wolfi-os / melange, where packages look a bit more like Github Actions: https://github.com/wolfi-dev/os/blob/main/libidn2.yaml I really like it, but at the same time the hurdle to port existing recipes will be higher and the implementation is a bit different ... but in the future we could have a pipeline.yaml that does it that way :)
I'm excited about the idea of a more parseable recipe format.
My first thought is that I'm not sure conda-build's special jinja2 functions are well-documented?
Too bad the existing format isn't versioned.
I'm more interested in parsing info/recipe/meta.yaml
inside a .conda
. The original yaml+jinja2 recipe becomes info/recipe/meta.yaml.template
.
How is our support for a separation between the recipe renderer and the rendered-recipe builder?
Based on feedback received, I have just removed mentions to the cmp
function. We should have a seperate discussion about it.
I am ready to call for a vote on this CEP! :)
Not sure what to do but I just updated the dates and "changed" the status to proposed which means that the vote is open!
I mentioned it in the internal Gitter, but our implementation of the new format is pretty complete and works nicely in rattler-build
. We are also in the process of updating its documentation to reflect the format changes we discussed here and that are now implemented.
@conda-incubator/steering
This vote falls under the "Enhancement Proposal Approval" policy of the conda governance policy, please vote and/or comment on this proposal at your earliest convenience.
It needs 60% of the Steering Council to vote yes
to pass.
To vote, please leave yes
, no
or abstain
as comments below.
If you have questions concerning the proposal, you may also leave a comment or code review.
This vote will end on 2023-10-27, End of Day, Anywhere on Earth (AoE).
yes
Yes.
yes
yes
yes
yes
my vote doesn't count, but I am 100% supportive of this and very excited to see it moving ahead!
Yes! 🚀
Great work on this. Yes.
Yes
yes
Yes
Wow, I am counting 11 yes votes (out of 15 council members) which appears to me as if the proposal is accepted!
@wolfv We should wait until the end of the voting period to tally all votes, FWIW
Yes.
.. But ideally with the main points from https://github.com/conda-incubator/ceps/pull/54#pullrequestreview-1515301216 addressed. I'll try to compose a changeset and issue a pull request against this one (if not done by tomorrow, it will have to wait for a subsequent CEP).
I guess we can now celebrate that this has been accepted?! :) :tada: I'll update and merge the CEP tomorrow with the appropriate number and status.
I guess we can now celebrate that this has been accepted?! :) 🎉 I'll update and merge the CEP tomorrow with the appropriate number and status.
Can I start to modify souschef and grayskull already? 😄
This was a standard, non-timed-out vote.
Among Steering Council members there are 12 "yes", no "no", and no abstentions. This vote has reached quorum (at least 60% of 15).
It has passed since it recorded 12 "yes" votes and 0 "no" (80% yes out of 15 eligible voters).
Does this diverge from the rattler book - sel(osx): somepkg instead of - somepkg # [osx]
, Jinja string interpolation needs to be quoted at the beginning of a string, e.g. - "{{ version }}"
Does this diverge from the rattler book
- sel(osx): somepkg instead of - somepkg # [osx]
,Jinja string interpolation needs to be quoted at the beginning of a string, e.g. - "{{ version }}"
@dholth Yes it does, that is not correct. That should have been updated in the book, can you point me to where it says that?
✏️ Rendered version here: https://github.com/wolfv/ceps/blob/cep_20.1/cep-20.1.md