dagwieers commented 6 years ago

Proposal: jinja-data-type

Author: Dag Wieers <@dagwieers> IRC: dag

Date: 2018-03-23

Status: New
Proposal type: core design
Targeted Release: 2.7
Associated PR: ansible/ansible#49801
Estimated time to implement: 1 day

Motivation

Make it easier to use Jinja2 in YAML by avoiding quotes and curly brackets.

Problems

What problems exist that this proposal will solve?

YAML and Jinja2 are sometimes stepping on each other's toes
- YAML parses values starting with curly brackets as a dictionary, so quotes are mandatory
- Mandatory quotes introduce complexity, especially if the payload requires both single and double quotes
- Avoid the confusion where people think a Jinja expression automatically turns into a string because it is quoted
- The existing _when and until directives already introduce some ambiguity, expecting a Jinja expression. This ambiguity we could fix by endorsing the use of the !jinja tag.

Solution proposal

The introduction of a user-defined (local) data type in YAML using tag !jinja
- Using !jinja is the shortest descriptive name, using !jinja2 or !j2 is not an option
- This tag will flag a value as a Jinja expression
- As such it does not require quotes (avoids triple quoting), or Jinja's standard {{ and }} delimiters (moustache overload)
The implementation would turn such data into a Jinja delimited string (lazy evaluation)
This proposal improves how users interact with Ansible:
- It simplifies most Jinja2 usage in playbooks, inventories and vars-files (assignments and data manipulation)
- It uses YAML the way it was designed to be without compromising on Jinja's functionality
- A standard like this could become a wider standard endorsed by the Jinja project and others
- It allows alternative template engines to be used concurrently
- It enables syntax highlighters to highlight expressions from template engines

Current syntax

    host: '{{ inventory_hostname }}'
    cookie: "{{ result | regex_replace('^.*cookie=\"([^\"]+)\".*$', '\\1') }}"

New syntax (GitHub YAML syntax highlighter doesn't properly highlight tags)

    host: !jinja inventory_hostname
    cookie: !jinja result | regex_replace('^.*cookie="([^"]+)".*$', '\\1')

Dependencies (optional)

For clarity it helps if YAML syntax highlighters would indicate data-type tags clearer
- Github YAML highlighter and Sphinx pygments fail here clearly
- VIM, VSCode, Gedit all do this well
This proposal has no intention to deprecate any existing functionality related to Jinja2
This could be introduced together with Jinja's native environment in Ansible v2.7

Testing (optional)

Undoubtedly.

Documentation (optional)

No, documentation is not optional.

Anything else?

Let the bike-shedding begin.

webknjaz commented 6 years ago

I love this proposal! :tada:

Though I'd call it smth like !j2_expr to clearly indicate that it's an expression, not template string.

Just !jinja (or !jinja2, or !j2) would confuse me, since I'd think it expects a template. I think, it might be a good idea to introduce it, but make accept template string. This would improve readability as well.

Concerns:

Introducing multiple ways of doing the same might be harmful.

perlpunk commented 6 years ago

I recently suggested something like that, but IMHO !expr might be better, since it's independent from the templating language used. Like @webknjaz said, !jinja suggests a template.

jpmens commented 6 years ago

I like how this proposal makes variable definitions / use look clear and uncluttered, and I like the idea of naming it !jinja (without a 2 and without underscores) because the name doesn't distract. While saying !expr is at least equally clear, the word "expr" may be a too general term.

perlpunk commented 6 years ago

I could even imagine just !e (or !j). That would make it significantly faster to type compared to "{{ }}"

Additionally I should mention @ingydotnet's PR to allow {{ ... }} to be unquoted: https://github.com/ansible/ansible/pull/36331

It's a kind of a hack, because {{ foo }} will be processed equal to this structure { { foo: null }: null } and then the constructor takes care that it will be loaded as the string "{{ foo }}". And foo now is a plain YAML scalar and is subject its syntax rules now. But still an interesting suggestion that shows what you can do with PyYAML.

tonk commented 6 years ago

I think this is a very sound idea, altough I doubt the !jinja keyword a bit. It makes a non Pythionista a bit weary about Ansible . Of course you have to know Jinja2, but something like !expr with a !e shortcut seems better as far as I'm concerned.

ingydotnet commented 6 years ago

Here are a couple thoughts:

From a YAML perspective, tags are meant to represent types, not transformations. This is:

- !age 42  # valid use of a tag to load an integer as an age object
- !upper foo  # invalid use of a tag to transform a value to an upper case string

One way to tell the difference... a value created from a loaded YAML tagged node, would be serialized (dumped) using the same tag. ie A foo object in memory is serialized using !foo, and !foo deserializes back into a foo object. A transform on the other hand, turns one value into another. For instance !join [a, b, c] might load as the string "abc", but the reverse would never be true.

YAML (as of 1.2) is a data serialization only language. There was never any intention to provide transforms. In 1.3 we are discussing adding annotations (similar to Python decorators) that are solely intended as transformations. They would look like @join('-') [a, b, c], which would load as "a-b-c"; or possibly @jinja foo | bar.

That said, 1.3 does not exist yet, and so it might be compelling to abuse tags for all sorts of transforms.

There is another solution besides explicit tags. Every unquoted scalar can be made to be implicitly typed. Ansible could be made to run any scalar that began with some certain char, to go through a transformation. For instance, every string that begins with \ (or any other valid character that would never appear in a normal value) could be considered Jinja. ie this \ foo | bar could be the same as "{{ foo | bar }}".

Here is some actual python code to make that work:

import re, yaml, json

yaml_text = r'''
key1: foo
key2: \bar  # instead of "{{bar}}"
'''

def jinja_eval(loader, node):
    # instead of jinja eval, just return upper for this demo:
    return node.value[1:].upper()

yaml.add_implicit_resolver(u'!jinja', re.compile(r'^\\'))
yaml.add_constructor('!jinja', jinja_eval)

print json.dumps(yaml.load(yaml_text), indent=2)

Which outputs this:

{
  "key2": "BAR",
  "key1": "foo"
}

You might even want to use something like << foo | bar >>. You would then just replace the <<>> with {{}} (before jinja eval). This form uses no quotes. As was already pointed out, quotes indicate the value will be a string, but the jinja evaluation might return a non string.

Food for thought.

perlpunk commented 6 years ago

@ingydotnet I disagree with "invalid use of a tag to transform". IMHO the distinction if something is a transform or simply a data type is not that easy, and "invalid" is a bit too strict.

It depends on the implementation if the tag survives a roundtrip. To implement it strictly, one could load !e scalars as JinjaExpression objects. Only when the expression is applied, it would retrieve the string saved in this object. That way it would conform to what a tag is supposed to be.

If you look at it that way, there are already implicit !e tags that are derived from the schema. For example, when and var in the following example already expect an expression and not a template. They will warn if you include {{ }} around it:

- debug:
    var: output.stderr   # implicit expression
  when: output.rc != 0   # implicit expression

If the implementation does a transformation in the custom constructor, then IMHO this would still be ok. In case of a roundtrip (load and dump) the following playbook would end up like this:

# before
- set_fact:
    path: !e some_register.stdout

# after roundtrip
- set_fact:
    path: "{{ some_register.stdout }}"

Since we do not need roundtrips in ansible, and since the goal of this proposal is to make editing easier, I think this is not a reason to ~~reject the proposal~~ not implement this.

perlpunk commented 6 years ago

@ingydotnet Personally I'd prefer alternate delimiters too. The question is if it will break existing files. I'm sure there are cases where plain scalars start with a backslash. I can't think of a use case where you'd have a scalar that starts with backslash+space, but probably somewhere there exists such a use case. Same question with << >>.

dagwieers commented 6 years ago

Ok, so let me chime in here.

@webknjaz @perlpunk Using !e or !expr, while shorter, is not helpful because it does not allow for other template engines or syntax highlighting. We really want to build something that can be reused by other projects and not do something that only makes sense in Ansible's context.

That rules out using alternate delimiters as well. And that's why a specific tag was proposed.

The remark that it doesn't explicitly state what follows is an expression was never a problem for _when or until directives, you simply have to know and the documentation has plenty examples (and even warns when improperly used). This would behave identically with !jinja. Anything more complicated will be hard to get people to use it.

@ingydotnet I do not agree with the statement that this is an improper use of tags. We are not using !jinja for doing transformations. The tag is used to indicate that what follows is not just a string, it's a Jinja expression. (Pretty much the same as saying: it's not an integer, it's someone's birth-year), there's no transformation performed, and matches the YAML spec:

The tag property identifies the type of the native data structure presented by the node. A tag is denoted by the “!” indicator.

In that sense, Ansible uses the Jinja expression, like another program would use birth-year to calculate back the person's age.

ingydotnet commented 6 years ago

@dagwieers @perlpunk I wasn't rejecting the proposal. I think an explicit tag of !jinja is probably a decent way to accomplish the goals.

I get what you are saying that the tag turns the string (conceptually) into a Jinja object, but jinja is a transformation construct here. I guess it depends on when the transformation is applied. If the Jinja object was preserved as such in memory, then sure, it could be roundtripped. Since Ansible probably doesn't want to reserialize the loaded object, the concern is moot.

Tags can also be (ab)used to do things like !include, !merge, !interpolate. I hope you would agree that (without bending the obvious intention) that these are transforms.

We are hoping to get support for transforms via "annotations" into YAML 1.3. So in that case, using tags for that purpose might be actually useful in the interim. This is what I was trying to get across.

re implicit tagging, unless there are plans to support alternate things like Jinja in the future (!mustache maybe), having some unique pattern imply the tag !jinja (which would also be allowed explicitly) might be nicer for the users.

perlpunk commented 6 years ago

@ingydotnet I got what you were saying. Claryfying: Since ansible just loads the whole data structure and applies templating and expressions later, there are two basic ways of implementing it.

If you load it as a JinjaExpression object, then it will be able to survive a roundtrip, and as such, looks like a "valid" usage of a tag. But you have more work to implement this in the code where you apply the expression.

If you transform it to the string "{{ foo }}" during constructing, the information will get lost. That's not what tags were invented for, I agree. The advantage is that implementation would be really easy.

Ansible is not used for serializing, and even if you have a use case where you want to load and dump an ansible playbook, it doesn't hurt that much to lose that information, since it will be turned into something which is functionally equal. (You would have to use the Ansible PyYAML loader, of course).

If the goal is a more general solution, it might be worth to use the first solution, though. Since Ansible already makes a distinction between strings that follow keys like var or when, there is already a kind of schema in use, just not at the constructor level.

So my point was that I wouldn't go as far as calling such usage "invalid" or "abusing tags".

dagwieers commented 6 years ago

Since Ansible probably doesn't want to reserialize the loaded object, the concern is moot.

@ingydotnet That was exactly my point ! But I appreciate your clarification.

bcoca commented 6 years ago

i would make this function more like vault and 'tempalte on demand' when looking at a value of the object created by YAML.

ingydotnet commented 6 years ago

@bcoca could you explain a bit more, or link to some examples? I'm not familiar with those things...

bcoca commented 6 years ago

examples would look the same, basically anything !vault gets put into a 'ansible vault object' (AnsibleVaultEncryptedUnicode) which is then decrypted 'on access' but not on YAML parsing.

webknjaz commented 6 years ago

@dagwieers I'd insist on !j2_expr or !jinja_expr then. To me, jinja is a template, not an expression inside of it's control flow syntax.

The point about documentation doesn't sound strong to me: yes, docs must exist, but I'm talking about the public interface being intuitive when read even w/o docs.

Abstract example/thoughts: Imagine you can make your app rely on some weird useless env var FOO=bar and document that it won't run w/o it, but you still give user a proper error message, so that it's obvious what's wrong w/o going to docs. Also, you don't normally make your app to rely on unobvious things: you think about design first. Public API (naming of this tag in our case) is important here: you want it to be clearly understood by users regardless of whether they read docs.

Back to naming: everyone who've heard about jinja most likely knows that it's about templating. I'd rather not break this impression in their brains by misusing this name. That's why I want to be clear by communicating it to them via expr somewhere in tag name: dude, don't put a template here, it's explicitly an expression (while still processed by jinja).

dagwieers commented 6 years ago

@webknjaz I don't see your point. Same is true for _when and until directives. You need to know it requires a Jinja expression to use it. Examples make this clear, and if you do it wrong, Ansible will tell you. Naming is always a balance between descriptiveness and convenience.

Naming this !jinja_expr is a good way to kill this from a usability perspective. Just like https://github.com/ansible/proposals/issues/74 was effectively killed by bikeshed. (this was too much like a programming language, result was already used too much, everything has to be prefixed with ansible_ so you end up with the proposal to use ansible_task_result)

webknjaz commented 6 years ago

Oh.. You speak about user writing that expression, but I take into account also one reading it. From the writer's perspective it's less useful than from point of view of the person who tries to understand what's going on in the playbook which already exists.

Also, YAML tags are also meant to represent some concepts from the tool it configures. They say it's a human-readable format, which is important here. Otherwise we'd end up writing configs in some kind of weird JSON, right?

I cannot agree that it's acceptable to rename expressions into templates, it's not: you confuse the reader this way, because it's unnatural to call things with words, which mean completely different.

I can see why you dislike !jinja_expr being long, but I don't insist on such prefix. I just want it to reflect it's purpose. That's all I want.

bcoca commented 5 years ago

https://github.com/ansible/ansible/pull/49801

bcoca commented 2 years ago

This was implemented in PR above, but in the end rejected by core team, as such, closing this discussion.

ansible / proposals

Introduce !jinja data type in YAML #101

Proposal: jinja-data-type

Motivation

Problems

Solution proposal

Dependencies (optional)

Testing (optional)

Documentation (optional)

Anything else?

Concerns: