StackStorm / st2

StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, incident responses, troubleshooting, deployments, and more for DevOps and SREs. Includes rules engine, workflow, 160 integration packs with 6000+ actions (see https://exchange.stackstorm.org) and ChatOps. Installer at https://docs.stackstorm.com/install/index.html
https://stackstorm.com/
Apache License 2.0
6.07k stars 749 forks source link

Ability to tell Orquesta to treat a variable as raw string and not do any (YAQL and Jinja) processing on it #4636

Open Kami opened 5 years ago

Kami commented 5 years ago

I was debugging an Orquesta flow and I initially thought it was a bug in Orquesta (due to the error message), but it turns out that the bug is related to dynamically retrieved data containing a YAQL expression.

It's fairly common scenario that data inside workflows is retrieved dynamically and this data could potentially contain Jinja or YAQL escape sequence (think Jinja / YAQL injection).

Here is an example workflow where this issue can arise - https://github.com/StackStorm/stackstorm-st2community/blob/master/actions/workflows/retrieve_data_and_send_daily_stats_to_slack.yaml

github_data and / or forum_posts can contain YAQL and Jinja expression and assemble_message task would fail publishing message variable since it would try to render it as YAQL / Jinja and this would fail due to variable not being defined - https://github.com/StackStorm/stackstorm-st2community/blob/master/actions/workflows/retrieve_data_and_send_daily_stats_to_slack.yaml#L46.

Yes, I could process / escape this data inside st2community.assemble_message (and that's what I will do until a better solution is in place), but I think we should provide an Orquesta level feature for handling that.

Pretty much all the programming languages and templating libraries provide some kind of way to treat a value as raw string / escape it and not do any processing on it and I think that's what we need in Orquesta as well (ability to mark a variable as raw so it's not processed).

I also think that's important from security perspective because there are a lot of scenarios where data is retrieved dynamically or passed in by a user.

I'm also open to suggestions (aka if this can already be solved in some other way).

m4dcoder commented 5 years ago

By default, orquesta recursively evaluate the expression. So if the expression is evaluated and returns another expression, orquesta will recursively evaluate the value until there is no more expression.

Currently, you can convert those expression to Jinja and wrap them in raw block such as {% raw %}{{ ctx().foo }}{% endraw %}. Orquesta will recognize the raw block and will stop evaluating the expression. We don't have a counterpart for YAQL expressions. We can probably come up with one similar to Jinja. Any suggestion?

Kami commented 5 years ago

Yeah, I was wondering if using "raw" Jinja tag would work for both scenarios.

So to confirm - if you use raw Jinja tag, it will only stop Jinja processing, but it will still process any YAQL expressions in that variable?

m4dcoder commented 5 years ago

I just did a quick test. The jinja raw block will preserve the YAQL expression as well.

    def test_raw_block(self):
        expr = '{% raw %}<% ctx().foo %>{% endraw %}'

        data = {'foo': 'bar'}

        self.assertEqual('<% ctx().foo %>', self.evaluator.evaluate(expr, data))
m4dcoder commented 5 years ago

We can share the solution with YAQL. And if you think we should have something specific for YAQL, I'm open as well. But it seems to be more consistent to make the raw block a solution for both types of expression.

m4dcoder commented 5 years ago

FYI, the raw blocks related unit tests for Jinja is located at https://github.com/StackStorm/orquesta/blob/master/orquesta/tests/unit/expressions/test_jinja_eval_raw_blocks.py.

Kami commented 5 years ago

Thanks - I will look at and test raw behavior tomorrow.

m4dcoder commented 5 years ago

@Kami How are you doing with this? Is the raw block sufficient to cover both types of expression? Can add more tests for YAQL and then close this issue?

Kami commented 5 years ago

@m4dcoder I tried to verify the behavior by adding a unit test (https://github.com/Kami/orquesta/commit/d134ba5c96b5910d7136e9c83eb79947dff737de) and it seems to be working fine for mixed expression scenarios, at least on the unit test level.

I will still do some more actual real world end to end testing to verify it indeed works correctly.

Kami commented 5 years ago

@m4dcoder After some more digging in - I didn't really express my problem well enough above and raw template tag is not the most ideal solution for it.

Let's imagine a workflow like this:

version: 1.0
description: test

vars
  - var1: "blah {{ jinja1 }} {{ jinja2 }} <% ctx().yaql2 %> blah"

tasks:
  task1:
    action: core.local
    input:
     cmd: "{{ var1 }}"

In this scenario I would want {{ var1 }} to be rendered, but not recursively. I would want the value to be the following: blah {{ jinja1 }} {{ jinja2 }} <% ctx().yaql2 %> blah.

Basically, the same if var1 definition looked like this: {% raw %} blah {{ jinja1 }} {{ jinja2 }} <% ctx().yaql2 %> blah{% raw %).

I used a static variable definition for example, but in real-life this would be dynamically retrieved data from another action or similar.

Right now, the only way (as far as I know) to handle that is to have a "processing" / "format" action which either strips Jinja and YAQL tags or wraps the text in {% raw %} and {% endraw %} blocks.

The problem also is that this stripping needs to happen at the very first action which retrieves this dynamic data which can contain Jinja / YAQL expressions - you can't do it later as part of another action (workflow task), because you can't pass this data around the workflow because it will try to render it and that will fail.

As mentioned above, this is far from ideal (it requires changes in the code) and needs to be done recursively for complex types such as objects.

Imagine another real-world scenario - a workflow which retrieves thread posts from our community forum using core.http (and those posts can contain Jinja or YAQL snippets).

Right now you can't use retrieved data anywhere in the workflow (you can't pass it around to another action or similar) because Orquesta would try to render it and this would fail. Only way to work around that is to have a Python action which uses requests or similar to retrieve this data and also "sanitizes" it (either removes those tags or wraps data in raw blocks). core.http is just an example - same data could also come in from another place - e.g. user input via parameter, ChatOps, etc.

My original point above security still stands (think Jinja / YAQL injection).

m4dcoder commented 5 years ago

I'm open to proposal.

bertyah commented 1 year ago

This is a huge issue for me and stackstorm in my ORG. I'm disappointed to see that this is open since 2019.

I suffer from the exact situation that @Kami describes. I sometimes get code snippets from actions that return data from APIs containing {{ }} and everything breaks.

I would be willing to work on a solution here with someone to support the project.