blackstork-io / fabric

An open-source command-line tool for cybersecurity reporting automation and a configuration language for reusable templates. Reporting-as-Code
https://blackstork.io/fabric/
Apache License 2.0

Vars block and row_vars #190

Closed Andrew-Morozko closed 2 weeks ago

Andrew-Morozko commented 1 month ago

Resolves #166 #191 #169


Refactoring to support row_vars

Decided to do it before merging in vars, because some changes to the vars evaluation are needed

Now there is a definitions.DataCtxEvalNeeded interface. All values that need dataCtx to be evaluated (such as jq_query and rows_var) implement it. This deferred evaluation occurs when the cty.Value is converted into another format.

In order to prevent duplication of the tricky evaluation/encapsulation logic there now exists pkg/ctyencoder: a generic transformer from cty to user-chosen type. For us, it's plugin.Data and pluginapiv1.CtyValue (the grpc type).
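The deferred evaluation described above can be pictured with a minimal Go sketch. It is not the Fabric code: it works on plain Go values instead of cty.Value, and the names `deferred`, `evalWithCtx`, `query`, and `convert` are hypothetical, since the real method set of definitions.DataCtxEvalNeeded is not shown in this thread.

```go
package main

import "fmt"

// deferred is a hypothetical analogue of definitions.DataCtxEvalNeeded:
// a value that can only be resolved once the data context is available.
type deferred interface {
	evalWithCtx(ctx map[string]any) (any, error)
}

// query is a toy deferred value that reads one key from ctx["vars"].
type query struct{ key string }

func (q query) evalWithCtx(ctx map[string]any) (any, error) {
	vars, _ := ctx["vars"].(map[string]any)
	v, ok := vars[q.key]
	if !ok {
		return nil, fmt.Errorf("variable %q is not set", q.key)
	}
	return v, nil
}

// convert walks a value and resolves deferred elements at conversion time,
// which is the moment the thread says the evaluation happens.
func convert(v any, ctx map[string]any) (any, error) {
	switch t := v.(type) {
	case deferred:
		return t.evalWithCtx(ctx)
	case []any:
		out := make([]any, len(t))
		for i, el := range t {
			c, err := convert(el, ctx)
			if err != nil {
				return nil, err
			}
			out[i] = c
		}
		return out, nil
	default:
		return v, nil
	}
}

func main() {
	ctx := map[string]any{"vars": map[string]any{"a": "original"}}
	out, err := convert([]any{"static", query{"a"}}, ctx)
	fmt.Println(out, err) // [static original] <nil>
}
```

Resolving deferred values inside the one conversion routine is what lets a single transformer serve several output formats without repeating the evaluation logic.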

pkg/ctyencoder provides better error reporting: errors in nested cty objects now come with paths (sequence of indexing operations). To implement this pkg/diagnostics has been refactored to allow easily adding Extra values to diagnostics. diagnostics.PrintDiags uses this extra information to improve diagnostics before they are printed. Circular ref detection diagnostics no longer work by writing the traceback as the function returns, now they modify the corresponding Extra, and the printer is the one responsible for outputting the traceback.
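As an illustration of errors that carry a path of indexing operations, here is a small Go sketch. It is an assumption-laden stand-in, not pkg/ctyencoder: it walks `map[string]any` and `[]any` rather than cty objects, and `pathError` and `encode` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// pathError is a hypothetical error type that records the sequence of
// indexing operations leading to the failing element.
type pathError struct {
	path []string
	msg  string
}

func (e *pathError) Error() string {
	return strings.Join(e.path, "") + ": " + e.msg
}

// encode accepts only nil, strings, float64s, bools, lists, and maps.
// Any other leaf produces an error annotated with its path.
func encode(v any, path []string) error {
	switch t := v.(type) {
	case nil, string, float64, bool:
		return nil
	case []any:
		for i, el := range t {
			if err := encode(el, append(path, fmt.Sprintf("[%d]", i))); err != nil {
				return err
			}
		}
		return nil
	case map[string]any:
		keys := make([]string, 0, len(t))
		for k := range t {
			keys = append(keys, k)
		}
		sort.Strings(keys) // deterministic traversal order
		for _, k := range keys {
			if err := encode(t[k], append(path, "."+k)); err != nil {
				return err
			}
		}
		return nil
	default:
		// Copy the path so later appends cannot alter the stored error.
		p := append([]string(nil), path...)
		return &pathError{path: p, msg: fmt.Sprintf("unsupported type %T", v)}
	}
}

func main() {
	v := map[string]any{"a": []any{"ok", map[string]any{"b": 42}}}
	fmt.Println(encode(v, nil)) // .a[1].b: unsupported type int
}
```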

Support for passing around plugin.Data inside of the cty type system and even outside of it (via grpc or with plugin.Data encoding).

Simplified cty and data protobuf types: there are now fewer nested one-field structs, fewer messages overall, and less code.

TODO:

Andrew-Morozko commented 1 month ago

One question regarding refs:

document "hello" {
  content text "base" {
    vars {
      a = "original"
      b = query_jq(".vars.a")
    }
    value = "{{toPrettyJson .vars}}"
  }
  content ref "ref" {
    base = document.hello.content.text.base
    vars {
      a = "redefined"
    }
  }
}

I assume that here the base block outputs "a": "original", "b": "original" and the ref block outputs "a": "redefined", "b": "original" (and not "a": "redefined", "b": "redefined"). Otherwise reasoning about values of vars becomes too difficult and non-local.

In other words, shadowing happens after evaluation, not before. Am I correct? @traut

Another complicated situation:

document "hello" {
  vars {
    a = "original"
  }
  content text "base" {
    vars {
      b = query_jq(".vars.a")
      a = "redefined"
    }
    value = "{{toPrettyJson .vars}}"
  }
}

This should output "a": "redefined", "b": "original", I guess? So, the redefinitions happen in order of variable declaration.

If so then

document "hello" {
  content text "base" {
    vars {
      a = {
        b = "val"
        c = query_jq(".vars.a.b")
      }
    }
    value = "{{toPrettyJson .vars}}"
  }
}

must output {"a": {"b": "val", "c": null}}, since the a is not set when query_jq is evaluated

traut commented 4 weeks ago

@Andrew-Morozko made the issue for env vars -- https://github.com/blackstork-io/fabric/issues/191

traut commented 4 weeks ago

I assume that here the base block outputs "a": "original", "b": "original" and the ref block outputs "a": "redefined", "b": "original" (and not "a": "redefined", "b": "redefined"). Otherwise reasoning about values of vars becomes too difficult and non-local. In other words, shadowing happens after evaluation, not before.

wouldn't it be easier to shadow before evaluation, though? We collapse the var definitions first, and evaluate after. In your example, I would kind of expect "a": "redefined", "b": "redefined". It would also follow the theme that the referenced block is not evaluated but the referencing one is.

I might be off here, but I can't think of when this breaks the expected output. What do you think?

traut commented 4 weeks ago

damn, those edge cases 😅

This should output "a": "redefined", "b": "original", I guess? So, the redefinitions happen in order of variable declaration.

I think you're right.

must output {"a": {"b": "val", "c": null}}, since the a is not set when query_jq is evaluated

that's totally reasonable!
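The two in-block cases agreed on here (redefinition in declaration order, and a nested query that sees null) can be modeled with a short Go sketch. It is only a model under stated assumptions: `variable`, `lookup`, `constant`, and `evalInOrder` are hypothetical names, and `lookup` stands in for query_jq on simple `.vars.x.y` paths.

```go
package main

import "fmt"

// variable is a hypothetical stand-in for one entry of a vars block.
type variable struct {
	name string
	eval func(vars map[string]any) any
}

// lookup imitates query_jq(".vars.k1.k2..."): it returns nil when any
// step of the path is missing.
func lookup(vars map[string]any, keys ...string) any {
	var cur any = vars
	for _, k := range keys {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil
		}
		cur = m[k]
	}
	return cur
}

func constant(v any) func(map[string]any) any {
	return func(map[string]any) any { return v }
}

// evalInOrder starts from the parent's vars and sets each declared var in
// declaration order, so every var sees only what was set before it.
func evalInOrder(parent map[string]any, decls []variable) map[string]any {
	vars := make(map[string]any, len(parent)+len(decls))
	for k, v := range parent {
		vars[k] = v
	}
	for _, d := range decls {
		vars[d.name] = d.eval(vars)
	}
	return vars
}

// shadowingExample: b is read before a is redefined in the same block.
func shadowingExample() map[string]any {
	return evalInOrder(map[string]any{"a": "original"}, []variable{
		{"b", func(vars map[string]any) any { return lookup(vars, "a") }},
		{"a", constant("redefined")},
	})
}

// nestedExample: c is computed while a itself is still unset.
func nestedExample() map[string]any {
	return evalInOrder(nil, []variable{
		{"a", func(vars map[string]any) any {
			return map[string]any{"b": "val", "c": lookup(vars, "a", "b")}
		}},
	})
}

func main() {
	fmt.Println(shadowingExample()) // map[a:redefined b:original]
	fmt.Println(nestedExample())    // map[a:map[b:val c:<nil>]]
}
```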

Andrew-Morozko commented 4 weeks ago

wouldn't it be easier to shadow before evaluation, though?

It is kinda easier, but in the end it's just a question of what is more expected by the user (and easier to describe in the docs 😉). I also think that "a": "redefined", "b": "redefined" better matches the established ref block mechanics, but I also imagined debugging this example:

document "hello" {
  content text "base" {
    vars {
      a = ["original"]
      b = query_jq(".vars.a[0]")
    }
    value = "{{toPrettyJson .vars}}"
  }
  content ref "ref" {
    base = document.hello.content.text.base
    vars {
      a = {
        b = "redefined"
      }
    }
  }
}

Which would produce this error:

Error: Failed to run the query

  on zzz.fabric line 5, in document "hello":
   5:       b = query_jq(".vars.a[0]")

expected an array but got: object ({"b":"redefined"})

And the user would look at lines 4 and 5 and not understand where that object came from.

But perhaps the solution is including a note "this error was triggered through ref at ...", not special-casing var handling for refs.

traut commented 4 weeks ago

@Andrew-Morozko let's go with "a": "redefined", "b": "redefined", so collapse/shadow the vars and evaluate after.

You are right, making it clear is very important. I think it's straightforward to explain this way.

But perhaps the solution is including a note "this error was triggered through ref at ...", not special-casing var handling for refs.

Yeah, if the error is in the ref block, we can use different wording or provide more context!

Andrew-Morozko commented 4 weeks ago

One more tricky situation:

Right now ref vars are executed after the base vars, except in cases where var in ref shadows the one in base. In that case, the overriding var is evaluated at the point where the base var would have been evaluated had it not been overridden.

This leads to yet another instance of quirky behavior:

document "hello" {
  content text "base" {
    vars {
      a = "original"
      b = "unique to base"
    }
    value = "{{toPrettyJson .vars}}"
  }
  content ref "ref-if-after (this is the current behavior)" {
    base = document.hello.content.text.base
    vars {
      c = "unique to ref"
      q_b = query_jq(".vars.b") // works as expected
      a = query_jq(".vars.c") // doesn't have access to c, because "a" overrides the variable in the "base", and therefore is executed before c is set
    }
  }

  // If we evaluate vars of the ref before vars of the base another issue occurs:
  content ref "ref-if-before" {
    base = document.hello.content.text.base
    vars {
      c = "unique to ref"
      q_b = query_jq(".vars.b") // doesn't have access to b, vars in ref are evaluated before vars in base
      a = query_jq(".vars.c") // works as expected
    }
  }
}

If we're staying with the "a": "redefined", "b": "redefined" (base vars are overridden before evaluation), then this would seem to be an intractable problem. Current behavior seems to be the optimal solution in that case; it just needs to be documented.

traut commented 4 weeks ago

Right now ref vars are executed after the base vars, except in cases where var in ref shadows the one in base.

Why split it into two execution steps? To avoid re-executing vars in the base block if there are multiple ref blocks?

content ref "ref-if-after (this is the current behavior)" {
  base = document.hello.content.text.base
  vars {
    c = "unique to ref"
    q_b = query_jq(".vars.b") // works as expected
    a = query_jq(".vars.c") // doesn't have access to c, because "a" overrides the variable in the "base", and therefore is executed before c is set
  }
}

That is caused by the split into two steps. Merging the vars and executing after would resolve this issue: a would be redefined and would return "unique to ref", if I understand correctly.

// If we evaluate vars of the ref before vars of the base another issue occurs:

Yeah, same problem. It feels like the vars need to be merged before any evaluation happens. Otherwise, we have this implicit order of evaluation between blocks that is a bit confusing.

Andrew-Morozko commented 4 weeks ago

Why split it into two execution steps?

No, it's all a single execution step. I'm talking about the variable evaluation order (the order in which the variable values are set in the context map). Gojq works on plugin.Data, but variables are returned as cty.Value after evaluating the corresponding fcl expression. So we need to go over the list of cty.Values in some order, turn them into plugin.Data and set the corresponding key on (data context).vars.

In order to get "a": "redefined", "b": "redefined" in this example, variables should be evaluated in this order (shown in comments)

document "hello" {
  content text "base" {
    vars {
      a = "original" // not evaluated for ref
      b = query_jq(".vars.a") // 2
    }
    value = "{{toPrettyJson .vars}}"
  }
  content ref "ref" {
    base = document.hello.content.text.base
    vars {
      a = "redefined" // 1
    }
  }
}

Applying this evaluation order to my latest example:

document "hello" {
  content text "base" {
    vars {
      a = "original" // never evaluated for ref
      b = "unique to base" // 2
    }
    value = "{{toPrettyJson .vars}}"
  }
  content ref "ref-if-after (this is the current behavior)" {
    base = document.hello.content.text.base
    vars {
      c = "unique to ref" // 3
      q_b = query_jq(".vars.b") // 4
      a = query_jq(".vars.c") // 1, because it replaces the "a" from base
    }
  }
}

We can, of course, not change the evaluation order:

document "hello" {
  content text "base" {
    vars {
      a = "original" // never evaluated for ref
      b = "unique to base" // 1
    }
    value = "{{toPrettyJson .vars}}"
  }
  content ref "ref-if-after (this is the current behavior)" {
    base = document.hello.content.text.base
    vars {
      c = "unique to ref" // 2
      q_b = query_jq(".vars.b") // 3
      a = query_jq(".vars.c") // 4
    }
  }
}

but then this breaks the first example:

document "hello" {
  content text "base" {
    vars {
      a = "original" // not evaluated for ref
      b = query_jq(".vars.a") // 1, and would fail, a is not defined yet
    }
    value = "{{toPrettyJson .vars}}"
  }
  content ref "ref" {
    base = document.hello.content.text.base
    vars {
      a = "redefined" // 2
    }
  }
}

Same analysis for the second case (ref-if-before):

document "hello" {
  content text "base" {
    vars {
      a = "original" // never evaluated for ref
      b = "unique to base" // 4
    }
    value = "{{toPrettyJson .vars}}"
  }

  content ref "ref-if-before" {
    base = document.hello.content.text.base
    vars {
      c = "unique to ref" // 1
      q_b = query_jq(".vars.b") // 2, and fails, b is not defined yet
      a = query_jq(".vars.c") // 3
    }
  }
}

traut commented 4 weeks ago

aha, understood, thank you for breaking it down!

I think the order in the first snippet makes the most sense, at least for now. Is that the one you are leaning toward, too?

Andrew-Morozko commented 4 weeks ago

Is that the one you are leaning toward, too?

Yep, that's the most logical one

traut commented 3 weeks ago

@Andrew-Morozko if rows_var is implemented by Fabric, outside the table content provider, it makes sense to rename it to a more generic collection_var or similar, so that we can reuse it for other providers

Andrew-Morozko commented 2 weeks ago

Finally implemented row vars. Here's the implementation, including changes discussed on Slack.

document "test" {
    vars {
        a = 1
        b = query_jq(".vars.a + 1")
        xxx = "xxx"
    }
    content table {
        rows_var = query_jq("[10, 20, 10*(.vars.b + 1)]")

        columns = [
            { header = "{{ .block.col_index }} Value", value = "{{ .block.row }}"},
            { header = "{{ .block.col_index }} Index", value = "{{ .block.row_index }}"},
            { header = "{{ .block.col_index }} ValueFromContext {{ .vars.xxx }}", value = "{{ .vars.xxx }}"},
            { header = "{{ .block.col_index }} StaticValue", value = "foo"},
        ]
    }
}

Result:

|1 Value|2 Index|3 ValueFromContext xxx|4 StaticValue|
|---|---|---|---|
|10|1|xxx|foo|
|20|2|xxx|foo|
|30|3|xxx|foo|
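To make the rendering above concrete, here is a rough Go sketch that reproduces the same table with the standard text/template package. It is an approximation under assumptions, not the table content provider: query_jq is replaced by plain Go arithmetic, `column`, `render`, `renderTable`, and `renderExample` are hypothetical names, and it assumes header templates see `.block.col_index` while cell templates see `.block.row` and `.block.row_index` (both 1-based, as in the output above).

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// column holds the header and value templates of one table column.
type column struct{ header, value string }

// render executes a single template string against ctx.
func render(tmpl string, ctx map[string]any) (string, error) {
	t, err := template.New("cell").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, ctx); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// renderTable emits a Markdown table with one line per element of rows.
func renderTable(rows []any, cols []column, vars map[string]any) (string, error) {
	var sb strings.Builder
	sb.WriteString("|")
	for j, c := range cols {
		block := map[string]any{"col_index": j + 1}
		h, err := render(c.header, map[string]any{"vars": vars, "block": block})
		if err != nil {
			return "", err
		}
		sb.WriteString(h + "|")
	}
	sb.WriteString("\n|" + strings.Repeat("---|", len(cols)) + "\n")
	for i, row := range rows {
		sb.WriteString("|")
		block := map[string]any{"row": row, "row_index": i + 1}
		for _, c := range cols {
			cell, err := render(c.value, map[string]any{"vars": vars, "block": block})
			if err != nil {
				return "", err
			}
			sb.WriteString(cell + "|")
		}
		sb.WriteString("\n")
	}
	return sb.String(), nil
}

// renderExample mirrors the document "test" configuration above.
func renderExample() (string, error) {
	a := 1
	b := a + 1                          // query_jq(".vars.a + 1")
	rows := []any{10, 20, 10 * (b + 1)} // query_jq("[10, 20, 10*(.vars.b + 1)]")
	vars := map[string]any{"a": a, "b": b, "xxx": "xxx"}
	cols := []column{
		{"{{ .block.col_index }} Value", "{{ .block.row }}"},
		{"{{ .block.col_index }} Index", "{{ .block.row_index }}"},
		{"{{ .block.col_index }} ValueFromContext {{ .vars.xxx }}", "{{ .vars.xxx }}"},
		{"{{ .block.col_index }} StaticValue", "foo"},
	}
	return renderTable(rows, cols, vars)
}

func main() {
	out, err := renderExample()
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```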