blackstork-io / fabric

An open-source command-line tool for reporting workflow automation and a configuration language for reusable templates. Reporting-as-Code
https://blackstork.io/fabric/
Apache License 2.0
12 stars 0 forks source link

Switch to hcl templating and reduce relience on jq (and maybe even context map) #73

Closed Andrew-Morozko closed 1 week ago

Andrew-Morozko commented 3 months ago

As I was investigating hcl deeper for #59 and #69 I had an idea. #17 and #29 felt too clunky to implement, so perhaps this is a better way

TLDR:

Issues #20 and #59 mean we're moving towards using more of the hcl/cty ecosystem. I propose to replace (at least partially) gojq and replace text/template fully.

content text{
    query = ".document.meta.name"
    // instead of replacing the data right here, we're sending it to the plugin with the whole copy of the context
    text = "{{.query_result}}, {{.data.block.value}}"
}

using native hcl templating becomes

content text{
    // doing everything even before the `text` attribute is sent to the plugin
    text = "${get(document.meta.name)}, ${get(data.block.value)}"
}

As for advanced JQ functions, we can offer many pre-made functions from cty stdlib or write custom once.

If we decide that it's not enough, we can add gojq back in, for example like this

content text{
    text = "${jq("$args[0].meta.name | length", document)}, ${data.block.value}"
}

Proposed syntax: jq("<jq query>", <hcl path to contents of $args[0]>, <$args[1]>, ...) or we can add the args to the root array: jq("<jq query>", <hcl path to contents of .[0]>, <.[1]>, ...)

This allows us to observe the dependencies between blocks. At the moment this is just better UX and error reporting, but in principle it allows us to build a full dependency graph and execute both content and data blocks in parallel.

Details

Go-cty doesn't play well with variables. All cty.Values are constants, so if we're providing them in expression evaluation context ("Hello ${document.content[0].result}"), every change of the underlying data (update to rendered content block list) would result in recreating the whole hierarchy in cty, starting at document.

We can get away with providing data block values directly ("${data.block.value}" instead of "${get(data.block.value)}"), since all data blocks are parsed before content blocks are evaluated, and we can create the data cty.Value just once.

However with content values, there's another trick: if we provide a custom function (like get), then we can override default hcl behavior (lookup of document.content[0].result in hcl.EvalContext) via customdecode hcl extension. This allows us to manually query the document content in native go types, get the rendered result, and only then wrap it in cty.Value. This also solves an issue with local context (#17) forcing us to execute all content blocks sequentially, because any content block might access the document.content. Now, since the path is not in the opaque jq expression, but in the hcl one, we can notice that block X assesses only document.content[2], so it's ok to run it any time after document.content[2] is ready in parallel with other content blocks.

I propose that if we're going with this approach – let's enforce "${get(...)}" syntax for data block values, just to keep everything uniform.

Also: #29 is about adding predictability to the data shape of the content block for reuse in refs. But in the bigger picture, the context map[string]any that we encode each time and send to content block plugins is itself unpredictable, so the plugin itself can't rely on anything being in it, so the only use for the context is to be templated into user-supplied strings. If hcl templating replaces the text/template, then what's the use for sending the global map to each plugin? If the plugin wants some info from it – it can just define a parameter and request it. This can work even if a plugin actually wants the full data structure:

content someplugin {
    all_data_blocks = get(data)
    // but most plugins would request only some data
    some_data_block = get(data.block.name.value)
}

This way there's no implicit (and rather large) global map sent to each plugin, plugins must request and the user must approve transferring the information. Also, this helps with potential parallel execution once again: this content block relies on the whole data, so it must be executed strictly after all data blocks, but it won't be typical case: most plugins only need some data.

traut commented 3 months ago

Thank you again, Andrew, for making this proposal!

To recap the conversation we had during the sync call:

In order to allow us to use other representations and serialization of the Fabric config files in the future (JSON-serialized, stored in the DB, etc.), we must sandbox HCL-specific tooling to the parsing / de-serialization stage. This is where we can rely on HCL parser and support some of the HCL expressions (#59). Beyond parsing, having our own internal representation of a template tree would make interoperability easier (#17).

With that in mind, and being careful not to commit to one approach too early, it makes sense to separate the concerns: detaching the parsing from data querying (through data plugins), from data filtering (powered by JQ right now), from content rendering (relying fully on the content plugins to do templating).

This allows us to evolve parts of the tool without breaking the architecture:

etc

Having options would allow us to avoid big refactorings and move faster.

Andrew-Morozko commented 1 week ago

We haven't gone in this direction