Background
We currently handle exporting data to yaml, csv or console using a combination of commands and manifest config. We support:
- export to yaml (by passing `--output <savepath>` on the CLI and adding `outputs: - yaml` to the manifest),
- export to CSV (by passing `--output <savepath>` with a trailing `#` and the parameter to export on the CLI, plus `outputs: - csv` in the manifest),
- export to the console (by passing `--stdout` and no config in the manifest).
Other export formats have to be handled using plugins.
Getting the combinations of config and command correct can be quite error-prone. It's not a great user experience, and it also adds some friction to our automated testing process (as there are in-manifest differences to handle).
During the IF refactor and the hackathon we experimented with other exhaust approaches, such as moving everything to dedicated exhaust plugins. However, this creates additional complexity because it requires certain plugins to be configured and executed at specific moments during IF execution, which feels very fragile. With plugins, we can't easily configure the exhaust on the command line. Exhaust plugins also muddy the waters when it comes to the manifest file being used as an executable audit of environmental impacts. The exhaust is irrelevant to the actual numbers; it is presentation only and just adds noise to the manifest file. Finally, with exhaust handled by plugins, every time you want to change how your data is presented you have to re-execute and recalculate your whole manifest file, which is inefficient.
Problem
Our current approach, and the idea to use exhaust plugins, have several drawbacks:
First, it takes away from IF’s core feature, which is the manifest file. This is more than a configuration file; it is an executable record of emissions, something auditable. By treating the manifest as a configuration file rather than the central pillar of IF, we subtly shifted the focus of IF towards being an ETL tool, which confused its mission.
Secondly, it triggered the requirement to configure the exhaust “somewhere.” Since it’s a plugin rather than something baked in, we couldn’t provide specific CLI flags, because we can't know what CLI flags future exhaust plugins will require. We ended up somewhere half-baked, with some config in the manifest file and some on the CLI for exporting CSV data with the CSV exhaust plugin.
Thirdly, we embedded the presentation into the business logic (the CSV generator was mostly there to provide a better UI to the raw manifest data), further muddying the waters of the manifest file being an executable audit of emissions and paving the way for it to become little more than a configuration file for a complex ETL pipeline that also included some UI components.
Finally, if you wanted to exhaust a manifest file that you had already computed somewhere else, you had to run your whole pipeline from scratch, including all API calls and all computation, just to run the exhaust code.
Additional comments:
- We should also make sure we solve the issue of having to provide extensionless filenames to `--output` to avoid ending up with `file.yml.yml`. This was an annoying outcome of the CSV export config.
- `--stdout` should be the default behaviour when no output option is provided.
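One way to avoid `file.yml.yml` is to accept a savepath with or without an extension and only append one when it is missing. The sketch below is a hypothetical POSIX-shell helper, not IF's implementation; the function name `normalize_savepath` and the choice of `.yml` as the default extension are assumptions.

```sh
# Hypothetical helper (not part of IF): accept "out", "out.yml" or "out.yaml"
# and make sure the extension is never doubled (no more "out.yml.yml").
normalize_savepath() {
  case "$1" in
    *.yaml|*.yml) printf '%s\n' "$1" ;;    # user already gave an extension: keep it
    *)            printf '%s.yml\n' "$1" ;; # no extension: append the assumed default
  esac
}
```

With this, `normalize_savepath out` and `normalize_savepath out.yml` both resolve to `out.yml`.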
Solution
The proposed solution to these issues is to handle exhaust using scripts instead of plugins.
This cements IF as purely an engine for computing and communicating environmental impacts, and moves presentation-related tasks out of IF and into dedicated scripts. IF can only generate output files; any subsequent operations over that output data are handled externally to IF by dedicated scripts. This way, the manifest file is set up to become a protocol, and the outputs can be piped to any arbitrary downstream post-processing program.
We can first assert that IF outputs yaml data, either to a file or to the console. No additional configuration is required for this: either IF receives `--output` to trigger saving to a yaml file, or it receives no output config, in which case it dumps the yaml to the console.
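The routing rule above, extended with the combined case from the task list (both `--stdout` and `--output` means save to file and print to console), fits in a small decision function. This is a hypothetical POSIX-shell sketch, not IF's actual code; `route_output` and its argument convention are assumptions.

```sh
# Hypothetical sketch (not IF's implementation): decide where the computed
# yaml goes.
#   $1    savepath given via --output (empty string if absent)
#   $2    "true" if --stdout was given
#   stdin the yaml document to emit
route_output() {
  local savepath="$1" stdout_flag="$2" yaml
  yaml="$(cat)"  # buffer the yaml so it can be written to more than one place
  if [ -n "$savepath" ]; then
    printf '%s\n' "$yaml" > "$savepath"   # a savepath alone is enough to save yaml
    if [ "$stdout_flag" = "true" ]; then
      printf '%s\n' "$yaml"               # both options: save and print
    fi
  else
    printf '%s\n' "$yaml"                 # no output config: print to the console
  fi
}
```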
For exporting to other formats, let's take the example of `csv-export`.
Instead of being a plugin that operates during IF execution, `csv-export` can become a separate script that operates on the IF output yaml data. We can refer to this script as `if-csv`.
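Because such a script only consumes IF's output yaml, it can take that data either from a file argument or from stdin, which is what lets the same script run over an existing output file or sit at the end of a pipe. A minimal POSIX-shell sketch of that input contract; `read_if_output` is a hypothetical helper, not part of the planned `if-csv`, and the actual format conversion is left out.

```sh
# Hypothetical boilerplate for an exhaust script (not part of IF): read IF's
# output yaml from the file named in $1, or from stdin when no file is given
# (e.g. when IF's output is piped in).
read_if_output() {
  if [ -n "${1:-}" ]; then
    if [ ! -r "$1" ]; then
      printf 'error: cannot read %s\n' "$1" >&2
      return 1
    fi
    cat -- "$1"
  else
    cat
  fi
}
```

A real exhaust script would feed this into its own conversion step, so file-based and piped invocations share one code path.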
`if-csv` can be invoked on the command line, taking an IF output file as an argument. Instead of having to create a fiddly command using a hashtag to define the parameter to export as we do today, the script can just expose its own commands to do the same thing more intuitively, for example:
```sh
if-csv -in-file <example.yml> --fields carbon --output output.csv
```
The above example will export the `carbon` parameter from the output data in `example.yml` and save it to `output.csv`.
This can be used to run the exhaust script over an `ie` output without having to re-execute `ie`.
However, in situations where we do want to re-execute `ie`, we can simply pipe `ie` into the `if-csv` script in a single command.
We intend for `if-csv` to be bundled into the IF download, but you can develop your own exhaust scripts with any arbitrary logic and use them in the same way. We'll provide some boilerplate code and a tutorial to help.
If your exhaust script requires complex configuration, it can either surface its own CLI params OR, if it wants to, have its own configuration file, for example:
```sh
if-run --manifest file.yaml | some-export-script --config-file config.json -some-flag true
```
This keeps IF's CLI params and manifest file clean and focused on computing manifest files.
Related Discussion
https://github.com/Green-Software-Foundation/if/discussions/766
Tasks:
- [x] #782
- Providing a savepath should be all that’s required to trigger saving to yaml, i.e. `ie -m <manifest> -o <savepath>`. If both `--stdout` and `--output` are provided, then save to file and print to console.