Closed jawache closed 7 months ago
Bumping to sprint 7 - this should be an exhaust plugin
already WIP, in draft PR: https://github.com/Green-Software-Foundation/if/pull/441
@narekhovhannisyan @jmcook1186 I think this one could be closed, as https://github.com/Green-Software-Foundation/if/pull/441 is merged. Right?
Thanks for working on this @pazbardanl, unfortunately I don't think we're quite done yet ;) I took a look and can see two issues. The first is with the plugin itself: how we've architected the exhaust functionality is at odds with the issue as it's specced out above.
It looks like we've taken the work in the pipeline csv plugin and mirrored it here in the exhaust functionality to work on the whole tree; this results in an export like the one below:
full-manifest.out.yml.csv
NOTE: The export is a little broken because the physical-processor field also has `,` in it ;) but don't worry about that, because the bigger issue is that the issue requires the content to be output very differently.
This is the expected output, one row per node, time series as columns and parameters as rows.
Path | Aggregated | 2024-01-25:11-00 | 2024-01-25:11-05 | 2024-01-25:11-10 |
---|---|---|---|---|
graph.carbon | 43 | 19 | 12 | 12 |
graph.children.application.carbon | 43 | 19 | 12 | 12 |
graph.children.application.children.vm1.carbon | 12 | 3 | 4 | 5 |
graph.children.application.children.vm2.carbon | 15 | 4 | 5 | 6 |
graph.children.application.children.vm3.carbon | 16 | 12 | 3 | 1 |
This is the actual output, it's effectively transposed, the parameters are the columns and the time buckets are the rows.
id | timestamp | duration | cloud/instance-type | cloud/vendor | cloud/region | cpu/utilization | grid/carbon-intensity |
---|---|---|---|---|---|---|---|
children.application.children.uk-west.children.server-1.outputs.0 | 2024-02-26 00:00:00 | 60 | Standard_A1_v2 | azure | uk-west | 89 | 250 |
children.application.children.uk-west.children.server-1.outputs.1 | 2024-02-26 00:01:00 | 60 | Standard_A1_v2 | azure | uk-west | 59 | 250 |
children.application.children.uk-west.children.server-1.outputs.2 | 2024-02-26 00:02:00 | 60 | Standard_A1_v2 | azure | uk-west | 45 | 250 |
children.application.children.uk-west.children.server-1.outputs.3 | 2024-02-26 00:03:00 | 60 | Standard_A1_v2 | azure | uk-west | 21 | 250 |
children.application.children.uk-west.children.server-1.outputs.4 | 2024-02-26 00:04:00 | 60 | Standard_A1_v2 | azure | uk-west | 89 | 250 |
children.application.children.uk-west.children.server-1.outputs.5 | 2024-02-26 00:05:00 | 60 | Standard_A1_v2 | azure | uk-west | 92 | 250 |
children.application.children.uk-west.children.server-1.outputs.6 | 2024-02-26 00:06:00 | 60 | Standard_A1_v2 | azure | uk-west | 91 | 250 |
There is also no place for the aggregated values, either horizontally or vertically.
See the tree in the issue above as an example; the pseudo-code for how this csv exporter should work is like so:

- For each node, take its path, append `.carbon` and pre-pend `tree`, that's the first cell, e.g. `tree.children.application.children.uk-west.children.server-1.carbon`.
- Start at `tree`; the root tree node can have outputs that are aggregated up to the root, see the example in the issue above.
- If the node has an `aggregated` field, read the `carbon` from that, and that's the second cell with the title of aggregated; if there is no aggregated field just leave blank.
- If the node has `outputs`, then grab the `carbon` from each of the observations in the outputs and those are your values for the time-buckets.
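The walk described above can be sketched in Python. This is a hedged illustration, not the IF implementation: the function name `rows_for_param` and the sample tree are made up; only the `children`/`aggregated`/`outputs` keys come from the manifest examples in this thread.

```python
def rows_for_param(tree, param):
    """Yield one CSV row per node: [path, aggregated, *time-bucket values]."""
    rows = []

    def walk(node, path):
        # Second cell: the value under the node's `aggregated` field,
        # or blank if there is no such field.
        aggregated = node.get("aggregated", {}).get(param, "")
        # Remaining cells: the parameter from each observation in `outputs`.
        values = [o.get(param, "") for o in node.get("outputs", [])]
        if aggregated != "" or values:
            # First cell: dotted path plus the parameter name,
            # e.g. tree.children.vm1.carbon
            rows.append([f"{path}.{param}", aggregated, *values])
        for name, child in node.get("children", {}).items():
            walk(child, f"{path}.children.{name}")

    walk(tree, "tree")
    return rows

# Illustrative two-node tree with two time buckets.
tree = {
    "aggregated": {"carbon": 43},
    "outputs": [{"timestamp": "2024-01-25T11:00", "carbon": 19},
                {"timestamp": "2024-01-25T11:05", "carbon": 12}],
    "children": {
        "vm1": {"aggregated": {"carbon": 12},
                "outputs": [{"timestamp": "2024-01-25T11:00", "carbon": 3},
                            {"timestamp": "2024-01-25T11:05", "carbon": 4}]},
    },
}

for row in rows_for_param(tree, "carbon"):
    print(row)
# ['tree.carbon', 43, 19, 12]
# ['tree.children.vm1.carbon', 12, 3, 4]
```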
I'm going to create another ticket and reference it here, but for CSV, we need to be able to pass it the fields we want to export as well as the filename (on the command line), so a few changes are needed there, cc @narekhovhannisyan
@jawache I'm still reading through this to make sure I capture everything. My only concern is that while the csv structure is well-organized and comprehensive (it captures both raw and aggregated outputs), it might not be useful for visualization as it is. For example: if I open this table in Excel and try to create a quick line chart from it, I might find it difficult, since there is no column that holds the timestamp and can act as the one for horizontal-axis values. Another example is Grafana: trying to use this csv file as a data source for Grafana will also be tricky, since there is no column for timestamp.
A simple fix would be to transpose the table, having the paths as column names (which makes sense, as they represent calculated values). The only issue with that is that the "aggregated" row would be perceived as a timestamp, which is not the case, and might be visualized as a weird, unexplainable spike at the start/end of the chart.
So, bottom line, this is what I propose, although it makes our lives harder: maybe we can have the CSV exporter support both modes:
- **Aggregation** (couldn't find a better name for it): generate a table with aggregated values, identical to the one you put in your comment.
- **Visualization**: generate a table that's transposed (and thus has a timestamp column) and does not have an 'aggregated' column. This one would also separate different children into different tables, so that each can have its own chart.

@pazbardanl interesting, I hadn't considered it from the Grafana angle.
For me this csv format is mostly for manual human consumption: it's important both to understand and see the aggregated numbers, and to have a tool that helps rationalize what the manifest is computing. For trivial cases that's doable in the YAML directly, but even for medium use cases with several components it's important to have a way to look at the numbers and ask whether they make sense or whether you screwed up the file somewhere, or just quickly see how the numbers aggregate and break down.
If, however, we need another format for Grafana visualization, that makes sense also.
Maybe rather than overload one function, we just have two built-in exporters:

- `csv` (my version)
- `csv-raw` (your version)
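For what it's worth, the two formats differ mostly by a transpose. A minimal sketch, assuming the header/row shapes from the tables earlier in this thread (the function name and sample data are illustrative, not IF code):

```python
def to_visualization(header, rows):
    """Convert the aggregation-style table into the visualization-style one.

    header: ["Path", "Aggregated", t1, t2, ...]
    rows:   [path, aggregated, v1, v2, ...] per node

    Returns a table with a timestamp column and one column per path;
    the aggregated column is dropped so it can't be mistaken for a bucket.
    """
    timestamps = header[2:]
    paths = [r[0] for r in rows]
    out_header = ["timestamp", *paths]
    out_rows = [[t, *[r[2 + i] for r in rows]] for i, t in enumerate(timestamps)]
    return out_header, out_rows

header = ["Path", "Aggregated", "11:00", "11:05"]
rows = [["tree.carbon", 43, 19, 12],
        ["tree.children.vm1.carbon", 12, 3, 4]]
h, r = to_visualization(header, rows)
print(h)  # ['timestamp', 'tree.carbon', 'tree.children.vm1.carbon']
print(r)  # [['11:00', 19, 3], ['11:05', 12, 4]]
```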
@jawache Ok, so I think I am finally on the same page with you about the 'human' use case (sorry it took so long..). Agreed, we should probably have two of those:

- `csv` - for validation by humans.
- `csv-raw` - for visualization by tools such as Excel and Grafana.
I think the last detail I'm missing is priority. I'm guessing the `csv` one is more urgent for the hackathon, right? We have a simple HTML exporter for easy visualization, so maybe `csv-raw` can wait? cc @jmcook1186
Story
As a user, I want to export the entire graph as a CSV to analyze the data in other applications.
Rationale
Trying to navigate and understand the impacts of your software application by looking at YAML is very challenging. By exporting into a flat table structure of a CSV, a human can better understand how the impacts are broken down by component and time.
Implementation details
This is an extension to IF that outputs the tree in a CSV format.
i.e. `--manifest --exhaust csv --filter comma,separated,list,of,parameters,to,export`
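A sketch of what parsing that command line could look like. The flag names come from the line above; the use of Python's argparse and everything else here is an assumption for illustration, not the actual IF CLI:

```python
import argparse

# Hypothetical parser mirroring the flags described in this issue.
parser = argparse.ArgumentParser(prog="if")
parser.add_argument("--manifest", required=True,
                    help="manifest YAML processed by the impact engine")
parser.add_argument("--exhaust", choices=["csv"],
                    help="exhaust plugin to run")
parser.add_argument("--filter", type=lambda s: s.split(","),
                    help="comma-separated list of parameters to export")

args = parser.parse_args(
    ["--manifest", "out.yml", "--exhaust", "csv", "--filter", "carbon,energy"])
print(args.filter)  # ['carbon', 'energy']
```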
The manifest is a YAML file that has been processed by the impact engine, like so
The output CSV file should be of this format:
How to choose the path?
The path column should be a javascript-like path, which we can use to easily identify the node in the graph this parameter relates to.
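A sketch of building such a path from a node's ancestry of keys; the helper name and the `strip_children` option are hypothetical, added to illustrate the brevity idea noted in this section:

```python
def node_path(segments, strip_children=False):
    """Build the javascript-like path for a node.

    segments: keys from the root down to the node, e.g.
    ["children", "application", "children", "server-1"].
    strip_children drops the repeated "children" segments for readability.
    """
    if strip_children:
        segments = [s for s in segments if s != "children"]
    return ".".join(["tree", *segments])

segs = ["children", "application", "children", "uk-west",
        "children", "server-1"]
print(node_path(segs))
# tree.children.application.children.uk-west.children.server-1
print(node_path(segs, strip_children=True))
# tree.application.uk-west.server-1
```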
NOTE: it might be redundant to have `children` so many times in the key; we may consider stripping that out for brevity and ease of reading.

To aggregate or not?
If aggregated data is present, it should be added to the first column, called `aggregated`.
What if the data is not time-synchronized?
We have a problem! If aggregated data is present then maybe just print out that column, but probably we should just error out, since the columns won't make sense.
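A sketch of a pre-export guard for this case (a hypothetical helper, not part of IF): collect each node's timestamp series and error out if they differ, since mismatched series mean the time-bucket columns won't line up.

```python
def check_synchronized(nodes_outputs):
    """Verify that every node's outputs share the same timestamp series.

    nodes_outputs: a list of outputs arrays, one per node that has
    observations. Raises ValueError if the series differ.
    """
    series = {tuple(o["timestamp"] for o in outputs)
              for outputs in nodes_outputs}
    if len(series) > 1:
        raise ValueError(
            "outputs are not time-synchronized; CSV columns would not align")
    return True

a = [{"timestamp": "11:00"}, {"timestamp": "11:05"}]
b = [{"timestamp": "11:00"}, {"timestamp": "11:05"}]
print(check_synchronized([a, b]))  # True
```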
Priority
4/5 - this tool would be very useful for debugging and testing other features in IF, including aggregation.
Scope
This is an external tool to IF, so other than some docs, it won't affect other things too much.
Size
Several days perhaps including testing.
What does "done" look like?
Does this require updates to documentation or other materials?
It will need documentation.
What testing is required?
Yes: tests across a variety of different graph types.
Is this a known/expected update?
Related to this https://github.com/Green-Software-Foundation/if/issues/298