Green-Software-Foundation / if

Impact Framework
https://if.greensoftware.foundation/
MIT License
137 stars 38 forks source link

Impact Framework Project Updates2024-06-05 #780

Open jmcook1186 opened 1 month ago

jmcook1186 commented 1 month ago

Who

Sponsor: @jawache (GSF) Product owner: @jmcook1186 (GSF) Leads: @navveenb (Accenture), @srini1978 (Microsoft)

Overview

We had a good fortnight shipping some substantial updates to IF. We're especially happy to have if-diff merged with its advanced matching criteria fully implemented, a new csv-lookup generic builtin, major improvements to our logging that make it easier to diagnose and fix errors, and a big refactor of the SCI plugin code that makes it much easier to use.

@jawache and @jmcook1186 have also been scoping out and defining the next epic - you can join in the conversation on the discussion forum:

Updates

Here's the details for our most significant updates this fortnight:

Issues

We didn't really encounter any major issues, we were overall quite happy chugging through our latest epic!

Outlook

mgriffin-scottlogic commented 4 weeks ago

I've been looking into the csv-lookup plugin this afternoon, I can see how that would be useful. Have you considered whether you could make query parameters optional? I was thinking I could re-implement one of my carbon hack plugins using this method but I wouldn't be able to omit cloud/usage-type from some inputs. If this didn't throw an error, I could arrange the .csv entries so that the less specific entry comes first, followed by the ones where usage-type is present. Maybe an optional ignore-missing parameter on the global config?

I also initially thought that you might be able to use this plugin to create more inputs than were initially present, which doesn't seem to be the case. Has a plugin like that been considered at all? I could see it being useful to take a .csv file and create an input for each row, with headers mapped to different parameter names etc. Could almost be an extension to the mock-observations plugin but I'm not sure I'd want to have the association that it was 'mock' data.

jmcook1186 commented 3 weeks ago

Ah, so in your proposal, in the absence of any query parameters you would just parse all the columns from a csv into key value pairs with the column names as keys and arrays of column data as values? I guess it would be relatively straightforward to add this behaviour to the existing plugin.

Totally agree with your idea for a generic "csv-importer` that maps rows in a csv file to inputs. I've had that same idea on the mental backlog for a while but never found time to design it and get it on the board. Also agree it needs to be separate from "mock-observations".

These are ideal candidates for community contributions - we'd love to have them and would be keen to support/review but are unlikely to dedicate dev time to them at the moment.

@mgriffin-scottlogic

mgriffin-scottlogic commented 3 weeks ago

I suppose in the absence of any query parameters at all, it would just take the first row of the .csv? Which I think would be what would happen in the current code if you could end up with an empty list of queryData (ifMatchesCriteria.every() is true for an empty list).

My concrete example would be a .csv like:

vendor,service,type,replication
aws,s3,,3
aws,s3,RRS,2

With a manifest snippet:

...
  plugins:
    cloud-metadata:
      method: CSVLookup
      path: 'builtin'
      global-config:
        query:
          vendor: "cloud/vendor"
          service: "cloud/service"
          type: "cloud/usage-type"
        output: ['replication', 'storage/replication-factor']
        ignore-missing: true
...
  inputs:
    - cloud/vendor: aws
      cloud/service: s3
    - cloud/vendor: aws
      cloud/service: s3
      cloud/usage-type: RRS

So the first input would end up with storage/replication-factor: 3 and the second would have storage/replication-factor: 2.

If these are good candidates, I could definitely look into making a PR for this change, possibly the 'csv-importer' idea too at some point.

jmcook1186 commented 3 weeks ago

Pinging @narekhovhannisyan for opinion on the csv-lookup behaviour. I think it's generally better to error out loudly if the query parameters are specified but not found but if none are specified then returning the entire first row of the file seems reasonable.

For the csv importer I had envisaged that you would have your input data organized as a csv and the importer would parse it into yaml and add it to the manifest, with each row being an element in the inputs array, e.g.

a csv file containing this:

timestamp duration cpu-util energy
2023-07-06T00:00 1 20 5
2023-07-06T00:01 1 30 10
2023-07-06T00:02 1 40 15

becomes the following in the manifest:

      inputs:
        - timestamp: 2023-07-06T00:00
          duration: 1
          cpu-util: 20
          energy: 5
        - timestamp: 2023-07-06T00:00
          duration: 1
          cpu-util: 30
          energy: 10
        - timestamp: 2023-07-06T00:00
          duration: 1
          cpu-util: 40
          energy: 15
mgriffin-scottlogic commented 3 weeks ago

Sorry, my previous comment was still discussing the csv-lookup plugin. If it wasn't clear, I was intending to allow for optional parameters like cloud/usage-type, where there is a fallback default if only some are specified.

For a csv-importer, what you've posted is roughly what I was thinking too. It could have similar config options to the lookup plugin to allow you to take all columns, specific ones or rename them. I would also like to be able to specify constant properties that will be added to all inputs - say the .csv doesn't contain a duration but you want to specify up front that every entry represents an hour.

jawache commented 3 weeks ago

@jmcook1186 and @mgriffin-scottlogic I suspect this can just be more explicitly handled with the built-in default feature?

So in the manifest tree (perhaps even the root node) you'd add a defaults node and anything under there we automatically add to every input underneath.

I'm not sure how it works if you default to "null", maybe you default to an empty string ""?

mgriffin-scottlogic commented 3 weeks ago

Thanks Asim, I'd somehow managed to miss that feature until now, that should definitely help for both ideas.

mgriffin-scottlogic commented 3 weeks ago

Can confirm that an empty string will match against empty data and find the first entry in the list. So I can replace my idea above with:

...
  plugins:
    cloud-metadata:
      method: CSVLookup
      path: 'builtin'
      global-config:
        query:
          vendor: "cloud/vendor"
          service: "cloud/service"
          type: "cloud/usage-type"
        output: ['replication', 'storage/replication-factor']
...
  defaults:
    cloud/usage-type: ''
  inputs:
    - cloud/vendor: aws
      cloud/service: s3
    - cloud/vendor: aws
      cloud/service: s3
      cloud/usage-type: RRS
jawache commented 2 weeks ago

Whoop that's great @mgriffin-scottlogic !