Open jmcook1186 opened 1 month ago
I've been looking into the csv-lookup
plugin this afternoon, I can see how that would be useful. Have you considered whether you could make query parameters optional? I was thinking I could re-implement one of my carbon hack plugins using this method but I wouldn't be able to omit cloud/usage-type
from some inputs. If this didn't throw an error, I could arrange the .csv entries so that the less specific entry comes first, followed by the ones where usage-type
is present. Maybe an optional ignore-missing
parameter on the global config?
I also initially thought that you might be able to use this plugin to create more inputs than were initially present, which doesn't seem to be the case. Has a plugin like that been considered at all? I could see it being useful to take a .csv file and create an input for each row, with headers mapped to different parameter names etc. Could almost be an extension to the mock-observations plugin but I'm not sure I'd want to have the association that it was 'mock' data.
Ah, so in your proposal, in the absence of any query parameters you would just parse all the columns from a csv into key value pairs with the column names as keys and arrays of column data as values? I guess it would be relatively straightforward to add this behaviour to the existing plugin.
Totally agree with your idea for a generic "csv-importer` that maps rows in a csv file to inputs. I've had that same idea on the mental backlog for a while but never found time to design it and get it on the board. Also agree it needs to be separate from "mock-observations".
These are ideal candidates for community contributions - we'd love to have them and would be keen to support/review but are unlikely to dedicate dev time to them at the moment.
@mgriffin-scottlogic
I suppose in the absence of any query parameters at all, it would just take the first row of the .csv? Which I think would be what would happen in the current code if you could end up with an empty list of queryData
(ifMatchesCriteria.every()
is true for an empty list).
My concrete example would be a .csv like:
vendor,service,type,replication
aws,s3,,3
aws,s3,RRS,2
With a manifest snippet:
...
plugins:
cloud-metadata:
method: CSVLookup
path: 'builtin'
global-config:
query:
vendor: "cloud/vendor"
service: "cloud/service"
type: "cloud/usage-type"
output: ['replication', 'storage/replication-factor']
ignore-missing: true
...
inputs:
- cloud/vendor: aws
cloud/service: s3
- cloud/vendor: aws
cloud/service: s3
cloud/usage-type: RRS
So the first input would end up with storage/replication-factor: 3
and the second would have storage/replication-factor: 2
.
If these are good candidates, I could definitely look into making a PR for this change, possibly the 'csv-importer' idea too at some point.
Pinging @narekhovhannisyan for opinion on the csv-lookup behaviour. I think it's generally better to error out loudly if the query parameters are specified but not found but if none are specified then returning the entire first row of the file seems reasonable.
For the csv importer I had envisaged that you would have your input data organized as a csv and the importer would parse it into yaml and add it to the manifest, with each row being an element in the inputs
array, e.g.
a csv file containing this:
timestamp | duration | cpu-util | energy |
---|---|---|---|
2023-07-06T00:00 | 1 | 20 | 5 |
2023-07-06T00:01 | 1 | 30 | 10 |
2023-07-06T00:02 | 1 | 40 | 15 |
becomes the following in the manifest:
inputs:
- timestamp: 2023-07-06T00:00
duration: 1
cpu-util: 20
energy: 5
- timestamp: 2023-07-06T00:00
duration: 1
cpu-util: 30
energy: 10
- timestamp: 2023-07-06T00:00
duration: 1
cpu-util: 40
energy: 15
Sorry, my previous comment was still discussing the csv-lookup
plugin. If it wasn't clear, I was intending to allow for optional parameters like cloud/usage-type
, where there is a fallback default if only some are specified.
For a csv-importer, what you've posted is roughly what I was thinking too. It could have similar config options to the lookup plugin to allow you to take all columns, specific ones or rename them. I would also like to be able to specify constant properties that will be added to all inputs - say the .csv doesn't contain a duration but you want to specify up front that every entry represents an hour.
@jmcook1186 and @mgriffin-scottlogic I suspect this can just be more explicitly handled with the built-in default feature?
So in the manifest tree (perhaps even the root node) you'd add a defaults node and anything under there we automatically add to every input underneath.
I'm not sure how it works if you default to "null", maybe you default to an empty string ""?
Thanks Asim, I'd somehow managed to miss that feature until now, that should definitely help for both ideas.
Can confirm that an empty string will match against empty data and find the first entry in the list. So I can replace my idea above with:
...
plugins:
cloud-metadata:
method: CSVLookup
path: 'builtin'
global-config:
query:
vendor: "cloud/vendor"
service: "cloud/service"
type: "cloud/usage-type"
output: ['replication', 'storage/replication-factor']
...
defaults:
cloud/usage-type: ''
inputs:
- cloud/vendor: aws
cloud/service: s3
- cloud/vendor: aws
cloud/service: s3
cloud/usage-type: RRS
Whoop that's great @mgriffin-scottlogic !
Who
Sponsor: @jawache (GSF) Product owner: @jmcook1186 (GSF) Leads: @navveenb (Accenture), @srini1978 (Microsoft)
Overview
We had a good fortnight shipping some substantial updates to IF. We're especially happy to have
if-diff
merged with its advanced matching criteria fully implemented, a newcsv-lookup
generic builtin, major improvements to our logging that make it easier to diagnose and fix errors, and a big refactor of the SCI plugin code that makes it much easier to use.@jawache and @jmcook1186 have also been scoping out and defining the next epic - you can join in the conversation on the discussion forum:
Updates
Here's the details for our most significant updates this fortnight:
Finished migrating if-plugins to
builtins
: The final items to migrate were:Simplified the SCI logic
Merged
if-diff
with the advanced matching criteria fully implementedMerged a new generic
interpolation
builtin featureBuilt a csv-lookup builtin feature
Fixed some minor issues with the release process and our CI/CD
Added a teads-curve manifest that uses only generic builtins
Added enhanced execution logging to the IF with a
--debug
commandIssues
We didn't really encounter any major issues, we were overall quite happy chugging through our latest epic!
Outlook
684
636
600
593
760