Green-Software-Foundation / if

Impact Framework
https://if.greensoftware.foundation/
MIT License

Need help in understanding error #900

Closed adamhafiz427 closed 1 month ago

adamhafiz427 commented 1 month ago

Hello, this is not an actual bug per se, but I need help understanding an error I get when using the csv-lookup plugin.

Description of the Error

I'm trying to get the TDP of Azure instances using the CSVLookup plugin. This error is thrown:

error: Error happened while parsing given CSV file: https://raw.githubusercontent.com/Green-Software-Foundation/if-data/main/cloud-metdata-azure-instances.csv InputValidationError: One or more of the given query parameters are not found in the target CSV file column headers.

Expected Behaviour

The instance class is taken from the input, looked up in the CSV, and the corresponding TDP is returned.

Actual Behaviour

The error states that the query parameter (only the "instance-class" column in the CSV) is not found in the CSV file's column headers. Here is the link to the CSV: https://raw.githubusercontent.com/Green-Software-Foundation/if-data/main/cloud-metdata-azure-instances.csv

Steps to Reproduce

Run IF normally with a manifest similar to the one below.

Manifest File That Generated the Error

I can't provide the entire manifest file, but here is the "initialize" part

initialize:
  plugins:
    cloud-metadata:
      method: CSVLookup
      path: builtin
      global-config:
        filepath: https://raw.githubusercontent.com/Green-Software-Foundation/if-data/main/cloud-metdata-azure-instances.csv
        query:
          instance-class: cloud/instance-type
        output:
        - cpu-tdp
        - cpu/thermal-design-power
    teads-curve:
      method: TeadsCurve
      path: '@grnsft/if-unofficial-plugins'
    e-mem:
      method: EMem
      path: '@grnsft/if-plugins'
      global-config:
        energy-per-gb: 0.000392
    sci-e:
      method: SciE
      path: '@grnsft/if-plugins'
    sci-m:
      method: SciM
      path: '@grnsft/if-plugins'
    sci-o:
      method: SciO
      path: '@grnsft/if-plugins'
  outputs:
  - yaml

And here are the instance-classes of my inputs:

Standard_B2as_v2
Standard_B2ms
Standard_D2s_v3
Standard_B1s
Standard_DS2_v2

They are all present in the instance-class column of the Azure instances CSV.

Runtime Info

package.json: { "dependencies": { "@grnsft/if": "0.5.0", "ts-node": "10.9.1", "luxon": "3.4.4" } }

adamhafiz427 commented 1 month ago

I missed a line the last time I corrected the dataset: https://github.com/Green-Software-Foundation/if-data/pull/2. This might solve my issue.

jmcook1186 commented 1 month ago

@adamhafiz427 Did you solve it?

adamhafiz427 commented 1 month ago

@jmcook1186 Only partially. The problem persists because some Azure VM instance series are not in the dataset, for example DSv2 series 11-15 or the constrained vCPU sizes.
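One way to confirm which instance types are missing is to compare the manifest values against the `instance-class` column directly. The sketch below is a minimal, hypothetical helper: the name `find_missing_instances` and the inline sample rows are illustrative only and are not part of IF or the real dataset. It assumes the CSV has an `instance-class` header, as described above.

```python
import csv
import io


def find_missing_instances(csv_text, wanted, column="instance-class"):
    """Return the entries of `wanted` that never appear in `column`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"no column named {column!r} in CSV headers")
    # Collect every non-empty value found in the lookup column.
    present = {row[column].strip() for row in reader if row[column]}
    return [name for name in wanted if name not in present]


# Illustrative sample only: the cpu-tdp numbers are placeholders,
# not values from the real dataset.
SAMPLE_CSV = """instance-class,cpu-tdp
Standard_B2ms,100
Standard_D2s_v3,120
"""

if __name__ == "__main__":
    wanted = ["Standard_B2ms", "Standard_D2s_v3", "Standard_E64-16ds_v5"]
    print(find_missing_instances(SAMPLE_CSV, wanted))
```

To run this against the real file, the CSV text would first need to be downloaded (for instance with `urllib.request.urlopen`) and passed in as `csv_text`.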

jmcook1186 commented 1 month ago

Looks like you are querying using two column names: cpu-tdp and cpu/thermal-design-power. Only the first one is present in the target CSV file. Maybe you are actually trying to grab the former and rename it to the latter in the manifest file? Check the docs here for instructions on how to configure this correctly: https://github.com/Green-Software-Foundation/if/blob/main/src/if-run/builtins/csv-lookup/README.md
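The grab-and-rename idea can be sketched in a few lines. This is not the CSVLookup implementation: `lookup_and_rename` and the sample row are hypothetical, and the sketch only assumes that a two-element output pair means (source column, new name). It also separates two failure modes, an unknown column and a query that matches no row, which is an assumption about how such a lookup could behave rather than a statement about the plugin.

```python
import csv
import io


def lookup_and_rename(csv_text, query, output_pair):
    """Find the first row matching `query` and return {new_name: value}.

    `query` maps CSV column names to the values to match.
    `output_pair` is (source_column, new_name).
    """
    source_column, new_name = output_pair
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    # An unknown column is a configuration problem.
    for column in [*query, source_column]:
        if column not in headers:
            raise KeyError(f"There is no column with name: {column}")
    # An unmatched row means the data itself is missing.
    for row in reader:
        if all(row[col] == value for col, value in query.items()):
            return {new_name: row[source_column]}
    raise ValueError(f"no row matches query {query!r}")


# Placeholder value, not taken from the real dataset.
SAMPLE_CSV = "instance-class,cpu-tdp\nStandard_B2ms,100\n"

if __name__ == "__main__":
    result = lookup_and_rename(
        SAMPLE_CSV,
        {"instance-class": "Standard_B2ms"},
        ("cpu-tdp", "cpu/thermal-design-power"),
    )
    print(result)
```

Raising different exception types for the two cases makes it easier to tell a misconfigured manifest apart from a gap in the dataset.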

adamhafiz427 commented 1 month ago

Yes, I'm trying to grab and rename the cpu-tdp column as cpu/thermal-design-power as in this example:

The plugin also supports data renaming. This means you can grab data from a named column but push it into your manifest file data under another name. For example, maybe we want to grab data from the `processor-name` column in the target CSV and add it to the manifest file data as `processor-id` because this is the name expected by some other plugin in your pipeline. You can do this by passing comma-separated values in arrays.

```yaml
output:
  ["processor-name", "processor-id"]
```

My IDE reformats it into list form, as in my manifest, but I don't think this is the problem: when I use a fake column name, a different error is thrown:

Error happened while parsing given CSV file: https://raw.githubusercontent.com/adamhafiz427/if-data/main/cloud-metdata-azure-instances.csv InputValidationError: There is no column with name: fake-column-name.

I think my problem comes from certain Azure instance classes not being available in the dataset, such as Standard_E64-16ds_v5 from the constrained vCPU sizes.

Are there any plans to update the dataset in the near future?

jmcook1186 commented 1 month ago

Ah I see, yes, I think you are right. We'd love for the dataset to get updated. It's unlikely to get addressed in the next few weeks, to be honest, as we have lots of higher-priority tasks, but I'll happily make an issue for it and maybe someone from the community will pick it up in the meantime.

adamhafiz427 commented 1 month ago

Yes, that would be great! In the PR you merged into the data repo I added entries for some instance families. I will continue to update it with the data that I need in the meantime.

jmcook1186 commented 1 month ago

Closing as complete - thanks @adamhafiz427