GreenAlgorithms / GreenAlgorithms4HPC

http://www.green-algorithms.org

Support for partitions which have already been deleted? #3

Closed TheSeparatrix closed 2 years ago

TheSeparatrix commented 2 years ago

Hi!

I watched your talk at CMIH last week and would like to try out your HPC tool. I am also using the Cambridge CSD3 cluster. I followed your instructions and ran myCarbonFootprint.sh, which produces this error:

Traceback (most recent call last):
  File "GreenAlgorithms_global.py", line 422, in <module>
    main(args, cluster_info, fParams)
  File "GreenAlgorithms_global.py", line 341, in main
    WM.clean_logs_df()
  File "/home/[user]/GreenAlgorithms4HPC/GreenAlgorithms_workloadManager.py", line 361, in clean_logs_df
    self.df_agg['PartitionTypeX'] = self.df_agg.PartitionX.apply(self.set_partitionType)
  File "/home/[user]/GreenAlgorithms4HPC/GA_env/lib/python3.7/site-packages/pandas/core/series.py", line 4138, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer
  File "/home/[user]/GreenAlgorithms4HPC/GreenAlgorithms_workloadManager.py", line 104, in set_partitionType
    assert  x in self.cluster_info['partitions'], f"\n-!- Unknown partition: {x} -!-\n"
AssertionError:
-!- Unknown partition: pascal -!-

I believe the pascal partition has been discontinued by the CSD3 cluster team and replaced by the ampere partition. I have used both. Could this be what is causing the problem?
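
For reference, the assertion in the traceback fails as soon as a partition name from the job logs is missing from the `partitions` section of the cluster configuration. Below is a minimal sketch of that membership check, not the tool's actual code: the `cluster_info` dictionary is a trimmed-down stand-in for the parsed configuration, and `find_unknown_partitions` and the `logged` list are hypothetical names introduced only for illustration.

```python
# Hypothetical, trimmed-down stand-in for the parsed cluster configuration.
cluster_info = {
    "partitions": {
        "skylake": {"type": "CPU", "TDP": 9.4},
        "ampere": {"type": "GPU", "TDP": 300, "TDP_CPU": 4.4},
    }
}


def find_unknown_partitions(logged_partitions, cluster_info):
    """Return partition names seen in the job logs but absent from the config."""
    known = cluster_info["partitions"]
    return sorted({p for p in logged_partitions if p not in known})


# Partition names as they might appear in a user's job history (made up).
logged = ["skylake", "pascal", "ampere", "pascal"]

unknown = find_unknown_partitions(logged, cluster_info)
print(unknown)  # ['pascal']
```

Any name reported here would trip the `Unknown partition` assertion, so each one needs an entry in the configuration file even if the partition no longer exists on the cluster.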

Llannelongue commented 2 years ago

Hi! You can include as many partitions as you need in the cluster_info.yaml file, so typically it's best never to delete partitions, just add new ones. Here is the file I use for Cambridge's CSD3.

cluster_name: "CSD3"
granularity_memory_request: 6 # in GB
partitions:
  skylake:
    type: CPU # CPU or GPU
    model: "Xeon Gold 6142"
    TDP: 9.4 # in W, per core
  skylake-himem:
    type: CPU
    model: "Xeon Gold 6142"
    TDP: 9.4
  cclake: # from November 2020
    type: CPU
    model: "Xeon Platinum 8276" # from HPC team
    TDP: 5.9 # from https://ark.intel.com/content/www/us/en/ark/products/192470/intel-xeon-platinum-8276-processor-38-5m-cache-2-20-ghz.html
  cclake-himem:
    type: CPU
    model: "Xeon Platinum 8276" # same as above
    TDP: 5.9
  icelake:
    type: CPU
    model: "Xeon Platinum 8368Q" # from HPC team
    TDP: 7.1 # from https://www.intel.com/content/www/us/en/products/sku/212289/intel-xeon-platinum-8368q-processor-57m-cache-2-60-ghz/specifications.html
  icelake-himem:
    type: CPU
    model: "Xeon Platinum 8368Q" # same as above
    TDP: 7.1
  ampere:
    type: GPU
    model: "NVIDIA A100-SXM-80GB GPUs" # from https://docs.hpc.cam.ac.uk/hpc/user-guide/a100.html
    TDP: 300 # from https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/PB-10577-001_v02.pdf
    CPU_model: "AMD EPYC 7763" # from HPC team
    TDP_CPU: 4.4 # from https://www.amd.com/fr/products/cpu/amd-epyc-7763
  cardio:
    type: CPU
    model: "Xeon E5-2660 v3" # from data manager
    TDP: 10.5 # from https://ark.intel.com/content/www/us/en/ark/products/81706/intel-xeon-processor-e52660-v3-25m-cache-2-60-ghz.html
  pascal:
    type: GPU
    model: "Tesla P100 PCIe" # from HPC team
    TDP: 250 # from https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/tesla-p100/pdf/nvidia-tesla-p100-PCIe-datasheet.pdf
    model_CPU: "Xeon E5-2650 v4" # from HPC team
    TDP_CPU: 8.8 # from https://ark.intel.com/content/www/us/en/ark/products/91767/intel-xeon-processor-e52650-v4-30m-cache-2-20-ghz.html
PUE: 1.15 # > 1
CI: 231.12 # carbon intensity, in gCO2e/kWh (2022 value)
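
When adding entries for retired partitions, a quick sanity check of each entry can catch slips before running the tool. The sketch below is only an illustration under stated assumptions: it works on a plain dictionary rather than reading the YAML file, `check_partition` is a hypothetical helper, and the rule that GPU partitions carry a `TDP_CPU` value is inferred from the ampere and pascal entries above rather than from the tool's documentation.

```python
def check_partition(name, entry):
    """Return a list of problems found in one partition entry (empty if none)."""
    problems = []
    # The file's own comment says the type is either CPU or GPU.
    if entry.get("type") not in ("CPU", "GPU"):
        problems.append(f"{name}: 'type' should be CPU or GPU")
    # TDP is given in W (per core for the CPU partitions above).
    if not isinstance(entry.get("TDP"), (int, float)):
        problems.append(f"{name}: 'TDP' should be a number")
    # Assumption: GPU partitions also list the host CPU's TDP,
    # as the ampere and pascal entries do.
    if entry.get("type") == "GPU" and "TDP_CPU" not in entry:
        problems.append(f"{name}: GPU partition without 'TDP_CPU'")
    return problems


# Example entries: the first mirrors the pascal entry, the second is
# deliberately incomplete to show what a report looks like.
partitions = {
    "pascal": {"type": "GPU", "TDP": 250, "TDP_CPU": 8.8},
    "incomplete-gpu": {"type": "GPU", "TDP": 300},
}

for name, entry in partitions.items():
    for problem in check_partition(name, entry):
        print(problem)
```

The check is intentionally loose: it does not look at the model names, since they are descriptive labels in this file.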

TheSeparatrix commented 2 years ago

Thank you! Managed to fill it in now!