try to understand metering on the object store

alaniwi commented 3 years ago

Docs at https://caringo.atlassian.net/wiki/spaces/public/pages/2443817185/Content+Metering

For an example, see Alan H's message on the ops channel on Slack (15:10, 24 June).

alaniwi commented 3 years ago

Examples where the mc du values are greater than, or less than, the filesystem values recorded in Ruth's CSV file.

Reading CSV file using:

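import csv

# Map dataset ID -> size in MB as recorded in the CSV file
sizes = {}
with open("cmip6-datasets_2020-10-27.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # the size column key used here has a leading space
        sizes[row["dataset_id"]] = float(row[" size_mb"])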
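The " size_mb" key above has a leading space, which is what csv.DictReader produces when the header row has a space after the comma. A minimal sketch of that behaviour, using a made-up two-line CSV (the header layout and the example.dataset row are assumptions for illustration, not taken from Ruth's file); skipinitialspace=True is a standard csv option that strips the space:

```python
import csv
import io

# Hypothetical CSV content; the real file's exact layout is an assumption.
text = "dataset_id, size_mb\nexample.dataset, 12.5\n"

# Default behaviour: the space after the comma becomes part of the field name.
default_rows = list(csv.DictReader(io.StringIO(text)))
default_keys = list(default_rows[0].keys())

# With skipinitialspace=True the space after each delimiter is dropped.
clean_rows = list(csv.DictReader(io.StringIO(text), skipinitialspace=True))
clean_keys = list(clean_rows[0].keys())

print(default_keys)
print(clean_keys)
print(float(clean_rows[0]["size_mb"]))
```

Either form gives the same numbers; the second just avoids having to remember the space in the key.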

Example where mc du gives a smaller size:

mc du s3/CMIP6.DAMIP.CAS.FGOALS-g3/hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411.zarr
1.2GiB  CMIP6.DAMIP.CAS.FGOALS-g3/hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411.zarr

>>> sizes["CMIP6.DAMIP.CAS.FGOALS-g3.hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411"]
1753.73

Example where mc du gives a larger size:

mc du s3/CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL/historical.r13i1p1f2.Amon.va.gn.v20200205.zarr
2.3GiB  CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL/historical.r13i1p1f2.Amon.va.gn.v20200205.zarr

>>> sizes["CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL.historical.r13i1p1f2.Amon.va.gn.v20200205"]
1962.33
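One thing to rule out is a simple units mismatch. A quick check, assuming the CSV size_mb column is in either decimal MB (10**6 bytes) or MiB (2**20 bytes), which is not confirmed here, and taking the rounded mc du figures at face value:

```python
GIB = 2**30  # bytes per GiB, the unit mc du printed above

def gib_to_mb(gib):
    """Convert GiB to decimal megabytes (10**6 bytes)."""
    return gib * GIB / 10**6

def gib_to_mib(gib):
    """Convert GiB to mebibytes (2**20 bytes)."""
    return gib * GIB / 2**20

# (mc du size in GiB, CSV size_mb) for the two examples above
examples = {
    "FGOALS-g3 hist-GHG hus": (1.2, 1753.73),
    "UKESM1-0-LL historical va": (2.3, 1962.33),
}

for name, (du_gib, csv_mb) in examples.items():
    print(name)
    print("  mc du as MB :", round(gib_to_mb(du_gib), 2))
    print("  mc du as MiB:", round(gib_to_mib(du_gib), 2))
    print("  CSV size_mb :", csv_mb)
```

Under either assumption the first dataset is still smaller in mc du and the second is still larger, so the unit convention alone does not account for both differences.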

alaniwi commented 3 years ago

~/mc-du-s3.out on sci contains the sizes of all the datasets as seen with mc du (look for lines ending .zarr). For each of these, we may want to compare with the CSV file.

Lines can be parsed with e.g.

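import os
import re

# Multipliers for the binary units printed by mc du
units = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
# size, unit, whitespace, then the path with the .zarr suffix removed
pattern = r'([0-9.]+)([A-Za-z]+)\s+(.*)\.zarr$'

with open(os.path.expanduser("~/mc-du-s3.out")) as f:
    for line in f:
        m = re.match(pattern, line)
        if m:
            size = float(m.group(1)) * units[m.group(2)]  # size in bytes
            # mc du prints bucket/object; the CSV IDs use a dot instead
            dataset_id = m.group(3).replace("/", ".")
            # now compare with sizes[dataset_id] as shown above...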

But this is only about understanding mc du. If the metrics described at the URL quoted above do what we want, we may not need to care about the size values from mc du / mc ls.
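If the mc du comparison is still wanted, it could be sketched roughly as follows. This is only an illustration: parse_du_line and compare are hypothetical helper names, the sample data is just the two examples quoted earlier, and size_mb is assumed to mean decimal MB (pass bytes_per_mb=2**20 if it turns out to be MiB).

```python
import re

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
PATTERN = re.compile(r"([0-9.]+)([A-Za-z]+)\s+(.*)\.zarr$")

def parse_du_line(line):
    """Return (dataset_id, size_in_bytes) for a .zarr line, else None."""
    m = PATTERN.match(line.strip())
    if not m or m.group(2) not in UNITS:
        return None
    size = float(m.group(1)) * UNITS[m.group(2)]
    # mc du prints bucket/object; the CSV IDs join the two with a dot
    return m.group(3).replace("/", "."), size

def compare(du_lines, sizes_mb, bytes_per_mb=10**6):
    """Yield (dataset_id, du_mb, csv_mb, ratio) for IDs present in both."""
    for line in du_lines:
        parsed = parse_du_line(line)
        if parsed is None:
            continue
        dataset_id, size = parsed
        if dataset_id in sizes_mb:
            du_mb = size / bytes_per_mb
            csv_mb = sizes_mb[dataset_id]
            yield dataset_id, du_mb, csv_mb, du_mb / csv_mb

# Sample input: the two examples quoted earlier in this issue
sample_lines = [
    "1.2GiB  CMIP6.DAMIP.CAS.FGOALS-g3/hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411.zarr",
    "2.3GiB  CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL/historical.r13i1p1f2.Amon.va.gn.v20200205.zarr",
]
sample_sizes = {
    "CMIP6.DAMIP.CAS.FGOALS-g3.hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411": 1753.73,
    "CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL.historical.r13i1p1f2.Amon.va.gn.v20200205": 1962.33,
}

results = list(compare(sample_lines, sample_sizes))
for dataset_id, du_mb, csv_mb, ratio in results:
    print(f"{dataset_id}: mc du {du_mb:.1f} MB, CSV {csv_mb:.1f} MB, ratio {ratio:.2f}")
```

A ratio below 1 means mc du reports less than the CSV and above 1 means more; run over the full output file, the spread of ratios might show whether the differences are systematic.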

cedadev / cmip6-object-store

try to understand metering on the object store #56