Open alaniwi opened 3 years ago
Examples of where mc du values are greater than, or less than, the values on the filesystem as recorded in Ruth's CSV file.
Reading CSV file using:
import csv

sizes = {}
with open("cmip6-datasets_2020-10-27.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # the key " size_mb" includes a leading space
        sizes[row["dataset_id"]] = float(row[" size_mb"])
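The leading space in the " size_mb" key is easy to trip over. A minimal sketch of one way to avoid it, assuming the CSV header is written as "dataset_id, size_mb" with a space after the comma: passing skipinitialspace=True to csv.DictReader drops whitespace after each delimiter, including in the header row. The sample data and the read_sizes helper below are hypothetical; the real file may have more columns.

```python
import csv
import io

# Hypothetical two-column sample; the real CSV layout is an assumption.
sample_csv = (
    "dataset_id, size_mb\n"
    "CMIP6.DAMIP.CAS.FGOALS-g3.hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411, 1753.73\n"
    "CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL.historical.r13i1p1f2.Amon.va.gn.v20200205, 1962.33\n"
)


def read_sizes(f):
    """Return {dataset_id: size_mb} from an open CSV file object."""
    # skipinitialspace=True strips the space after each comma,
    # so the header key becomes "size_mb" rather than " size_mb".
    reader = csv.DictReader(f, skipinitialspace=True)
    return {row["dataset_id"]: float(row["size_mb"]) for row in reader}


sizes = read_sizes(io.StringIO(sample_csv))
```

For the real file, read_sizes(open("cmip6-datasets_2020-10-27.csv")) should give the same dictionary as the loop above, provided the header really does look like the sample.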
mc du gives smaller size
mc du s3/CMIP6.DAMIP.CAS.FGOALS-g3/hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411.zarr
1.2GiB CMIP6.DAMIP.CAS.FGOALS-g3/hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411.zarr
>>> sizes["CMIP6.DAMIP.CAS.FGOALS-g3.hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411"]
1753.73
mc du gives larger size
mc du s3/CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL/historical.r13i1p1f2.Amon.va.gn.v20200205.zarr
2.3GiB CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL/historical.r13i1p1f2.Amon.va.gn.v20200205.zarr
>>> sizes["CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL.historical.r13i1p1f2.Amon.va.gn.v20200205"]
1962.33
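As a quick check on whether the unit convention alone could explain the differences, the two examples above can be converted to a common unit. This is only a sketch: it assumes that the GiB reported by mc du means 2**30 bytes, and since it is not known whether size_mb in the CSV means 10**6 or 2**20 bytes, both are tried. The helper names are made up for illustration.

```python
GIB = 2**30  # bytes per GiB (binary unit, as printed by mc du)


def gib_to_mb(gib):
    """Convert GiB to decimal megabytes (10**6 bytes)."""
    return gib * GIB / 10**6


def gib_to_mib(gib):
    """Convert GiB to binary mebibytes (2**20 bytes)."""
    return gib * GIB / 2**20


# (mc du size in GiB, CSV size_mb) for the two examples above
examples = {
    "FGOALS-g3 hist-GHG hus": (1.2, 1753.73),
    "UKESM1-0-LL historical va": (2.3, 1962.33),
}

for name, (mc_gib, csv_mb) in examples.items():
    print(
        f"{name}: mc du = {gib_to_mb(mc_gib):.1f} MB"
        f" / {gib_to_mib(mc_gib):.1f} MiB, CSV = {csv_mb}"
    )
```

Under either interpretation the first dataset comes out smaller in mc du and the second comes out larger, so the choice of MB versus MiB does not by itself account for the discrepancy.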
~/mc-du-s3.out on sci contains the sizes of all the datasets as seen with mc du (look for lines ending in .zarr). For each of these, we may want to compare with the CSV file.
Lines can be parsed with e.g.
import re

units = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
# raw string, with the dot before "zarr" escaped
pattern = r'([0-9.]+)([A-Za-z]+)\s+(.*)\.zarr$'
for line in ........:
    m = re.match(pattern, line)
    if m:
        size = float(m.group(1)) * units[m.group(2)]  # size in bytes
        # mc du shows bucket/key; the CSV dataset_id uses dots throughout
        dataset_id = m.group(3).replace("/", ".")
        # now compare with sizes[dataset_id] as shown above...
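A slightly fuller sketch of that comparison, run here against the two mc du output lines quoted above rather than the real ~/mc-du-s3.out. The function names, the bytes_per_mb parameter and the inline sizes_mb dictionary are all hypothetical, and the default of 10**6 bytes per MB is an assumption about what size_mb means in the CSV.

```python
import re

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
PATTERN = re.compile(r"([0-9.]+)([A-Za-z]+)\s+(.*)\.zarr$")


def parse_mc_du_line(line):
    """Return (dataset_id, size_in_bytes), or None if the line is not a .zarr entry."""
    m = PATTERN.match(line.strip())
    if m is None or m.group(2) not in UNITS:
        return None
    size_bytes = float(m.group(1)) * UNITS[m.group(2)]
    # mc du prints bucket/key; the CSV dataset_id joins them with a dot
    dataset_id = m.group(3).replace("/", ".")
    return dataset_id, size_bytes


def compare(lines, sizes_mb, bytes_per_mb=10**6):
    """Return (dataset_id, mc_du_mb, csv_mb, ratio) for datasets found in both sources."""
    results = []
    for line in lines:
        parsed = parse_mc_du_line(line)
        if parsed is None:
            continue
        dataset_id, size_bytes = parsed
        if dataset_id not in sizes_mb:
            continue
        mc_mb = size_bytes / bytes_per_mb
        csv_mb = sizes_mb[dataset_id]
        results.append((dataset_id, mc_mb, csv_mb, mc_mb / csv_mb))
    return results


# The two examples quoted above, plus one line that should be skipped.
sample_lines = [
    "1.2GiB CMIP6.DAMIP.CAS.FGOALS-g3/hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411.zarr",
    "2.3GiB CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL/historical.r13i1p1f2.Amon.va.gn.v20200205.zarr",
    "a line that does not end in .zarr is ignored",
]
sizes_mb = {
    "CMIP6.DAMIP.CAS.FGOALS-g3.hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411": 1753.73,
    "CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL.historical.r13i1p1f2.Amon.va.gn.v20200205": 1962.33,
}

results = compare(sample_lines, sizes_mb)
for dataset_id, mc_mb, csv_mb, ratio in results:
    print(f"{dataset_id}: mc du {mc_mb:.1f} MB, CSV {csv_mb:.1f} MB, ratio {ratio:.2f}")
```

Sorting the results by ratio would show quickly whether the discrepancies cluster around a particular factor or are scattered.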
But this is about understanding mc du. It might be that the metrics shown at the above quoted URL do what we want, and then maybe we don't care about size values from mc du / mc ls.
Docs at https://caringo.atlassian.net/wiki/spaces/public/pages/2443817185/Content+Metering
For an example, see Alan H's message on the ops channel on Slack (15:10, 24 June).