argonne-lcf / dlio_benchmark

An I/O benchmark for deep learning applications
https://dlio-benchmark.readthedocs.io
Apache License 2.0

Looks like the benchmark is not calculating client memory sizes correctly in the datasize command for scale-out client testing #193

Open danlchilton opened 5 months ago

danlchilton commented 5 months ago

We have each Ubuntu 22.04 client configured with the same amount of RAM. These are virtual machines deployed on the Nutanix AOS stack.

[INFO] Total amount of data each host will consume is 1092.259794473648 GB; each host has [377.2758369445801, 377.275821685791, 377.2759017944336, 188.63793563842773, 0.0, 0.0, 0.0] GB memory [/home/ubuntu/storage/dlio_benchmark/dlio_benchmark/utils/statscounter.py:121]

We see this only when we have more than 1 GPU per client, in this case 2. Workload: unet3d. Version: git clone -b rc2-fix --recurse-submodules https://github.com/mlcommons/storage.git
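
For illustration, here is a minimal sketch of how per-host memory could be gathered exactly once per host using mpi4py and psutil, assuming one MPI rank is launched per GPU. This is an assumption about the intended behaviour, not the actual logic in statscounter.py:

```python
# Hypothetical sketch (not the actual dlio_benchmark code): report one
# memory reading per host rather than one per rank, so that launching
# 2 ranks (GPUs) per client does not skew the per-host memory list.
from mpi4py import MPI
import psutil

comm = MPI.COMM_WORLD

# Group ranks that share a physical host into a node-local communicator.
node_comm = comm.Split_type(MPI.COMM_TYPE_SHARED)

# Only rank 0 of each host reports that host's total RAM in GB.
mem_gb = None
if node_comm.Get_rank() == 0:
    mem_gb = psutil.virtual_memory().total / 1024**3

# Collect the per-host readings on global rank 0 and drop the placeholders
# contributed by the non-leader ranks.
gathered = comm.gather(mem_gb, root=0)
if comm.Get_rank() == 0:
    host_mem = [m for m in gathered if m is not None]
    print(f"each host has {host_mem} GB memory")
```

With this grouping, the length of the reported list would equal the number of hosts rather than the number of ranks, which is where the extra 0.0 entries above appear to come from.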