argonne-lcf / dlio_benchmark

An I/O benchmark for deep learning applications
https://dlio-benchmark.readthedocs.io
Apache License 2.0

Add user config to specify type of distribution of time configuration #241

Open rayandrew opened 3 weeks ago

rayandrew commented 3 weeks ago

Hi @zhenghh04 and @hariharan-devarajan,

This PR adds options that let users specify the type of distribution for the simulated time parameters. This is useful when computation, preprocessing, or evaluation time does not follow a normal distribution.

The existing API (the `*_time` and `*_time_stdev` keys) remains fully backward compatible, so existing configurations need no changes: when no type is given, a normal distribution is assumed.
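A minimal sketch of how this backward-compatible lookup could work, assuming the YAML has already been loaded into a plain dict. `normalize_time_config` is a hypothetical helper, not the PR's implementation: it folds the old flat keys (for example `computation_time` plus an optional `computation_time_stdev`) and the new nested mapping into one spec dict, defaults `type` to `normal`, and treats a missing `stdev` as zero, i.e. a constant time.

```python
from typing import Any, Dict


def normalize_time_config(config: Dict[str, Any], key: str) -> Dict[str, Any]:
    """Return a nested distribution spec for `key` (e.g. "computation_time").

    Hypothetical helper: accepts either the old flat form
    (`key: <float>` plus an optional `<key>_stdev: <float>`) or the new
    nested mapping, and defaults the distribution type to "normal".
    """
    # Assumption: an absent key means a time of 0.0.
    value = config.get(key, 0.0)
    if isinstance(value, dict):
        # New API: nested mapping; "type" is optional and defaults to normal.
        spec = dict(value)
        spec.setdefault("type", "normal")
    else:
        # Old API: scalar mean with an optional sibling *_stdev key.
        spec = {"type": "normal", "mean": float(value)}
        stdev_key = f"{key}_stdev"
        if stdev_key in config:
            spec["stdev"] = float(config[stdev_key])
    if spec["type"] == "normal":
        # A missing stdev means a constant (deterministic) time.
        spec.setdefault("stdev", 0.0)
    return spec


# Old and new spellings of the same setting normalize to the same spec.
old = {"computation_time": 1.0, "computation_time_stdev": 0.1}
new = {"computation_time": {"mean": 1.0, "stdev": 0.1}}
assert normalize_time_config(old, "computation_time") == normalize_time_config(new, "computation_time")
```

Because the key is a parameter, the same helper would serve `preprocess_time` and `eval_time` unchanged.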

The distribution is specified as a nested YAML mapping under the same key, as follows:

```yaml
# ===== normal =====
computation_time:
  mean: 1.0
  stdev: 0.1
  type: normal
# or
computation_time:
  mean: 1.0
# or
computation_time:
  mean: 1.0
  stdev: 0.1
# or (OLD API)
computation_time: 1.0
computation_time_stdev: 0.1
# or (OLD API)
computation_time: 1.0

# ===== uniform =====
computation_time:
  min: 0.5
  max: 1.5
  type: uniform

# ===== gamma =====
computation_time:
  shape: 1.0
  scale: 1.0
  type: gamma

# ===== exponential =====
computation_time:
  scale: 1.0
  type: exponential

# ===== poisson =====
computation_time:
  lam: 1.0
  type: poisson
```

The same configuration applies to `preprocess_time` and `eval_time`.
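The sketch below shows one way a single time value could be drawn from a nested spec like those above. It is an illustration under stated assumptions, not the PR's code: `sample_time` and `_poisson` are hypothetical names, it uses Python's standard `random` module to stay dependency-free, and it clamps results at zero on the assumption that a simulated time should never be negative. The YAML parameter names (`shape`, `scale`, `lam`) match those of NumPy's `numpy.random.Generator.gamma`, `.exponential`, and `.poisson`, which would be the natural choice in a NumPy-based codebase.

```python
import math
import random
from typing import Any, Dict, Optional


def _poisson(rng: random.Random, lam: float) -> int:
    # Knuth's multiplication method: count uniform draws until their
    # running product falls to exp(-lam). Suitable for small lam only.
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_time(spec: Dict[str, Any], rng: Optional[random.Random] = None) -> float:
    """Draw one time value (seconds) from a nested distribution spec (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    kind = spec.get("type", "normal")
    if kind == "normal":
        stdev = spec.get("stdev", 0.0)
        # A zero or missing stdev yields the constant mean.
        value = rng.gauss(spec["mean"], stdev) if stdev > 0 else spec["mean"]
    elif kind == "uniform":
        value = rng.uniform(spec["min"], spec["max"])
    elif kind == "gamma":
        # gammavariate(alpha, beta) takes shape and scale, in that order.
        value = rng.gammavariate(spec["shape"], spec["scale"])
    elif kind == "exponential":
        # expovariate takes a rate, i.e. 1 / scale.
        value = rng.expovariate(1.0 / spec["scale"])
    elif kind == "poisson":
        value = _poisson(rng, spec["lam"])
    else:
        raise ValueError(f"unsupported distribution type: {kind!r}")
    # Assumption: clamp at zero so a wide normal never yields a negative time.
    return max(0.0, float(value))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducible runs
    for spec in (
        {"type": "normal", "mean": 1.0, "stdev": 0.1},
        {"type": "uniform", "min": 0.5, "max": 1.5},
        {"type": "gamma", "shape": 1.0, "scale": 1.0},
        {"type": "exponential", "scale": 1.0},
        {"type": "poisson", "lam": 1.0},
    ):
        print(spec["type"], sample_time(spec, rng))
```

Knuth's method costs O(lam) uniform draws per sample, which is acceptable for small means; for large `lam`, `exp(-lam)` underflows, so a library sampler such as NumPy's `Generator.poisson` is the safer choice there.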

Thank you!