PPPLDeepLearning / plasma-python

PPPL deep learning disruption prediction package
http://tigress-web.princeton.edu/~alexeys/docs-web/html/

Store computational throughput and latency figures in repository #59

Status: Open · opened by felker 4 years ago

felker commented 4 years ago

Related to #58, #52, and #51.

We should add a continually updated record of the examples/second, seconds/batch, and other statistics discussed in #51 to a new file docs/Benchmarking.md (or ComputationalEfficiency.md, etc.).

AFAIK, neither Kates-Harbeck et al. (2019) nor Svyatkovskiy (2017) discussed single-node or single-GPU computational efficiency, since they focused on the scaling of multi-node parallelism (CUDA-aware MPI).

Given that we have multiple active users of the software distributed across the country (world?), it would help collaboration to provide easily accessible metrics of expected performance. The absence of such figures has already caused some confusion when we gained access to V100 GPUs on the Princeton Traverse cluster.

We need to establish a benchmark, or a set of benchmarks, for FRNN in order to measure and communicate consistent and useful metrics. For example, we could store measurements from a single benchmark consisting of 0D and 1D d3d signal data with our LSTM architecture on a single GPU/device with batch_size=256. A user running jet_data_0d would then have to extrapolate the examples/second figure to the simpler network but the longer average pulse lengths of JET.
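
As a rough illustration of how such numbers could be collected, here is a minimal sketch of a Keras callback that records seconds/batch and examples/second. The class name and reporting format are hypothetical and not part of plasma-python; it assumes the standard Keras `Callback` API rather than FRNN's MPI training loop.

```python
# Hypothetical throughput logger; names are illustrative, not existing FRNN code.
import time

import numpy as np
from keras.callbacks import Callback


class ThroughputLogger(Callback):
    """Record seconds/batch and examples/second during training."""

    def __init__(self, batch_size=256):
        super(ThroughputLogger, self).__init__()
        self.batch_size = batch_size
        self.batch_times = []

    def on_batch_begin(self, batch, logs=None):
        self._t0 = time.time()

    def on_batch_end(self, batch, logs=None):
        self.batch_times.append(time.time() - self._t0)

    def on_epoch_end(self, epoch, logs=None):
        # Median is less sensitive to one-off stalls (e.g. data loading) than the mean.
        sec_per_batch = float(np.median(self.batch_times))
        print("epoch %d: %.4f s/batch, %.1f examples/s"
              % (epoch, sec_per_batch, self.batch_size / sec_per_batch))
        self.batch_times = []
```

Passing an instance via `model.fit(..., callbacks=[ThroughputLogger(batch_size=256)])` would then print figures that could be pasted into the proposed docs file.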

The conf.yaml configuration choices that have first-order effects on performance include:

Similar to #41, these figures will be useless in the long run unless we store details of their context, including:

Summary of hardware we have/had/will have access to for computational performance measurements:

Even when hardware is retired (e.g. OLCF Titan), it would be good to keep those figures for posterity.
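
To keep that context attached to each measurement, a small helper along these lines could be run on the benchmark node and its output stored next to the figures. The function name and field choices are hypothetical, not an existing plasma-python API; it assumes nvidia-smi and git are available and that TensorFlow/Keras are the installed backend.

```python
# Hypothetical helper for recording the context of a benchmark run.
import json
import platform
import subprocess

import tensorflow as tf
import keras


def benchmark_context():
    """Collect hardware/software details to store alongside throughput figures."""
    try:
        gpu = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"]
        ).decode().strip()
    except (OSError, subprocess.CalledProcessError):
        gpu = "unknown"
    try:
        commit = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"]).decode().strip()
    except (OSError, subprocess.CalledProcessError):
        commit = "unknown"
    return {
        "gpu": gpu,
        "host": platform.node(),
        "python": platform.python_version(),
        "tensorflow": tf.__version__,
        "keras": keras.__version__,
        "git_commit": commit,
    }


if __name__ == "__main__":
    print(json.dumps(benchmark_context(), indent=2))
```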