ax3l opened this issue 9 years ago
I want to hook into this discussion because I think we could use this singleton in a more general manner. This singleton should be something like a resource status monitor which provides us with per-node information such as memory consumption and the number of particles per rank.
I would like to use this information for a load-monitoring plugin which is able to dump the information to disk or transfer it to a live monitoring web application.
Evaluating this data will show us load imbalance in our code (which could be fixed manually for now) and will give us a good motivation to work on an automatic mechanism for better load balancing.
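As a rough illustration of what such a plugin could record, here is a minimal sketch of a per-rank status record and a CSV dump. All names (`NodeResourceStatus`, `dumpStatus`, the fields) are hypothetical and do not exist in PMacc/PIConGPU:

```cpp
#include <cstdint>
#include <fstream>

// Hypothetical per-rank record for a load-monitoring plugin;
// none of these names exist in PMacc/PIConGPU.
struct NodeResourceStatus
{
    int rank;                  // MPI rank / device the record belongs to
    uint64_t numParticles;     // particles currently on this device
    uint64_t deviceBytesUsed;  // device memory currently allocated
    uint64_t deviceBytesFree;  // device memory still available
    double lastStepSeconds;    // wall-clock time of the last time step
};

// Dump one CSV line per time step; the same stream of records could be
// written to disk or pushed to a live monitoring web application.
inline void dumpStatus(std::ofstream& out, uint32_t step, NodeResourceStatus const& s)
{
    out << step << ',' << s.rank << ',' << s.numParticles << ','
        << s.deviceBytesUsed << ',' << s.deviceBytesFree << ','
        << s.lastStepSeconds << '\n';
}
```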
I have already talked to @ax3l and he told me that some of this information already exists, such as the number of particles per node/rank. Has any other work been done with respect to my previous post? @psychocoderHPC
@erikzenker just for the first question: the particle counting per device already exists (available as a PMacc one-liner). Other pieces that are already there:

- `PMacc::SubGrid` (see the PIConGPU domain definitions)
- `ValueType` (scalar vs. vector fields): always instantiated are `FieldE` (`float3_X`), `FieldB` (`float3_X`) and `FieldTmp` (`float1_X`); careful: 2D and 3D matrices are pitched! (see the sketch after this list)
- the `asyncCommunication` methods that do the exchanges
- `MySimulation` on top of the PMacc `SimulationHelper` interface; that itself has rudimentary timings running all the time via the PMacc `TimeInterval` class and could be extended to be queried
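Regarding the pitched 2D/3D matrices: the row padding means a naive `nx * ny * nz * sizeof(element)` estimate undershoots the real allocation. A tiny sketch of how the pitch would enter a memory estimate (the pitch itself comes from the pitched allocation, e.g. `cudaMallocPitch`/`cudaMalloc3D`; the function name here is made up):

```cpp
#include <cstdint>

// Device-memory footprint of a pitched 3D buffer: the allocation is
// pitchBytes per row (>= nx * sizeof(element) due to alignment padding),
// so only the pitch, not the payload width, counts towards the total.
inline uint64_t pitchedBufferBytes(uint64_t pitchBytes, uint64_t ny, uint64_t nz)
{
    return pitchBytes * ny * nz;
}
```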
Thx @ax3l for providing this information.
Some applications such as ParaView have a very nice overview of the memory consumption of connected clients (see its Memory Inspector).
We should add a new singleton class that contains simple key-value pairs (name (string), e.g. bytes (uint64_t)) for all relevant allocations we do on the device (see the sketch after this list). This can be evaluated during init, e.g., when the fields are declared and set to zero (getMemory before/after) and when mallocMC is initialized to allocate its heap (getMemory before/after). An example breakdown:

- `FieldE`: 600 MBytes (+ exchange buffers)
- `FieldB`: 600 MBytes (+ exchange buffers)
- `FieldJ`: 600 MBytes (+ exchange buffers)
- `FieldTmp`: 200 MBytes (+ exchange buffers)
- FieldExchangeBuffers: 100 MBytes (or booked with the original data set)
- ParticleExchangeBuffers: 300 MBytes (or booked with the original data set? handled in mallocMC?)
- mallocMC: 8 000 MBytes

-> 10.4 GB of 11.25 GB on a K80 used (with ECC enabled, 6.25% of the 12 GB memory are used for ECC bits).

Additionally:

- mallocMC again: used/free MBytes, maybe bytes per species (if possible/feasible) (not feasible)

With that, a full overview of the total GPU memory can be provided; furthermore, mallocMC's own "getFreeMemory" calls should be incorporated.
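A minimal sketch of how such a key-value singleton could look, assuming the "getMemory before/after" idea maps to CUDA's `cudaMemGetInfo`. The class and method names (`MemoryLogger`, `beginAllocation`, `endAllocation`, `report`) are invented for illustration and are not existing PMacc code:

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

#include <cuda_runtime.h>

// Hypothetical singleton collecting name -> bytes pairs for all relevant
// device allocations (FieldE, FieldB, ..., mallocMC heap).
class MemoryLogger
{
public:
    static MemoryLogger& getInstance()
    {
        static MemoryLogger instance;
        return instance;
    }

    // record the free device memory right before an allocation ...
    void beginAllocation()
    {
        cudaMemGetInfo(&m_freeBefore, &m_total);
    }

    // ... and book the difference right after it under a given name
    void endAllocation(std::string const& name)
    {
        std::size_t freeAfter = 0;
        cudaMemGetInfo(&freeAfter, &m_total);
        if (m_freeBefore > freeAfter)
            m_allocations[name] += static_cast<uint64_t>(m_freeBefore - freeAfter);
    }

    // print the per-allocation breakdown plus the current device memory state
    void report(std::ostream& os) const
    {
        uint64_t sum = 0;
        for (auto const& kv : m_allocations)
        {
            os << kv.first << ": " << kv.second / (1024 * 1024) << " MiB\n";
            sum += kv.second;
        }
        std::size_t freeNow = 0;
        std::size_t total = 0;
        cudaMemGetInfo(&freeNow, &total);
        os << "tracked: " << sum / (1024 * 1024) << " MiB, device: "
           << (total - freeNow) / (1024 * 1024) << " MiB used of "
           << total / (1024 * 1024) << " MiB\n";
    }

private:
    MemoryLogger() = default;

    std::map<std::string, uint64_t> m_allocations; // name -> bytes on device
    std::size_t m_freeBefore = 0;
    std::size_t m_total = 0;
};
```

Usage would then bracket each large allocation, e.g. `beginAllocation()` before `FieldE` is created and `endAllocation("FieldE")` right after; calling `report()` at the end of init would yield the kind of breakdown listed above (FieldE: 600 MB, ..., mallocMC: 8 000 MB, summing to the ~10.4 GB on a K80).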
The background of this is that it is extremely hard to quantify and to predict when a GPU might run out of memory for a simulation. We must allow users to query the memory consumption as transparently as possible, since they fine-tune their particles-per-cell and cells-per-GPU inputs to the resources they have available. The feedback from the simulation to support them should be more than a vague "crashes"/"does not crash immediately"/"crashes after N steps", as it is now :)
@slizzered @psychocoderHPC that might be something for you.