ax3l opened this issue 9 years ago
I want to hook into this discussion because I think we could use this singleton in a more general manner. This singleton should be something like a resource status monitor which provides us with per-node information such as memory consumption and the number of particles per rank.
I would like to use this information for a load-monitoring plugin which is able to dump the information to disk or transfer it to a live monitoring web application.
Evaluating this data will show us load imbalance in our code (which could be fixed manually for now) and will give us a good motivation to work on an automatic mechanism for better load balancing.
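As a rough illustration of what such a plugin could record, here is a minimal sketch of a per-rank status record and a CSV dump. All names (`NodeResourceStatus`, `dumpStatus`, the fields) are hypothetical and do not exist in PMacc/PIConGPU:

```cpp
#include <cstdint>
#include <fstream>

// Hypothetical per-rank record for a load-monitoring plugin;
// none of these names exist in PMacc/PIConGPU.
struct NodeResourceStatus
{
    int rank;                  // MPI rank / device the record belongs to
    uint64_t numParticles;     // particles currently on this device
    uint64_t deviceBytesUsed;  // device memory currently allocated
    uint64_t deviceBytesFree;  // device memory still available
    double lastStepSeconds;    // wall-clock time of the last time step
};

// Dump one CSV line per time step; the same stream of records could be
// written to disk or pushed to a live monitoring web application.
inline void dumpStatus(std::ofstream& out, uint32_t step, NodeResourceStatus const& s)
{
    out << step << ',' << s.rank << ',' << s.numParticles << ','
        << s.deviceBytesUsed << ',' << s.deviceBytesFree << ','
        << s.lastStepSeconds << '\n';
}
```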
I have already talked to @ax3l and he told me that some of this information already exists, such as the number of particles per node/rank. Has any other work been done with respect to my previous post? @psychocoderHPC
@erikzenker just for the first question: the particle counting per device already exists (available as a PMacc one-liner). Other pieces that are already there:

- `PMacc::SubGrid` (see the PIConGPU domain definitions)
- `ValueType` (scalar vs. vector fields): always instantiated are `FieldE` (`float3_X`), `FieldB` (`float3_X`) and `FieldTmp` (`float1_X`); careful: 2D and 3D matrices are pitched! (see the sketch after this list)
- the `asyncCommunication` methods that do the exchanges
- `MySimulation` on top of the PMacc `SimulationHelper` interface; that itself has rudimentary timings running all the time via the PMacc `TimeInterval` class and could be extended to be queried
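Regarding the pitched 2D/3D matrices: the row padding means a naive `nx * ny * nz * sizeof(element)` estimate undershoots the real allocation. A tiny sketch of how the pitch would enter a memory estimate (the pitch itself comes from the pitched allocation, e.g. `cudaMallocPitch`/`cudaMalloc3D`; the function name here is made up):

```cpp
#include <cstdint>

// Device-memory footprint of a pitched 3D buffer: the allocation is
// pitchBytes per row (>= nx * sizeof(element) due to alignment padding),
// so only the pitch, not the payload width, counts towards the total.
inline uint64_t pitchedBufferBytes(uint64_t pitchBytes, uint64_t ny, uint64_t nz)
{
    return pitchBytes * ny * nz;
}
```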
Thx @ax3l for providing this information.
Some applications such as ParaView have a very nice overview of the memory consumption of connected clients (see its Memory Inspector).
We should add a new singleton class that contains simple key-value pairs (name (string), e.g. bytes (uint64_t)) for all relevant allocations we do on the device (see the sketch after this list). This can be evaluated during init, e.g., when the fields are declared and set to zero (getMemory before/after) and when mallocMC is initialized to allocate its heap (getMemory before/after). An example breakdown:

- `FieldE`: 600 MBytes (+ exchange buffers)
- `FieldB`: 600 MBytes (+ exchange buffers)
- `FieldJ`: 600 MBytes (+ exchange buffers)
- `FieldTmp`: 200 MBytes (+ exchange buffers)
- FieldExchangeBuffers: 100 MBytes (or booked with the original data set)
- ParticleExchangeBuffers: 300 MBytes (or booked with the original data set? handled in mallocMC?)
- mallocMC: 8 000 MBytes

-> 10.4 GB of 11.25 GB on a K80 used (with ECC enabled, 6.25% of the 12 GB memory are used for ECC bits).

Additionally:

- mallocMC again: used/free MBytes, maybe bytes per species (if possible/feasible) (not feasible)

With that, a full overview of the total GPU memory can be provided; furthermore, mallocMC's own "getFreeMemory" calls should be incorporated.
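A minimal sketch of how such a key-value singleton could look, assuming the "getMemory before/after" idea maps to CUDA's `cudaMemGetInfo`. The class and method names (`MemoryLogger`, `beginAllocation`, `endAllocation`, `report`) are invented for illustration and are not existing PMacc code:

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

#include <cuda_runtime.h>

// Hypothetical singleton collecting name -> bytes pairs for all relevant
// device allocations (FieldE, FieldB, ..., mallocMC heap).
class MemoryLogger
{
public:
    static MemoryLogger& getInstance()
    {
        static MemoryLogger instance;
        return instance;
    }

    // record the free device memory right before an allocation ...
    void beginAllocation()
    {
        cudaMemGetInfo(&m_freeBefore, &m_total);
    }

    // ... and book the difference right after it under a given name
    void endAllocation(std::string const& name)
    {
        std::size_t freeAfter = 0;
        cudaMemGetInfo(&freeAfter, &m_total);
        if (m_freeBefore > freeAfter)
            m_allocations[name] += static_cast<uint64_t>(m_freeBefore - freeAfter);
    }

    // print the per-allocation breakdown plus the current device memory state
    void report(std::ostream& os) const
    {
        uint64_t sum = 0;
        for (auto const& kv : m_allocations)
        {
            os << kv.first << ": " << kv.second / (1024 * 1024) << " MiB\n";
            sum += kv.second;
        }
        std::size_t freeNow = 0;
        std::size_t total = 0;
        cudaMemGetInfo(&freeNow, &total);
        os << "tracked: " << sum / (1024 * 1024) << " MiB, device: "
           << (total - freeNow) / (1024 * 1024) << " MiB used of "
           << total / (1024 * 1024) << " MiB\n";
    }

private:
    MemoryLogger() = default;

    std::map<std::string, uint64_t> m_allocations; // name -> bytes on device
    std::size_t m_freeBefore = 0;
    std::size_t m_total = 0;
};
```

Usage would then bracket each large allocation, e.g. `beginAllocation()` before `FieldE` is created and `endAllocation("FieldE")` right after; calling `report()` at the end of init would yield the kind of breakdown listed above (FieldE: 600 MB, ..., mallocMC: 8 000 MB, summing to the ~10.4 GB on a K80).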
The background of this is that it is extremely hard to quantify and to predict when a GPU might run out of memory for a simulation. We must allow users to query the memory consumption as transparently as possible, since they fine-tune their particles-per-cell and cells-per-GPU inputs to the resources they have available. The feedback from the simulation to support them should be more than a vague "crashes"/"does not crash immediately"/"crashes after N steps", as it is now :)
@slizzered @psychocoderHPC that might be something for you.