charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Apache License 2.0

CkIO Reading Capabilities #3788

Closed: mjacob1002 closed this 6 months ago

mjacob1002 commented 6 months ago

This PR adds high-performance reading capabilities to CkIO; the performance gains have been verified through experiments on Bridges2. Reading is session-based: a user opens a read session, during which the data is read in, and then requests portions of that data through a Ck::IO::read call that specifies both an offset and the amount of data to read.
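As a rough illustration of the session model, here is a single-process C++ sketch. `ReadSession` and its `read(offset, bytes, out)` method are hypothetical names, not the Ck::IO API. The sketch assumes the session's byte range has already been read into memory when the session is opened, and it ignores the distributed, message-driven machinery of the real CkIO.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory analogue of a read session (not the Ck::IO API).
// The session covers the file byte range [begin, begin + data.size()),
// which is assumed to have been read in when the session was opened.
class ReadSession {
public:
    ReadSession(std::size_t begin, std::vector<char> data)
        : begin_(begin), data_(std::move(data)) {}

    // Copy `bytes` bytes starting at absolute file offset `offset` into `out`.
    // Throws std::out_of_range if the request is not fully inside the session.
    void read(std::size_t offset, std::size_t bytes, char* out) const {
        if (offset < begin_ || offset - begin_ > data_.size() ||
            bytes > data_.size() - (offset - begin_)) {
            throw std::out_of_range("request outside read session");
        }
        std::memcpy(out, data_.data() + (offset - begin_), bytes);
    }

private:
    std::size_t begin_;       // absolute file offset where the session starts
    std::vector<char> data_;  // bytes read in when the session was opened
};
```

In this sketch a request that falls partly outside the session is rejected rather than partially served; the PR does not say how CkIO handles that case.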

A well-documented example and Read the Docs documentation will be added later.

mayantaylor commented 6 months ago

Commit 6bf3857 makes a small implementation adjustment to better support concurrent background work:

The previous implementation monitored the status of a buffer chare's read in sendData, which is called every time a client requests data from a buffer chare. To avoid spinning on the future's value and hogging resources, sendData was a threaded method, and the implementation yielded the waiting thread. When the number of clients greatly exceeds the number of buffer chares, this does not work as intended: an excessive number of threads are created and suspended on a single read future.
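To make the scaling problem concrete, the following sketch mimics the old pattern with standard C++. It is only an analogue: OS threads stand in for Charm++'s user-level threads, a `std::shared_future` stands in for the read future, and `peakWaitersWithThreadPerRequest` is a hypothetical name. Each request spawns its own thread that polls and yields, so the number of parked threads grows with the number of clients.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Hypothetical analogue of the previous design: every client request gets its
// own thread, and each thread polls the same read future and yields until the
// read completes. Returns how many threads were waiting at the same time.
int peakWaitersWithThreadPerRequest(int numRequests) {
    std::promise<void> readDone;  // fulfilled when the "file read" finishes
    std::shared_future<void> ready = readDone.get_future().share();
    std::atomic<int> waiting{0};
    std::vector<std::thread> threads;

    for (int i = 0; i < numRequests; ++i) {
        // Each thread holds its own copy of the shared_future, which is safe
        // to use concurrently.
        threads.emplace_back([ready, &waiting] {
            ++waiting;  // one parked thread per request
            while (ready.wait_for(std::chrono::seconds(0)) !=
                   std::future_status::ready) {
                std::this_thread::yield();  // poll, then give up the core
            }
        });
    }
    // Simulate a slow read that finishes only after every request is waiting.
    while (waiting.load() < numRequests) std::this_thread::yield();
    int peak = waiting.load();
    readDone.set_value();
    for (auto& t : threads) t.join();
    return peak;
}
```

Here the peak equals the number of requests by construction, which is the point: thread count scales with clients, not with buffer chares.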

The new implementation resolves this issue by spawning only one thread per buffer chare to monitor the future's status. We use SDAG to buffer sendData requests as they arrive and release them once the monitoring thread reports that the future is ready.
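A minimal sketch of the new scheme for a single buffer, again in standard C++ rather than Charm++. In place of SDAG, a mutex-protected queue buffers requests that arrive before the read finishes, and one monitor thread waits on the future and then releases them. `Buffer` and `request` are hypothetical names chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical analogue of the new design for one buffer chare. CkIO buffers
// requests with SDAG; this sketch uses a mutex-protected queue instead.
class Buffer {
public:
    explicit Buffer(std::shared_future<void> readDone)
        : monitor_([this, readDone] {
              readDone.wait();  // the only thread that ever waits on the future
              std::vector<std::function<void()>> pending;
              {
                  std::lock_guard<std::mutex> lock(mu_);
                  ready_ = true;
                  pending.swap(queue_);
              }
              for (auto& serve : pending) serve();  // release buffered requests
          }) {}

    // Assumes the read future is eventually fulfilled; otherwise this blocks.
    ~Buffer() { monitor_.join(); }

    // Called once per client request (the role of sendData).
    void request(std::function<void()> serve) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            if (!ready_) {
                queue_.push_back(std::move(serve));  // buffer until the read is done
                return;
            }
        }
        serve();  // data already available: serve immediately
    }

private:
    std::mutex mu_;
    bool ready_ = false;
    std::vector<std::function<void()>> queue_;
    std::thread monitor_;  // declared last so it starts after the members above exist
};
```

However many requests arrive, only the monitor thread blocks on the future; requests that arrive after the read completes are served directly by the caller.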

Projections Analysis

The following Projections screenshots show this issue in more detail. These results were produced on Bridges2 on 1 node with 2 cores, in non-SMP mode. For these preliminary results, I used 1 buffer chare and 2048 clients reading 1 GB. Green denotes dummy background work, and light red denotes time spent in sendData.

With the old implementation, handling the large number of threads occupies an entire core, blocking any background work that could otherwise use it:

[Screenshot 2024-03-18 at 9:41:52 AM: Projections view of the old implementation]

Projections shows that the new implementation handles concurrent background work much better; only one thread per buffer chare waits on the future:

[Screenshot 2024-03-18 at 9:45:00 AM: Projections view of the new implementation]

Note: I am still verifying the performance of this new implementation in the simple test case with no background work.

trquinn commented 5 months ago

This commit undid a fix (a workaround, actually) committed in PR #3765. ChaNGa now crashes after using CkIO.