darshan-hpc / darshan

Darshan I/O characterization tool

BUG: darshan-runtime and Python STDIO/print captures #933

Open tylerjereddy opened 1 year ago

tylerjereddy commented 1 year ago

While onboarding Yaris, we observed a few very confusing issues with capturing CPython STDIO/print() activity in some Python "learning scripts." I've reproduced some of them myself locally just now so I'll share and describe the behavior below.

While this is unlikely to show up in, say, large National Lab code workflows, I can certainly see how it could cause a great deal of confusion for someone learning the full workflow, from darshan runtime monitoring through to the HTML report.

First, let's try the first exercise I suggest: just print on two ranks. It is hard to imagine something simpler that is MPI-aware:

# example code from Yaris' exercise

import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    print("rank 0", flush=True)
if rank == 1:
    print("rank 1", flush=True)

mpirun -x LD_PRELOAD=/home/tyler/darshan_install/lib/libdarshan.so -x DARSHAN_LOGPATH=/home/tyler/LANL/rough_work/darshan/python_stagger_tests -n 2 python test.py

The report shows no captured IO data, even with flush=True (on my machine).

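One way to double-check what the HTML report says is to open the log with pydarshan and list which modules actually have records. This is a minimal sketch, assuming pydarshan is installed and that `DarshanReport(...).modules` is keyed by module name (e.g. "POSIX", "STDIO"). `modules_in_log` and `missing_modules` are hypothetical helper names, and the log path is whatever file darshan wrote under DARSHAN_LOGPATH.

```python
def missing_modules(found, expected=("POSIX", "STDIO")):
    """Return the expected module names that are absent from `found`."""
    return [name for name in expected if name not in found]


def modules_in_log(log_path):
    """List the module names recorded in a darshan log (needs pydarshan)."""
    # Imported here so the pure helper above works without pydarshan.
    import darshan

    # read_all=False: only the log metadata is needed to list the modules.
    report = darshan.DarshanReport(log_path, read_all=False)
    return list(report.modules)


# Example usage (the path is a placeholder, not a real log name):
# print(missing_modules(modules_in_log("path/to/log.darshan")))
```

If "STDIO" comes back as missing for the print() scripts but not for a script that writes to a file, that would match what the report shows.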

If I increase the amount of data printed, there is still no capture of IO (same red text on the report):

# example code from Yaris' exercise

import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    print("rank 0" * 100, flush=True)
if rank == 1:
    print("rank 1" * 100, flush=True)

Even if I add a few seconds of sleep after, the HTML report still indicates no IO capture:

# example code from Yaris' exercise

import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    print("rank 0" * 100, flush=True)
    time.sleep(5)
if rank == 1:
    print("rank 1" * 100, flush=True)
    time.sleep(5)

If I switch to explicit POSIX by writing to a file, all is good in the world again:

# example code from Yaris' exercise

import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    with open(f"{rank}.txt", "w") as outfile:
        outfile.write("hello")
if rank == 1:
    with open(f"{rank}.txt", "w") as outfile:
        outfile.write("hello")


We can stagger the IO with POSIX as well. That was the original purpose of the exercise: to understand IO patterns with simple examples like the one below. But STDIO was invisible, which made the exercise pretty confusing!

# example code from Yaris' exercise

import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    with open(f"{rank}.txt", "w") as outfile:
        outfile.write("hello")
if rank == 1:
    time.sleep(5)
    with open(f"{rank}.txt", "w") as outfile:
        outfile.write("hello")

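The two rank branches in the staggered example differ only in the delay, so the per-rank logic can be factored into one helper that is easy to run with or without MPI. This is just a sketch: `staggered_write` and its parameters are hypothetical names, not part of the original exercise.

```python
import os
import tempfile
import time


def staggered_write(rank, delay_s=0.0, directory=".", payload="hello"):
    """Sleep for delay_s seconds, then write payload to <directory>/<rank>.txt."""
    if delay_s > 0:
        time.sleep(delay_s)
    path = os.path.join(directory, f"{rank}.txt")
    with open(path, "w") as outfile:
        outfile.write(payload)
    return path


if __name__ == "__main__":
    # Stand-in for two MPI ranks, run one after the other in a temp directory.
    with tempfile.TemporaryDirectory() as tmp:
        for rank, delay in ((0, 0.0), (1, 0.1)):
            print(staggered_write(rank, delay, tmp))
```

Under mpirun, each rank would instead call `staggered_write(rank, delay_s=5 * rank)` after getting `rank` from `MPI.COMM_WORLD.Get_rank()`, which reproduces the 5 second stagger on rank 1 from the example above.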