While onboarding Yaris, we observed a few very confusing issues with Darshan capturing CPython STDIO/`print()` activity in some Python "learning scripts." I've reproduced some of them locally just now, so I'll share and describe the behavior below.
While this is unlikely to show up in, say, large National Lab workflows, I can certainly see how it could cause a great deal of confusion for someone learning the Darshan workflow end to end, from runtime monitoring through to the HTML report.
First, let's try the first exercise I suggested: just print on two ranks. It's hard to imagine anything simpler that is still MPI-aware:
```python
# example code from Yaris' exercise
import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
if rank == 0:
    print("rank 0", flush=True)
if rank == 1:
    print("rank 1", flush=True)
```
And the report shows no captured IO data (even with `flush=True`, on my machine).
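My best guess at what is going on, which I haven't confirmed in the Darshan source: in CPython 3, `print()` never goes through C stdio at all. `sys.stdout` is Python's own `io` stack (`TextIOWrapper` over `BufferedWriter` over `FileIO`), and `FileIO` calls `write()` on file descriptor 1 directly, so `flush=True` only pushes Python's buffer down to that `write()`. If Darshan's STDIO module works by wrapping libc functions such as `fwrite()` and `fprintf()` (an assumption on my part), it never gets a chance to see these writes. A quick way to inspect the layers (`io_layers` is just a throwaway helper for illustration):

```python
import os
import sys
import tempfile


def io_layers(stream, max_depth=8):
    """Return the class name of each layer in a Python io stream stack.

    Walks TextIOWrapper.buffer, then BufferedWriter.raw, stopping at the
    first layer that exposes neither attribute (e.g. FileIO).
    """
    layers = []
    current = stream
    for _ in range(max_depth):
        if current is None:
            break
        layers.append(type(current).__name__)
        nxt = getattr(current, "buffer", None)  # text layer -> binary buffer
        if nxt is None:
            nxt = getattr(current, "raw", None)  # buffered layer -> raw FileIO
        current = nxt
    return layers


if __name__ == "__main__":
    # What print() writes through:
    print("sys.stdout:", io_layers(sys.stdout))
    # What the POSIX file-writing examples write through:
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "0.txt"), "w") as outfile:
            print("open(..., 'w'):", io_layers(outfile))
```

In a default CPython 3 run (no `-u`), both lines should show `TextIOWrapper`, `BufferedWriter`, `FileIO`, i.e. the same path that plain file writes take. If that holds, the difference Darshan sees would come down to how the descriptor was obtained (an inherited fd 1 vs. an `open()` call it can observe), which again is a guess rather than something I've verified.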
If I increase the amount of data printed, there is still no IO captured (same red text in the report):
```python
# example code from Yaris' exercise
import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
if rank == 0:
    print("rank 0" * 100, flush=True)
if rank == 1:
    print("rank 1" * 100, flush=True)
```
Even if I add a few seconds of sleep after printing, the HTML report still indicates no IO capture:
```python
# example code from Yaris' exercise
import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
if rank == 0:
    print("rank 0" * 100, flush=True)
    time.sleep(5)
if rank == 1:
    print("rank 1" * 100, flush=True)
    time.sleep(5)
```
If I switch to explicit POSIX IO by writing to a file, all is good in the world again:
```python
# example code from Yaris' exercise
import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
if rank == 0:
    with open(f"{rank}.txt", "w") as outfile:
        outfile.write("hello")
if rank == 1:
    with open(f"{rank}.txt", "w") as outfile:
        outfile.write("hello")
```
We can also stagger the IO with POSIX, which was the original purpose of the exercise: understanding IO patterns through simple examples like the one below. But STDIO was invisible, which made the exercise pretty confusing!
```python
# example code from Yaris' exercise
import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
if rank == 0:
    with open(f"{rank}.txt", "w") as outfile:
        outfile.write("hello")
if rank == 1:
    time.sleep(5)
    with open(f"{rank}.txt", "w") as outfile:
        outfile.write("hello")
```
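To extend the staggered pattern beyond two ranks, one option is to make the delay proportional to the rank, so rank *r* starts its write `r * delay_s` seconds after rank 0. A minimal sketch (`staggered_write` and its parameters are my own naming, not part of the original exercise):

```python
import time
from pathlib import Path


def staggered_write(rank, delay_s=5.0, outdir="."):
    """Sleep rank * delay_s seconds, then write "hello" to <outdir>/<rank>.txt.

    Rank 0 writes immediately, rank 1 after delay_s, rank 2 after
    2 * delay_s, and so on, so each rank's POSIX write gets its own window.
    """
    time.sleep(rank * delay_s)
    path = Path(outdir) / f"{rank}.txt"
    with open(path, "w") as outfile:
        outfile.write("hello")
    return path
```

Under MPI it would be called as `staggered_write(MPI.COMM_WORLD.Get_rank())` and launched with `mpirun` and Darshan preloaded, as for the other examples; with `-n 4` and the default delay, the four writes would be spread over roughly 15 seconds.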
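For reference, this is how I launched the first exercise under Darshan (script saved as `test.py`):

```bash
mpirun -x LD_PRELOAD=/home/tyler/darshan_install/lib/libdarshan.so -x DARSHAN_LOGPATH=/home/tyler/LANL/rough_work/darshan/python_stagger_tests -n 2 python test.py
```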