darshan-hpc / darshan

Darshan I/O characterization tool

TST, ENH: writing synthetic log files #452

Open tylerjereddy opened 3 years ago

tylerjereddy commented 3 years ago

The test case added in gh-447 is a prime example of where it would be useful to be able to write "synthetic data" darshan log files from the Python/pydarshan layer.

There are a few things that need to happen for this to work.

Generally:
1) check whether @jakobluettgau has already exposed (in the CFFI layer) the C function(s) for writing darshan logs that Shane mentioned in the last meeting.
2) if not, git grep the code base for the pertinent C functions and/or coordinate with Shane to find them, then expose them in the CFFI layer.
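Step 1 can be checked programmatically. This is a minimal sketch that assumes the backend gives us a CFFI `lib` object from `ffi.dlopen()`. A `SimpleNamespace` stands in for that object here, and `missing_symbols` is a hypothetical helper. With a CFFI lib object, a function counts as reachable only if it was declared through `ffi.cdef()`, even when the shared library exports it.

```python
import types

# Function names taken from the discussion in this issue.
WRITE_FUNCS = ["darshan_log_create", "darshan_log_put_mod"]


def missing_symbols(lib, names):
    """Return the entries of `names` that `lib` does not expose.

    For a CFFI lib object, undeclared functions raise AttributeError on
    access, so hasattr() reports them as missing.
    """
    return [name for name in names if not hasattr(lib, name)]


# Stand-in for the real CFFI lib object: only one function is "declared".
fake_lib = types.SimpleNamespace(darshan_log_create=lambda *args: None)
print(missing_symbols(fake_lib, WRITE_FUNCS))
```

Against the stand-in this reports `darshan_log_put_mod` as missing, which is the signal to go looking for it in the C sources (step 2).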

For the test mentioned, something like this:
1) parametrize the test over a series of NumPy random seeds (or whatever the equivalent is with the new NumPy random infrastructure)
2) generate reproducible random arrays with those seeds and write their data into the records of the synthetic logs, probably in a temporary directory
3) do a "roundtrip" unit test: check that all (or a random subset) of the randomly generated data is retrieved unchanged from the synthetic log by the Python/CFFI layer.

This may also be useful for the benchmark suite, which I think we should at least expand to include some larger log files so that we get a more realistic picture of performance (some of those could also come from the logs repo).

jakobluettgau commented 3 years ago

I can give some quick answers.

Generally, 1/2: Currently, the C definitions provided to CFFI do not include darshan_log_create or the darshan_log_put_* and dxt_log_put_* families of functions. However, all of these are already exported as symbols by libdarshan.so, so they can be added with reasonable effort. There is a fair bit of error checking involved, though, which quickly gets verbose when done through CFFI.
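One way to keep the error checking from spreading across every call site is to wrap each exposed function once. The following is a minimal sketch under stated assumptions. `checked` and `DarshanLogWriteError` are hypothetical names. The wrapper also assumes that the `darshan_log_put_*` functions report failure with a negative integer, which should be confirmed against the C headers. The stand-in namespaces replace a real CFFI lib object.

```python
import functools
import types


class DarshanLogWriteError(RuntimeError):
    """Raised when a wrapped darshan log-writing call reports failure."""


def checked(lib, name):
    """Return lib.<name> wrapped so that a negative return code raises.

    Assumption: the wrapped C function returns a negative int on failure.
    """
    func = getattr(lib, name)

    @functools.wraps(func)
    def wrapper(*args):
        ret = func(*args)
        if ret < 0:
            raise DarshanLogWriteError(f"{name} failed with return code {ret}")
        return ret

    return wrapper


# Stand-ins for a CFFI lib object: one where the call succeeds, one where it fails.
ok_lib = types.SimpleNamespace(darshan_log_put_mod=lambda *args: 0)
failing_lib = types.SimpleNamespace(darshan_log_put_mod=lambda *args: -1)
put_mod = checked(ok_lib, "darshan_log_put_mod")
```

A function that hands back a handle instead of a status code, as darshan_log_create presumably does, would need a different check (for example, comparing against `ffi.NULL`) rather than this return-code wrapper.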

It is also worth noting that there are per-module put handlers in the C code base (in the darshan-<mod>-logutils.c files) that are not exposed. For the most part, however, these are thin wrappers around darshan_log_put_mod.
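Since those handlers are thin wrappers, the Python side could plausibly call darshan_log_put_mod directly and bind the module id itself. The following is a sketch of that pattern with several assumptions. `make_module_putter` is hypothetical. The argument order `(fd, mod_id, buf, buf_sz, version)` is assumed, not the verified C signature. The module id and version are placeholders. A recording function stands in for the CFFI call.

```python
def make_module_putter(put_mod, mod_id, version):
    """Bind a module id and format version onto a generic put_mod callable.

    Assumption: put_mod takes (fd, mod_id, buf, buf_sz, version).
    """

    def put_records(fd, buf, buf_sz):
        return put_mod(fd, mod_id, buf, buf_sz, version)

    return put_records


# Record calls instead of writing a real log.
calls = []


def fake_put_mod(fd, mod_id, buf, buf_sz, version):
    calls.append((fd, mod_id, buf, buf_sz, version))
    return 0


HYPOTHETICAL_POSIX_ID = 1  # placeholder, not the real darshan module id
put_posix = make_module_putter(fake_put_mod, HYPOTHETICAL_POSIX_ID, version=3)
```

With this pattern, exposing a single function through CFFI would be enough, and each module-specific putter becomes a one-line binding on the Python side.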