Use data formats for data

LupoA / lsdensities

Smeared spectral densities from lattice correlators

GNU General Public License v3.0

Use data formats for data #33

Open edbennett opened 6 months ago

edbennett commented 6 months ago

I've only taken a 10,000-foot view so far, but it looks as though some results are only output by the code as text in a log file.

Parsing out data from a free-form log file is annoying and error-prone; I'd recommend that any results the program needs to output are written to an appropriate data file format: this might be CSV, JSON, or HDF5 (though probably not HDF5 for the sizes of data here). They could potentially still be output with logging.info as well, in case anyone wants to keep an eye on a run while debugging.
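As a rough illustration of the suggestion above, here is a minimal sketch that writes results to a CSV file and also echoes them with logging.info. The function name write_results_csv and the column names energy, rho and rho_err are assumptions made for the example, not names taken from the package.

```python
import csv
import logging

logging.basicConfig(level=logging.INFO)


def write_results_csv(path, energies, rho, rho_err):
    """Write one row per energy to a CSV file and echo each row to the log.

    The function name and column names are hypothetical.
    """
    with open(path, "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["energy", "rho", "rho_err"])  # header row
        for e, r, dr in zip(energies, rho, rho_err):
            writer.writerow([e, r, dr])
            # Still visible in the log for anyone watching a run
            logging.info("E = %s  rho = %s +/- %s", e, r, dr)
```

The log line stays purely informational; anything downstream would read the CSV with csv.reader or a similar tool rather than parsing the log.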

LupoA commented 5 months ago

I'll try to deal with this, but in case Niccolo' gets here before I do, my idea is to use JSON more or less like this:

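import json

# inputs, the_result and some_info are placeholders for the run's actual objects
big_dump = {
    "inputs": inputs,
    "some_result": the_result.tolist(),  # array -> plain list so JSON can store it
}

title_file = str(some_info) + ".json"

with open(title_file, "w") as json_file:
    json.dump(big_dump, json_file, indent=4)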

so that any result (rho(E), rho(lambda), smearing_kernel(E)) is written out together with the inputs.
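To show what this buys on the reading side, here is a small sketch of a save/load pair along the lines described above. The names save_result and load_result are hypothetical, and the sketch assumes the inputs are already JSON-serialisable (a plain dict of numbers and strings).

```python
import json


def save_result(path, inputs, name, values):
    """Store one result together with the inputs that produced it."""
    # Arrays with a .tolist() method (e.g. NumPy) are converted to plain lists
    as_list = values.tolist() if hasattr(values, "tolist") else list(values)
    with open(path, "w") as json_file:
        json.dump({"inputs": inputs, name: as_list}, json_file, indent=4)


def load_result(path, name):
    """Return (inputs, values) from a file written by save_result."""
    with open(path) as json_file:
        dump = json.load(json_file)
    return dump["inputs"], dump[name]
```

Reading a result back is then a single call, for example inputs, rho = load_result("rho.json", "rho"), with no log parsing involved.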

edbennett commented 5 months ago

I'd suggest trying to keep as much metadata and provenance in each output file as can reasonably be done.

Things that should be relatively easy to get:

Filename of current wrapper script
Version information of package (either in x.y.z or as a commit ID plus timestamp of commit)
Timestamp (including time zone information; UTC may be best) of computation
Username of current user
Hostname of current machine
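The items in this list can all be gathered with the Python standard library. The following is a minimal sketch, not a definitive implementation: the function name collect_provenance and the dictionary keys are hypothetical, and it assumes the package is installed under the distribution name lsdensities so that importlib.metadata can find its version.

```python
import getpass
import socket
import sys
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path


def collect_provenance(package_name="lsdensities"):
    """Gather basic provenance as a JSON-serialisable dict.

    package_name is an assumption about the installed distribution name.
    """
    try:
        version = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        version = "unknown"  # e.g. running from an uninstalled checkout
    try:
        user = getpass.getuser()
    except Exception:
        user = "unknown"  # some containers have no user entry
    return {
        "script": Path(sys.argv[0]).name,
        "package_version": version,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "hostname": socket.gethostname(),
    }
```

The returned dict could be stored under its own key (say "provenance") next to "inputs" in each JSON output file. Recording a commit ID instead of a version number would need an extra call to git, which this sketch leaves out.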

If possible, getting the identifiers and attached provenance information of any input data and passing that through would be the gold standard, but that's harder to do when it's not present on the input.