Open avolkov1 opened 1 year ago
Fair ask. Might need some work though to format things properly and make them easily accessible without making the file too bloated. I'll think about it.
I would like to second this request. I have automation working for running NCCL tests, but writing a text parser from this output is complicated and brittle. JSON or CSV would provide me more confidence that the output automation won't break in subsequent versions.
I already have a working fork, @sjeaugey are you already done with this or can clean up my fork and open PR here?
Sorry, I've been underwater and couldn't make progress on this. If you have a patch to propose, feel free to do so. I can't guarantee we'll find time to integrate but if it merges easily and it's a small patch, maybe we can do it.
@lipovsek-aws please open a PR.
My preference for a format would be a CSV file structured around independent and dependent variables, with one row per unique tuple of indepedent variables. For nccl-tests the independent vars are: operation, bytes, elements, iters, redop, datatype, etc.... The dependent vars are the measurements: usecs, algbw, busbw, wrongs. The first row of each table would begin with a #
and follow with a list of variable names. All rows until the next #
row would be parsed as the values for their ordinal respective variable. A #
row containing an unrecognized variable name would be invalid. Example.
# operation, bytes, elements, iters, redop, datatype, usecs
allreduce, 1024, 256, 20, sum, float, 101
allreduce, 2048, 256, 20, sum, float, 203
# operation, bytes, elements, iters, datatype, usecs, busbw
broadcast, 1024, 256, 20, float, 41, 1.1e0
broadcast, 2048, 256, 20, float, 103, 1.8e0
If you have other thoughts for formats that are easier to work with please share. I look forward to the PR.
@jbachan Can you elaborate on why you wouldn't want to simply convert the current columns to CSV columns?
Extracting a particular column from a CSV is usually quite painless using tools like cut
, while extracting the same from your proposition would be more complicated.
Besides, mimicking the current columns would make the patch quite limited, while changing the layout would probably mean more complexity.
I agree just slipping commas between the columns would be the smallest patch.
Ignoring minimality of patch length, my preference would be for a format that is "concatenative" meaning that we can direct the output of a battery of tests all to one file:
all_reduce_perf -dfloat -osum >> rows
all_reduce_perf -dint8 -osum >> rows
all_gather_perf >> rows
This require that each row of measurement be fully described: all independent variables affecting the measurement row are easy to parse.
The format should also allow it easy for a later version of nccl-tests
to add independent variables without breaking old sripts. For instance, if it occurs to us that adding "nranks" to the independent vars is a good idea, then feeding new output including that var to an old script not expecting that var should still work. This would just require the old script to ignore vars it doesn't know about. The following python script supports such forward compatability:
cols = [] # list of strs
rows = [] # list of dicts mapping var names to vals
for line in readlines():
if line.startswith('#'):
cols = [x.strip() for x in line[1:].split(',')]
else:
vals = [x.strip() for x in line.split(',')]
rows.append(dict(zip(cols, vals)))
# build a list of (x,y) from all_gather where x='size' y='busbw'
xys = [(float(row['size']), float(row['busbw'])) for row in rows if row['test']=='all_reduce']
My CSV format meets the concatenative and compatability requirements. If there exists similar formats that can do the same, or cool tricks with cut
that obviate their need then I would love to learn about them.
Hi, sorry for delayed response. I added flag "--file" for example "--file=/tmp/results" and then persisted current output to "/tmp/results.csv" and arguments to "/tmp/results.json" - that makes it fully reproducible and machine readable. I added in_place and out_of_place to column names since CSV isn't the best with multi level columns, but I'm more than happy to move it back to multi level if others find this better DX. I should be able to open PR end of next week. @jbachan I think we can also add all parameters/outputs in each row, my goal was to separate results of the NCCL test with metadata, I could also just put it in CSV file and replicate parameters (algorithm, min/max packet size) over all rows.
Much appreciated @lipovsek-aws
Thoughts:
I'm on board with dropping JSON and putting all data in CSV, data quantity should be low so data duplication is not a concern for me. I'll open PR late next week.
Lets continue conversation in https://github.com/NVIDIA/nccl-tests/pull/138 , @sjeaugey @jbachan please review.
Feature request to add an option to print results in a csv format table or json.