I just noticed a problem with the other implementations after wondering why the timing spreadsheet didn't have a column for number of files.
According to the MATLAB reference implementations (both the serial and parallel versions), the number of files is a tunable parameter of the benchmark, such that each worker can have multiple files assigned to it (this was also something Jeremy mentioned when explaining the benchmark in our original meeting).
For example, in kernel 0, if I have `n` processes and `m` files (where `m % n == 0`), then each process should write out `m / n` files, and each file should contain `edges_per_vertex * 2^scale / m` edges. At least, this is what the parallel version of the reference MATLAB code does. There might actually be a bug there: it doesn't strike me as correct that the global file count is used to calculate the number of edges per file instead of the local file count. It looks like a mistake made when the serial version was translated into the parallel version. In other words, I think the correct number of edges per file might actually be `edges_per_vertex * 2^scale / (m / n)`.
Regardless of this detail, though, multiple files per worker is definitely allowed. I'll email Jeremy about the potential bug.
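To make the two readings concrete, here is a small Python sketch. It assumes a block distribution of global file indices to ranks (rank `r` owns files `r*(m/n)` through `(r+1)*(m/n) - 1`); I haven't confirmed that this is how the MATLAB reference assigns files, and the function names are hypothetical. It computes edges per file both the way the parallel reference appears to (dividing by the global `m`) and the way I suspect was intended (dividing by the local `m / n`).

```python
def files_for_rank(rank: int, n_procs: int, n_files: int) -> list[int]:
    """Global file indices owned by `rank`, assuming a block distribution (hypothetical)."""
    if n_files % n_procs != 0:
        raise ValueError("n_files (m) must be divisible by n_procs (n)")
    per_rank = n_files // n_procs  # m / n files per process
    return list(range(rank * per_rank, (rank + 1) * per_rank))


def edges_per_file_reference(scale: int, edges_per_vertex: int, n_files: int) -> int:
    # What the parallel MATLAB reference appears to do: divide by the GLOBAL file count m.
    return edges_per_vertex * 2**scale // n_files


def edges_per_file_local(scale: int, edges_per_vertex: int, n_files: int, n_procs: int) -> int:
    # Suspected intended behavior: divide by the LOCAL file count m / n.
    return edges_per_vertex * 2**scale // (n_files // n_procs)


if __name__ == "__main__":
    # n = 4 processes, m = 8 files, scale = 10, edges_per_vertex = 16
    print(files_for_rank(1, 4, 8))               # files owned by rank 1
    print(edges_per_file_reference(10, 16, 8))   # global-m reading
    print(edges_per_file_local(10, 16, 8, 4))    # local-(m/n) reading
```

With those parameters, rank 1 owns files 2 and 3; the global-`m` reading gives 2048 edges per file, while the local reading gives 8192, a factor of `n` apart. Integer division is used here, so non-divisible sizes would silently truncate.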
Another thing I saw missing from the kernel 0 implementations here is that they don't all properly seed the RNG with the file index. This probably isn't that important, but I thought I'd point it out.
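As a rough illustration of why per-file seeding matters, here is a Python sketch in which each file gets its own RNG seeded with its global file index, so a file's contents don't depend on which rank writes it or on how many processes there are. The edge generator is a placeholder (uniform random endpoints, not the Kronecker generator the benchmark uses), and the function names and block file assignment are hypothetical.

```python
import random


def generate_file_edges(file_index: int, n_edges: int, scale: int) -> list[tuple[int, int]]:
    # Private RNG seeded with the global file index: same index -> same edges.
    rng = random.Random(file_index)
    n_vertices = 2**scale
    # Placeholder generator: uniform random endpoints, NOT a Kronecker generator.
    return [(rng.randrange(n_vertices), rng.randrange(n_vertices)) for _ in range(n_edges)]


def all_files(n_procs: int, n_files: int, n_edges: int, scale: int) -> dict[int, list[tuple[int, int]]]:
    # Simulate every rank writing its block of files (hypothetical block assignment).
    per_rank = n_files // n_procs
    out = {}
    for rank in range(n_procs):
        for f in range(rank * per_rank, (rank + 1) * per_rank):
            out[f] = generate_file_edges(f, n_edges, scale)
    return out


if __name__ == "__main__":
    # Same 8 files whether they are written by 2 processes or 4.
    print(all_files(2, 8, 5, 4) == all_files(4, 8, 5, 4))
```

If the RNG were instead seeded per rank (or not at all), changing the process count would change the generated graph, which makes timings across different `n` harder to compare.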
Anyway, for a Julia example of the correct behavior - well, hopefully it's correct 😛 - you can read my MPI implementation.