ROCm / rocprofiler-compute

Advanced Profiling and Analytics for AMD Hardware
https://rocm.docs.amd.com/projects/omniperf/en/latest/
MIT License

Work around crash when profiling multi-process/multi-GPU application #376

Closed: benrichard-amd closed this 4 months ago

benrichard-amd commented 4 months ago

Omniperf invokes rocprof to collect profiling data. rocprof has a -o option to specify an output CSV file. In the multi-GPU case, multiple processes will try to write to this file, corrupting it.

This change removes the -o option when invoking rocprof. Instead, Omniperf scans the output directory and combines the multiple CSV files into one CSV file.
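The scan-and-combine step described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the helper name `combine_csvs`, the `combined.csv` file name, and the `*.csv` glob pattern are all assumptions made for illustration.

```python
import glob
import os

import pandas as pd


def combine_csvs(output_dir, combined_name="combined.csv"):
    """Merge every CSV in output_dir into one file (hypothetical helper)."""
    combined_path = os.path.join(output_dir, combined_name)
    # Sort for a deterministic row order, and skip a combined file
    # left behind by an earlier run so it is not merged into itself.
    paths = sorted(
        p
        for p in glob.glob(os.path.join(output_dir, "*.csv"))
        if os.path.abspath(p) != os.path.abspath(combined_path)
    )
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {output_dir}")
    frames = [pd.read_csv(p) for p in paths]
    # ignore_index=True gives the merged rows one fresh 0..N-1 index.
    combined = pd.concat(frames, ignore_index=True)
    # index=False keeps the pandas index out of the written file.
    combined.to_csv(combined_path, index=False)
    return combined_path
```

Each process writes its own file, so no two writers share a file handle; the merge happens once, after profiling has finished.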

Note: this only avoids the crash that occurs when using rocprofv2.

coleramos425 commented 4 months ago

@benrichard-amd in the multi-gpu case, can you confirm that we don't see any irregularities in csv output?

Handling Indexes: By default, pd.concat preserves the original indexes of the DataFrames or Series being concatenated. This can lead to duplicate index values, which might cause issues in subsequent data operations. You can use the ignore_index=True parameter to reset the index in the resulting DataFrame.

I'm not convinced that in the multi-gpu case we'll output a different set of counters in each run, thus I'm curious how this implementation handles merging overlapping output files.
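To make the index concern concrete, here is a small illustration of how `pd.concat` behaves when two frames share the same columns and the same default index. The column names and values are made up for the example and are not taken from real rocprof output.

```python
import pandas as pd

# Two small frames standing in for per-process CSVs (values are made up).
a = pd.DataFrame({"kernel": ["k0", "k1"], "gpu_id": [0, 0]})
b = pd.DataFrame({"kernel": ["k0", "k1"], "gpu_id": [1, 1]})

kept = pd.concat([a, b])                      # original indexes preserved
reset = pd.concat([a, b], ignore_index=True)  # fresh 0..N-1 index

print(list(kept.index))   # [0, 1, 0, 1]
print(list(reset.index))  # [0, 1, 2, 3]
print(len(kept.loc[0]))   # 2 (label 0 now matches two rows)
```

Note that `pd.concat` stacks rows and does not deduplicate them, so files with identical columns are appended one after another rather than merged row by row.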

coleramos425 commented 4 months ago

Additionally, could you add a sign-off on these commits? Information on how to do so can be found under the "DCO" action attached to this PR.

benrichard-amd commented 4 months ago

> @benrichard-amd in the multi-gpu case, can you confirm that we don't see any irregularities in csv output?
>
> Handling Indexes: By default, pd.concat preserves the original indexes of the DataFrames or Series being concatenated. This can lead to duplicate index values, which might cause issues in subsequent data operations. You can use the ignore_index=True parameter to reset the index in the resulting DataFrame.
>
> I'm not convinced that in the multi-gpu case we'll output a different set of counters in each run, thus I'm curious how this implementation handles merging overlapping output files.

Good catch. Looking more closely, we were seeing an unnamed index column in the combined output. This has been fixed.

The output CSVs look good and have correct indexing.
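The thread does not say how the unnamed column was removed. One common cause, shown here only as an assumption, is writing a DataFrame with the default `index=True` and reading it back: pandas then labels the index column `Unnamed: 0`. The frame below uses made-up data.

```python
import io

import pandas as pd

df = pd.DataFrame({"kernel": ["k0", "k1"], "gpu_id": [0, 1]})

# Default to_csv writes the index as a leading column with an empty header.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False writes only the data columns.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)     # ['Unnamed: 0', 'kernel', 'gpu_id']
print(cols_without_index)  # ['kernel', 'gpu_id']
```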