For both jax.profiler (profiler=xplane in maxtext) and the GPU nsys profiler (profiler=nsys in maxtext), we upload the profile to the base_output_directory (source).
Typically this directory is on GCS, but it can also be local. However, for the nsys profiler we hardcode the uploader to use gsutil (source), which has two problems:
1. The output directory may not be on GCS, in which case gsutil is not applicable.
2. Hosts may not have gsutil installed, since gsutil is not in requirements.txt.
We should modify the nsys profile upload to work in all cases.
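One possible shape for such an upload, as a minimal sketch rather than the actual MaxText code: the helper name `upload_profile` is hypothetical, and the GCS branch assumes the `google-cloud-storage` Python client is available on the host. It dispatches on the destination prefix so that a local output directory never touches GCS tooling, and it does not shell out to gsutil.

```python
import os
import shutil
from urllib.parse import urlparse


def upload_profile(local_file: str, output_dir: str) -> str:
    """Copy local_file into output_dir (a gs:// URI or a local path).

    Hypothetical helper for illustration; returns the destination path or URI.
    """
    filename = os.path.basename(local_file)

    if output_dir.startswith("gs://"):
        # Assumption: google-cloud-storage is installed. Imported lazily so
        # hosts that only write to local disk do not need it.
        from google.cloud import storage

        parsed = urlparse(output_dir)
        bucket_name = parsed.netloc
        prefix = parsed.path.strip("/")
        blob_name = f"{prefix}/{filename}" if prefix else filename
        blob = storage.Client().bucket(bucket_name).blob(blob_name)
        blob.upload_from_filename(local_file)
        return f"gs://{bucket_name}/{blob_name}"

    # Local destination: create the directory if needed and copy the file.
    os.makedirs(output_dir, exist_ok=True)
    destination = os.path.join(output_dir, filename)
    shutil.copy(local_file, destination)
    return destination
```

Using a Python client instead of a CLI means the dependency can be declared in requirements.txt, which addresses the missing-gsutil case as well as the non-GCS case.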
Additional context: https://github.com/AI-Hypercomputer/maxtext/pull/909 was added as a temporary fix for the second problem (gsutil not installed). With it, we skip the profile upload when gsutil is missing so that training can continue.