Closed ellosel closed 3 months ago
In general, I think the changes are good to help reduce the build time. Most of the code movement makes sense!
I would encourage checking a few more things while we are at it:
- Are the rocBLAS builds compatible with the changes? I think the original work was to prevent rocBLAS from building the entire tensile library multiple times (during cmake stage then during the actual build).
- If we were to change the contents of a YAML config file (say working on a library contents), would a tensile dirty build detect the changes and build accordingly with new manifest and libs copied to the correct place after? Or do you have to build the entire library from scratch again?
- Some tests appear to have been removed. Good idea to verify they are not needed anymore.
LGTM pending CI and newline EOF fix
Is
DEPENDS_EXPLICIT_ONLY
applicable to the uses ofadd_custom_command
? In theory, it would be possible to run the Tensile build in parallel with the build of the library when compiling with ninja. The linking is the only part that actually needs to be serialized, butadd_custom_command
will force the serialization of the build by default.Whether any build system has enough memory to build Tensile in parallel with the library perhaps another question, but it may be a nice option to have available.
Good find and I agree that would be nice. It looks like DEPENDS_EXPLICIT_ONLY
was made available in cmake 3.27 which isn't available on all of the supported rocm platforms e.g. ubuntu 20 is 3.16. Currently, we have cmake_minimum_required set to 3.13.
It looks like
DEPENDS_EXPLICIT_ONLY
was made available in cmake 3.27 which isn't available on all of the supported rocm platforms e.g. ubuntu 20 is 3.16. Currently, we have cmake_minimum_required set to 3.13.
One approach for those sorts of enhancements is to conditionally enable them. e.g.
set(command_options "")
if(CMAKE_VERSION VERSION_GREATER_EQUAL 3.27)
set(command_options DEPENDS_EXPLICIT_ONLY)
endif()
add_custom_command(mycommand
${command_options}
...
)
Not sure what you are trying to build in parallel, but if using all the cores for each stage simpler to debug if steps are done in order. I would do additions in a new PR if experimenting.
The CMake support in Tensile (namely TensileConfig.cmake) is doing unnecessary work. In turn, the configure/build times in downstream projects can take up to 20% longer than is necessary. The unnecessary work is due to:
execute_process
to invokeTensileCreateLibrary --generate-manifest-and-exit
which is eagerly run at configuration time regardless if manifest generation is required.add_custom_command
call that is used to generate Tensile build artifacts.In theory, it is reasonable to use the contents of the manifest as output for a subsequent
add_custom_command
call because this will ensure that the command is not considered satisfied unless all of the files in the manifest are generated. However, CMake needs to know what those files are at configuration time when it is configuring the custom command. Therefore, one must useexecute_process
to generate the manifest which has the consequence of redundantly calling toTensileCreateLibrary --generate-manifest-and-exit
any time a client reconfigures whether or not it is necessary (andTensileCreateLibrary --generate-manifest-and-exit
is expensive).In practice, we would like to invoke
TensileCreateLibrary
once at build time prior to building dependent targets. By relaxing the output requirements of the custom command to just the manifest file. We can eliminate theexecute_process
call altogether. Further, we can provide an analog to checking that the custom command is satisfied (i.e. that it generated all of the expected files) by extendingTensileCreateLibrary
to enforce said check.This PR includes the necessary changes to avoid the redundant work described above. Further, minor changes to verbosity related logic in
TensileCreateLibrary
were added to ensure users can turn off all output from Tensile.The current default is set to turn off most of the Tensile output which gives the following output when configuring/building in the client (note the output at 0% which comes from the custom command in TensileConfig):
Users can reconfigure with
-DTensile_VERBOSE=1
or-DTensile_VERBOSE=2
to get extra tensile output.Initial testing includes:
Running:
tox -e lint
Running:
tox -v -e py310 -- ./Tensile/Tests -m pre_checkin
Running:
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=hipcc -DTensile_CPU_THREADS=32 -DTensile_ROOT=$(pwd)/Tensile -S HostLibraryTests -B build; cmake --build build; ./build/TensileTests
Integration test with rocBLAS
./install.sh -c -t <path to tensile including these changes>
Running:
./rocblas-test --gtest_filter=*quick*:*pre_checkin*-*known_bug*