Closed soblin closed 1 year ago
@soblin Thank you for creating this issue. :+1:
I generally agree with your thoughts. For specific changes, it would be great to split tasks into small parts and discuss them with the package maintainers in PRs. The reason is, I think that the following are usually just a trade-off between portability/maintainability and compilation time, in an extreme argument.
inline
)extern template
or generate functions every timeAlso, talking about the history of why the implementations are so now, in the early stage of Autoware Core/Universe, we wrote many functions in header files considering the maintenance costs (we didn't want to write almost the same thing twice). However, since the functions are relatively stable lately, and since I know the compilation time is increasing, I think we can apply some optimization techniques that you proposed now.
Although I wrote a lot, my conclusion is "A super nice proposal! Let's make Autoware extremely faster!"
@kenji-miyake That's right, the portability/maintainability should be cautiously cared if I'm going to make PRs. And as a developer I totally agree that in a development phase we do not need to (or afford to actually haha) pay so much attention to things like compilation time, code quality, etc.
But yeah as I mentioned, I'll first get some statistics/evidence on compilation time and think about the room for improvements. Also it would be useful to make the analysis method a reusable one for future continuous integration.
This branch can be compiled by clang and Ninja using following command
colcon build --symlink-install --continue-on-error --event-handlers console_cohesion+ \
--cmake-args " -GNinja" -DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
-DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_C_COMPILER=/usr/bin/clang \
-DCUDA_NVCC_FLAGS="-allow-unsupported-compiler" \
-DCMAKE_CXX_FLAGS="-ftime-trace"
This produces build/behavior_velocity_planner/.ninja_log file, which can be converted to trace.json using ninjatracing
./ninjatracing -e --showall build/behavior_velocity_planner/.ninja_log \
> trace.json
We can visualize this file on chrome://tracing
.
@soblin
Could you consider pushing the fix you have made to compile with clang aswell? I know everyone compiles Autoware with gcc, but clang provides many interesting complementary diagnostic tools (such as -ftime-trace
).
Moreover, it helps to stick closer to the C++ standard: if gcc accepts something clang does not, most likely it is a gcc only thing, not standard C++ (like the nullptr_t
-> std::nullptr_t
)
I don't advocate for an official support of clang, but rather to keep the code base compiler independant as much as we can.
@VRichardJP
In terms of the code quality I totally agree with it ! In my branch I needed to relax the compile option to -Wno-everything
so there should be a lot rooms.
For Clang support, it might be good to start with for example adding a daily-scheduled workflow .github/workflows/build-and-test-clang.yaml
. Also, it's also good to add some rules to https://autowarefoundation.github.io/autoware-documentation/main/contributing/coding-guidelines/languages/cpp/ to support both gcc
and clang
.
Notes:
For behavior_path_planner, behavior_velocity_planner, dividing the build targets into several libraries using ament_auto_add_library
increased the build time contrary to general workaround.
Weirdly these PRs adds up CMake processing time.
This maybe because of the repeated calls to ament_auto_add_library
, but it is not clear which part slows down CMake
part.
@soblin Sorry do you mean that the total build time has increased? or just cmake generation time?
@VRichardJP Yes the total build time increased, apparently due to the overhead from CMake. I have the suspicion that these lines are inhibiting parallel build of ament_auto targets.
If we divide the source files into libA, libB, and libC using ament_auto_add_library
, those lines seem to add these targets to ${PROJECT_NAME}_LIBRARIES
variable and then, the items in this variable are linked to later targets. Possibly
ament_auto_add_library
firstlyament_auto_add_library
secondly and then libA
is linked to itament_auto_add_library
finally and then libA
, libB
are linked to it.If that was the case it makes sense that each target is not processed in parallel.
ament_auto_find_build_dependencies
(which is called in autoware_cmake() internally) command populates theses variables, so I think manually building each library target using these variables can help to parallelize the build process.
@soblin
Indeed, each call to ament_auto_add_library
adds the library to ${PROJECT_NAME}_LIBRARIES
and link the previous ones... I have to confess I don't see the point.
What about using ARG_NO_TARGET_LINK_LIBRARIES
before calling ament_auto_add_library
? It seems it could disable that behavior.
behavior_velocity_planner
foreach
each scene_module library using ament_auto_add_library(NO_TARGET_LINK_LIBRARIES)
and link them to lib_utilizationament_auto_add_library
node.cpp and link previous targets to thisCMake part takes 7.5 minutes unfortunately.
I'm guessing repeated call to ament_target_dependencies is slowing CMake process.
I am checking this on my machine too. I also observe with behavior_path_planner
that splitting the lib into multiple sub libraries greatly increase the overall build time (~from 3:30 to ~4:40 without tests). But when you think about it, why should it be otherwise? all the cpp.o files are compiled in parallel already, so splitting them into different libs does nothing but add more cmake generation and sync time, no?
However, I can see something interesting with the ninjatracing
output: behavior_path_planner_node.cpp.o
takes almost 2 minutes to compile (and a big share is optimizer passes). Most other object files are compiled in less than 30 seconds (except utilities.cpp.o which takes ~50s). The consequence is that during almost 1min30, 7 worker threads are waiting for behavior_path_planner_node.cpp.o
task to finish.
How come? the file does not look that big though.
Found the culprit: each create_subscription
line takes 15 seconds to optimize...
create_subscription
lines, behavior_path_planner_node.cpp.o
compilation takes 1min 26s.create_subscription
lines, behavior_path_planner_node.cpp.o
compilation takes 1min 5s.create_subscription
lines, behavior_path_planner_node.cpp.o
compilation takes 45s.create_subscription
lines, behavior_path_planner_node.cpp.o
compilation takes 30s.behavior_path_planner_node.cpp.o
compilation takes 15s.Basically 1 create_subscription
= +15s of compilation. That is absolutely insane...
Is it like this on all ROS2 packages?
Example with only the 7 create_subscription
lines remaining and pretty much all the rest commented out.
EDIT: After few more tests, it seems to be just a clang thing. With gcc I don't have such inflated numbers. For instance, I have tested the 2 scenarios:
A. compile behavior_path_planner_node.cpp
only
B. compile behavior_path_planner_node.cpp
only, with the create_subscription
lines commented out.
And with the same build command (I have a few extra -Wno-*
in autoware_cmake
for clang only):
$ rm -r {build,install}/behavior_path_planner; CC=clang CXX=clang++ colcon build --symlink-install --continue-on-error --event-handlers console_cohesion+ --cmake-args " -GNinja" "-DBUILD_TESTING=0" -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCUDA_NVCC_FLAGS="-allow-unsupported-compiler" --packages-select behavior_path_planner
With clang I get: A. 2min 13s B. 1min 3s
And with gcc I get: A. 1min 36 B. 1min 3s
Although I cannot see where gcc spend its time, there is definitely a big difference here. Clang seems to be the only one to get stuck in endless optimization passes.
BTW I found out repeated calls to ament_auto
macros in a CMakeLists.txt slow down cmake because these list variables are extended every time this macro is called, and it will contain duplicate elements by the numbers of its call.
The workaround is to add
list(REMOVE_DUPLICATES ${PROJECT_NAME}_FOUND_DEFINITIONS)
at the end of the macro. For example I could reduce cmake processing time in this PR.
I'll try making PRs to ament_cmake
or ament_cmake_auto
.
This pull request has been automatically marked as stale because it has not had recent activity.
@VRichardJP Thanks for your proactive work !
I also tried making a PR to ament_cmake
As described here it looks like we cannot immediately merge these PRs to the next release of ROS2. For the improvement of ament_auto
, we have the option to create a fork of ament_cmake
under AWF group, add it to autoware.repos
so we can use the improved ament_auto
when we build Autoware
.
Apart from it, I think we should put efforts into removing unnecessary/huge header files. I recently found out that only compiling behavior_velocity_planner/src/scene_module/intersection/scene_intersection.cpp
takes a bit long time. Here is what I did:
compile_commands.json
you can find a entry for the build command of behavior_velocity_planner/src/scene_module/intersection/scene_intersection.cpp
.g++ -DBOOST_ALLOW_DEPRECATED_HEADERS -DBOOST_ALL_NO_LIB -DBOOST_ATOMIC_DYN_LINK \
-DBOOST_CHRONO_DYN_LINK -DBOOST_DATE_TIME_DYN_LINK -DBOOST_FILESYSTEM_DYN_LINK \
-DBOOST_GRAPH_DYN_LINK -DBOOST_IOSTREAMS_DYN_LINK -DBOOST_LOCALE_DYN_LINK \
-DBOOST_LOG_DYN_LINK -DBOOST_LOG_SETUP_DYN_LINK -DBOOST_PRG_EXEC_MONITOR_DYN_LINK \
-DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_RANDOM_DYN_LINK -DBOOST_REGEX_DYN_LINK \
-DBOOST_SERIALIZATION_DYN_LINK -DBOOST_SYSTEM_DYN_LINK -DBOOST_THREAD_DYN_LINK \
-DBOOST_TIMER_DYN_LINK -DBOOST_UNIT_TEST_FRAMEWORK_DYN_LINK -DBOOST_WAVE_DYN_LINK \
-DBOOST_WSERIALIZATION_DYN_LINK -DDEFAULT_RMW_IMPLEMENTATION=rmw_fastrtps_cpp -DDISABLE_PCAP \
-DEIGEN_DENSEBASE_PLUGIN=\"grid_map_core/eigen_plugins/DenseBasePlugin.hpp\" \
-DEIGEN_FUNCTORS_PLUGIN=\"grid_map_core/eigen_plugins/FunctorsPlugin.hpp\" -DQT_CORE_LIB -DQT_GUI_LIB \
-DQT_NO_DEBUG -DQT_OPENGL_LIB -DQT_WIDGETS_LIB -DRCUTILS_ENABLE_FAULT_INJECTION \
-DROS_DISTRO_HUMBLE -DROS_PACKAGE_NAME=\"behavior_velocity_planner\" \
-Dbehavior_velocity_planner_EXPORTS -Dkiss_fft_scalar=double \
-I/home/autoware/src/autoware/universe/planning/behavior_velocity_planner/include \
-I/opt/ros/humble/include/rosidl_runtime_c -I/opt/ros/humble/include/rosidl_typesupport_interface \
-I/opt/ros/humble/include/shape_msgs -I/opt/ros/humble/include/fastcdr \
-I/opt/ros/humble/include/rosidl_runtime_cpp -I/opt/ros/humble/include/rosidl_typesupport_fastrtps_cpp \
-I/opt/ros/humble/include/rmw -I/opt/ros/humble/include/rosidl_typesupport_fastrtps_c \
-I/opt/ros/humble/include/rosidl_typesupport_introspection_c \
-I/opt/ros/humble/include/rosidl_typesupport_introspection_cpp -I/opt/ros/humble/include/action_msgs \
-I/opt/ros/humble/include/ament_index_cpp -I/opt/ros/humble/include/libstatistics_collector \
-I/opt/ros/humble/include/rcl -I/opt/ros/humble/include/rcl_interfaces \
-I/opt/ros/humble/include/rcl_logging_interface -I/opt/ros/humble/include/rcl_yaml_param_parser \
-I/opt/ros/humble/include/libyaml_vendor -I/opt/ros/humble/include/tracetools -I/opt/ros/humble/include/rcpputils \
-I/opt/ros/humble/include/statistics_msgs -I/opt/ros/humble/include/rosgraph_msgs \
-I/opt/ros/humble/include/rosidl_typesupport_cpp -I/opt/ros/humble/include/rosidl_typesupport_c \
-I/opt/ros/humble/include/class_loader -I/opt/ros/humble/include/composition_interfaces \
-I/opt/ros/humble/include/rclcpp_action -I/opt/ros/humble/include/rcl_action \
-I/home/autoware/install/autoware_common_msgs/include/autoware_common_msgs \
-I/opt/ros/humble/include/geographic_msgs -I/opt/ros/humble/include/rosbag2_storage -I/usr/include/vtk-9.1 \
-I/usr/include/jsoncpp -I/usr/include/freetype2 -I/usr/include/x86_64-linux-gnu/qt5 -I/usr/include/x86_64-linux-gnu/qt5/QtOpenGL \
-I/usr/include/x86_64-linux-gnu/qt5/QtWidgets -I/usr/include/x86_64-linux-gnu/qt5/QtGui \
-I/usr/include/x86_64-linux-gnu/qt5/QtCore -I/usr/lib/x86_64-linux-gnu/qt5/mkspecs/linux-g++ -isystem /usr/include/eigen3 -isystem /home/autoware/install/tier4_v2x_msgs/include/tier4_v2x_msgs \
-isystem /home/autoware/install/tier4_api_msgs/include/tier4_api_msgs \
-isystem /home/autoware/install/tier4_planning_msgs/include/tier4_planning_msgs \
-isystem /home/autoware/install/autoware_auto_planning_msgs/include/autoware_auto_planning_msgs \
-isystem /home/autoware/install/autoware_auto_perception_msgs/include/autoware_auto_perception_msgs \
-isystem /home/autoware/install/autoware_auto_mapping_msgs/include/autoware_auto_mapping_msgs \
-isystem /home/autoware/install/autoware_adapi_v1_msgs/include/autoware_adapi_v1_msgs \
-isystem /opt/ros/humble/include/cv_bridge -isystem /opt/ros/humble/include/diagnostic_msgs \
-isystem /opt/ros/humble/include/geometry_msgs -isystem /opt/ros/humble/include/message_filters \
-isystem /opt/ros/humble/include/nav_msgs -isystem /opt/ros/humble/include/rclcpp \
-isystem /opt/ros/humble/include/rclcpp_components -isystem /opt/ros/humble/include/sensor_msgs \
-isystem /opt/ros/humble/include/tf2 -isystem /opt/ros/humble/include/tf2_eigen \
-isystem /opt/ros/humble/include/tf2_geometry_msgs -isystem /opt/ros/humble/include/tf2_ros \
-isystem /opt/ros/humble/include/visualization_msgs \
-isystem /home/autoware/install/motion_velocity_smoother/include -isystem /home/autoware/install/vehicle_info_util/include -isystem /home/autoware/install/planning_test_utils/include \
-isystem /home/autoware/install/component_interface_utils/include \
-isystem /home/autoware/install/tier4_system_msgs/include/tier4_system_msgs \
-isystem /home/autoware/install/rtc_interface/include \
-isystem /home/autoware/install/tier4_rtc_msgs/include/tier4_rtc_msgs \
-isystem /home/autoware/install/route_handler/include -isystem /home/autoware/install/motion_utils/include \
-isystem /home/autoware/install/interpolation/include -isystem /home/autoware/install/grid_map_utils/include \
-isystem /home/autoware/install/tier4_autoware_utils/include \
-isystem /home/autoware/install/tier4_debug_msgs/include/tier4_debug_msgs \
-isystem /home/autoware/install/osqp_interface/include -isystem /home/autoware/install/lanelet2_extension/include -isystem /home/autoware/install/component_interface_specs/include \
-isystem /home/autoware/install/autoware_utils/include \
-isystem /home/autoware/install/autoware_planning_msgs/include/autoware_planning_msgs \
-isystem /home/autoware/install/autoware_auto_vehicle_msgs/include/autoware_auto_vehicle_msgs \
-isystem /home/autoware/install/autoware_auto_tf2/include \
-isystem /home/autoware/install/autoware_auto_system_msgs/include/autoware_auto_system_msgs \
-isystem /home/autoware/install/autoware_auto_geometry_msgs/include/autoware_auto_geometry_msgs \
-isystem /home/autoware/install/autoware_auto_control_msgs/include/autoware_auto_control_msgs \
-isystem /home/autoware/install/autoware_auto_common/include \
-isystem /opt/ros/humble/include/builtin_interfaces -isystem /opt/ros/humble/include \
-isystem /opt/ros/humble/include/pluginlib -isystem /opt/ros/humble/include/grid_map_msgs \
-isystem /opt/ros/humble/include/nav2_msgs -isystem /opt/ros/humble/include/rcutils \
-isystem /opt/ros/humble/include/rosbag2_cpp -isystem /opt/ros/humble/include/std_msgs \
-isystem /opt/ros/humble/include/pcl_msgs -isystem /opt/ros/humble/include/unique_identifier_msgs \
-isystem /opt/ros/humble/include/tf2_msgs -isystem /usr/include/opencv4 -isystem /usr/include/pcl-1.12 \
-isystem /usr/include/ni \
-isystem /usr/include/openni2 \
-Wno-deprecated-declarations -O2 -g -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Werror -fopenmp -fPIC -std=c++17 \
-c ./scene_intersection.cpp
With this command you can only compile ./scene_intersection.cpp
. First of all I used the current source file and measured how much time it takes to compile
time ./build.sh
--> 41.65 secs
Next I removed everything except for #include
statement from this file. This file should have nothing to compile as shown below,
#include <grid_map_cv/grid_map_cv.hpp>
#include <grid_map_ros/grid_map_ros.hpp>
#include <lanelet2_extension/regulatory_elements/road_marking.hpp>
#include <lanelet2_extension/utility/message_conversion.hpp>
#include <lanelet2_extension/utility/query.hpp>
#include <lanelet2_extension/utility/utilities.hpp>
#include <cv_bridge/cv_bridge.h>
#include <magic_enum.hpp>
#include <opencv2/imgproc.hpp>
#include <scene_module/intersection/scene_intersection.hpp>
#include <scene_module/intersection/util.hpp>
#include <utilization/boost_geometry_helper.hpp>
#include <utilization/path_utilization.hpp>
#include <utilization/trajectory_utils.hpp>
#include <utilization/util.hpp>
#include <lanelet2_core/geometry/Polygon.h>
#include <lanelet2_core/primitives/BasicRegulatoryElements.h>
#include <algorithm>
#include <memory>
#include <vector>
namespace behavior_velocity_planner {}
but surprisingly, it took 18 secs! If there are only include statements for the standard library then it completed instantly.
Unfortunately it's a tricky problem to solve. The deeper the more dependencies you have, and the more likely a single header is going to pull a hell lot of things with it.
I think using forward declaration could improve this, but that would mean writing forward.h headers for many packages.
Checklist
Description
Build process of autoware.universe is getting heavier as the codebase and its functionality grows. Especially packages like
behavior_velocity_planner
andbehavior_path_planner
consume a lot of memory and time during their build process. I believe analyzing the time build (compilation + link) process takes is beneficial for developers.For the first step, I'll start by applying clang compiler's
-ftime-trace
option as well asNinja
build tool to autoware.universe to get access to detailed compile process information. They produce json files for each cmake project or source file, and we can visualize them on google chrome interactively.https://opensource.adobe.com/lagrange-docs/dev/compilation-profiling/
Purpose
Possible approaches
I think the most simple and effective way to this is to reduce the size of each translation unit as much as possible by the following approaches:
.cpp
files.hpp
files. For example we should just includelanelet2_core/Forward.hpp
instead of others in.hpp
filesmotion_utils/motion_utils.hpp
andtier4_autoware_utils/tier4_autoware_utils.hpp
The third item is also effective in mitigating the influence of header file changes in those library packages on other client packages. Since any small change of header files in
motion_utils
andtier4_autoware_utils
propagates to almost all source files of autoware.universe (imagine how many packages include them), each client packages should also include only necessary headers from them.For the fourth item, I'm guessing some developers are misunderstanding about inline functions that inlined functions are literally inlined in translation unit. But in my opinion that is fault, because the inline keyword is just a hint for the compiler and does not guarantee that to happen. If not inlined they are compiled in each translation unit to which they are propagated, assigned "W" (weak) symbol in the object file, and then unified by the linker. I rather think
constexpr
matches for most cases.I also noticed some template functions in
motion_utils
are mostly instantiated forvector<PathPointWithLaneId>
andvector<TrajectoryPoint>
in planning modules. It is a waste of time for the compiler to instantiate those functions and compile the same content each time in each source file, so I suggest the usage ofextern template
feature.If a template function is marked as
extern template
in a header file for T1, T2, etc., instantiated in a source file as explicit specifications for T1, T2, etc., and compiled to object file, then on the client side those functions are not instantiated for T1, T2, etc. and instead they are linked to the object file. This saves the time for compiling duplicate content many times (and link process).I'm not sure of this, but the usage of Boost.Geometry functions is also one of the causes.
Definition of done