colcon / colcon-core

Command line tool to build sets of software packages
http://colcon.readthedocs.io
Apache License 2.0
103 stars 46 forks source link

Overriding package error for packages in current workspace #465

Closed janjachnik closed 2 years ago

janjachnik commented 2 years ago

I've just experienced a slightly annoying side effect of this recently merged PR: https://github.com/colcon/colcon-core/pull/449

Consider the following workflow:

source /opt/ros/noetic/setup.bash
cd my_workspace
colcon build --merge-install
source install/setup.bash
colcon build --merge-install

I use this workflow often because I build, run something, make some changes, then want to build again. However, the second colcon build fails because the workspace is now seen as overlaying itself and it prints the package-overridden error message for every package in the local workspace - it thinks they are overriding themselves.

Is my workflow bad (i.e. running colcon build after sourcing the local workspace) or is this a bug?

christophebedard commented 2 years ago

I think I'm getting the same error: https://gitlab.com/ros-tracing/ros2_tracing/-/jobs/1916548789. It's a bit unfortunate that it's an error and not a warning, since it completely breaks our CI, although the error message does contain a solution (--allow-overriding). Our setup is similar, but we build two different workspaces (not the same workspace twice):

The first workspace is built daily, then the second one is built on-demand. This speeds up CI.

Is my workflow bad (i.e. running colcon build after sourcing the local workspace) or is this a bug?

I'm also wondering. What is the right (not bad or risky) way to do this?

christophebedard commented 2 years ago

466 changed the error to a "simple" warning.

ijnek commented 2 years ago

+1 to this issue, the warning shows up when building without --merge-install, too.

Is my workflow bad (i.e. running colcon build after sourcing the local workspace) or is this a bug?

This seems like standard work flow, I've never heard of having to open a fresh terminal every time you're building a package.

christophebedard commented 2 years ago

Is my workflow bad (i.e. running colcon build after sourcing the local workspace) or is this a bug?

This seems like standard work flow, I've never heard of having to open a fresh terminal every time you're building a package.

The recommendation is usually to build in one terminal, and run tests (without sourcing) or source & execute in another terminal. No need to open a new terminal to build and rebuild, of course, but that's assuming you don't source your workspace after building/before re-building.

ijnek commented 2 years ago

I didn't know about having to build and source/run in separate terminals. After a bit of searching, I did find it is covered in one of the very first tutorials of ROS2.

For reference, quoting ROS2 tutorials - Creating a workspace,

Before sourcing the overlay, it is very important that you open a new terminal, separate from the one where you built the workspace. Sourcing an overlay in the same terminal where you built, or likewise building where an overlay is sourced, may create complex issues.

ghost commented 2 years ago

+1 to this issue. I have the same workflow as @janjachnik and have never experienced the "complex issues" mentioned in the tutorial. I personally source both the overlay's and underlay's setup.bash in bashrc. I know it's redundant but didn't think it was necessarily bad or could cause issues, until this warning.

Are we expected to not source setup.bash in bashrc to avoid this warning? This would be inconvenient for quickly popping up a terminal and running a node/ros tool

destogl commented 2 years ago

I don't have the same workflow as mentioned above, but still using overlayed workspaces a lot. I don't know for you guys, but I find overlayed of workspaces a very useful feature from ROS where one was able to actually sync work of 10+ people a in a robot lab.

TBH I think that adding this warning is very annoying and moving it to error will break complete workflow of my current company and my former lab... I can imagine that workflows are confusing for the new users, but they are usual way of working for more experienced users. To avoid issues withe new users I developed ros_team_ws tooling that manages everything without many thinking. (We will release this soon, but have to polish some workflows and documentation).

What is seems to be especially annoying it that one has to name each package that is override... Did you every try to run this in an actual workspace for a robot (usually 20+ packages). This makes comping with "--up-to" and similar flags almost impossible...

I would strongly suggest that you rethink this functionality, or provide a clear workflow for people running complex workspaces for robots.

gavanderhoorn commented 2 years ago

The problem which this tries to address is the fact the current implementation of overlaying is not complete (or: is incorrect), and has the potential to break workspaces in very subtle ways. See https://github.com/ros2/ros2/issues/1150#issuecomment-859952043 for a 'nice' example, and the subsequent comments for some insight into how complex this is.

We've seen quite a few questions on ROS Answers as well along the lines of "why is Catkin/Colcon linking / including package X (or: header from package X) while it should be package Y?", which then turns out to be due to (re)ordering of dependencies from/in overlays and/or underlays.

MoveIt has also struggled with this (FCL was hard to get right IIRC).

Yes, the warning/error is annoying, but it's at least honest and goes towards least-surprise for users.

I would strongly suggest that you rethink this functionality,

which functionality specifically are you referring to? The warning/error, or overlaying in general?

or provide a clear workflow for people running complex workspaces for robots.

Getting overlaying to work correctly -- especially with modern CMake in the mix -- is non-trivial, and will likely require someone spending the time to fix how Colcon extracts the information ROS packages need to link/include the correct libraries/headers.

I don't envy the maintainers of Colcon.


Edit: fully relying on CMake may be a way to address this, but as https://github.com/ros2/ros2/issues/1150#issuecomment-860276778 for instance shows, it's not easy in any case. I believe https://github.com/ros2/ros2/issues/1150#issuecomment-930541886 provides a good summary. Option 4 is the error/warning discussed in the issue here.

destogl commented 2 years ago

@gavanderhoorn thanks for more in-depth clarification.

I see there is error when trying to build workspace with --merge-install. I am actually never using this, and I didn't felt that I ever needed it.

Would it be an option to have this warning/error only if using with --merge-install? For me, this would solve the issue.

which functionality specifically are you referring to? The warning/error, or overlaying in general?

I love overlaying and depending hardly to manage multiple workspaces working on the overlapping packages needing them on different features stages. In my days on KIT I actually crated an approach to manage the whole lab (5+ scientist and 20+) working in parallel and always having an underlying workspace working and running demos.

The error actually make me headaches because it would really make almost impossible to compile workspace or just a set of dependent packages during development. If each time, one has to explicitly state each overlayed package name, it is just a huge waste of time.

janjachnik commented 2 years ago

@destogl From my understanding of the original issue (https://github.com/ros2/ros2/issues/1150), it is when the underlay has used the --merge-install option that there is a problem, not the overlay, which in some cases is out of the users control. For example, if a ROS package installed through apt on Ubuntu has been compiled with --merge-install.

I was getting the error from the apt installed version of OMPL, something I am not in control of (although, I was using --merge-install in my overlay as well, something I am now dropping because of this)

destogl commented 2 years ago

I was getting the error from the apt installed version of OMPL, something I am not in control of (although, I was using --merge-install in my overlay as well, something I am now dropping because of this)

My point is that I am not using --merge-install but I still get this warning. Am I getting something wrong?

sloretz commented 2 years ago

I'll start with making the override warning ignore the workspace being built when looking at packages from underlay's makes sense to me. That seems like it would resolve this particular issue. It seems like there's more being discussed than that here.

There's a video explaining problems with overriding workspaces here that may be helpful for context.

I use this workflow often because I build, run something, make some changes, then want to build again. However, the second colcon build fails because the workspace is now seen as overlaying itself and it prints the package-overridden error message for every package in the local workspace - it thinks they are overriding themselves.

@janjachnik I think the warning isn't needed for your specific case, but for more context the build-source-build in the same terminal is discouraged because it breaks isolation. Packages that don't declare their dependencies can still use them, and I think it's possible to add circular dependencies after the first build. Since you're using a merged workspace, this is already a problem, so sourcing and building again doesn't add more downsides than --merge-install already did. I can see why this is a valid use case, but using isolated workspaces can save you a lot of headache if you intend to release your packages.

From my understanding of the original issue (ros2/ros2#1150), it is when the underlay has used the --merge-install option that there is a problem, not the overlay, which in some cases is out of the users control

@janjachnik ros2/ros2#1150 is only about one issue that can happen when overriding packages. Any type of workspace can experience undefined behavior (failing to build downstream packages or undefined behavior at runtime) when an overridden package changes ABI or API.

Would it be an option to have this warning/error only if using with --merge-install? For me, this would solve the issue.

@destogl The warning applies to any kind of workspace too because it's very easy to get undefined runtime behavior if the overriding package changes API or breaks ABI.

we build a workspace (ROS 2 from source) we source that workspace we build a sub-set of the packages in another workspace

@christophebedard If the ROS-2-from-source workspace is a merged workspace, then there are of course potential issues with the search order for include directories. Besides that, do any packages in the underlay depend on the ones being overridden, but are themselves not rebuilt? If so, API or ABI changes in the re-built subset can lead to undefined behavior.

I would strongly suggest that you rethink this functionality, or provide a clear workflow for people running complex workspaces for robots.

@destogl, I hear your frustration. I don't yet understand your workflow, could you tell me more about it? I'm not sure if you're talking about Overlaying workspaces, or overriding packages. Overlaying workspaces is fine, it's overriding packages (same package built in two or more workspaces) that's the problem. Removing the overridden the package from the underlay workspace is always an alternative, and usually it's just one or two commands. It could be apt-get remove ros-distro-package-being-overridden, or deleting the install/package-being-overridden folder from an isolated underlay, or removing and rebuilding a merged underlay workspace.

Getting overlaying to work correctly -- especially with modern CMake in the mix -- is non-trivial, and will likely require someone spending the time to fix how Colcon extracts the information ROS packages need to link/include the correct libraries/headers.

Solving the include directory search order problem alone is non-trivial. If the overriden package changes API or breaks ABI and another package in the underlay depends on it then there's no good way to override it safely, even in ROS 1. The only option is to build all packages above the one you want to override from source.

gavanderhoorn commented 2 years ago

@sloretz wrote:

Solving the include directory search order problem alone is non-trivial. If the overriden package changes API or breaks ABI and another package in the underlay depends on it then there's no good way to override it safely, even in ROS 1.

I agree, and I'd already understood that.

This is one of the things not clear to (new) users though.

It's essentially "just" an ABI/API compatibility problem.

The only option is to build all packages above the one you want to override from source.

which is exactly what we've been telling users to do over at ROS Answers for the past 8 years.

Tbh I don't feel it's such a stretch to require users to rebuild workspaces if such things happen.

But, at least some of the use-cases here do not seem to have this particular problem. If I understand @destogl correctly, he's using overlays mostly (?) to share packages-which-have-to-be-built-from-source (as they haven't been released fi) with multiple developers (each with their own workspace on the same machine). That's different from using overlays to update one specific package.

Removing the overridden the package from the underlay workspace is always an alternative, and usually it's just one or two commands. It could be apt-get remove ros-distro-package-being-overridden [..]

I believe this already came up in ros2/ros2#1150, but that's really only an option when overlaying a leaf package. Anything else will immediately lead to a cascade of uninstall actions by apt / insert-pkg-manager-here.

destogl commented 2 years ago

@destogl, I hear your frustration. I don't yet understand your workflow, could you tell me more about it? I'm not sure if you're talking about Overlaying workspaces, or overriding packages. Overlaying workspaces is fine, it's overriding packages (same package built in two or more workspaces) that's the problem. Removing the overridden the package from the underlay workspace is always an alternative, and usually it's just one or two commands. It could be apt-get remove ros-distro-package-being-overridden, or deleting the install/package-being-overridden folder from an isolated underlay, or removing and rebuilding a merged underlay workspace.

I am actually using both, overlaying and overriding. And removing package is not really an option because it breaks than the setup for all other people.

Imaging this set of overlayed workspaces with any possible combination of overridden packages:

  1. System packages
  2. Lab-wide workspace: shared packages that are stable and working (some released, some not)
  3. Research group workspace: packages that are stable for specific scenarios (can override underlying packages)
  4. Personal workspace: currently actively develop packages (can also be overridden)

I used this setup with ROS1 very happily for 3 years, with very often switches and API/ABI braking (imaging 3-5 students working in parallel on very similar projects/features and often same packages).

sloretz commented 2 years ago

From #469

For me, happens even when not merge-installing. Simply replicated by creating a new package and building more than once

@mnissov It looks like the terminal colcon build is run from has already sourced that workspace. You may be interested in this comment. https://github.com/colcon/colcon-core/issues/465#issuecomment-1001607770

This is the exact behavior we're currently using in the Hardware Acceleration Working Group (HAWG) for cross-compilation. The use case for us is clear, we're cross-compiling packages that exist in underlays to other (compute) architectures.

@vmayoral These instructions? I followed these instructions and I didn't see the warning. Where does /opt/ros/.../ get sourced?

INFO:Docker Client:+ colcon build --mixin aarch64-docker --build-base build_aarch64 --install-base install_aarch64
INFO:Docker Client:Topological order
INFO:Docker Client:- demo_nodes_cpp (ros.ament_cmake)
INFO:Docker Client:Starting >>> demo_nodes_cpp

From this thread

Imaging this set of overlayed workspaces with any possible combination of overridden packages: [...] I used this setup with ROS1 very happily for 3 years, with very often switches and API/ABI braking (imaging 3-5 students working in parallel on very similar projects/features and often same packages).

@destogl Any possible combination of overridden packages? Say their are two packages in the lab-wide workspace: Alice and Bob, and Bob depends on Alice. Now say the research group workspace overrides Alice and their version breaks Alice's ABI. If the research group workspace or the personal workspace use Bob, then they will face undefined behavior because Bob interacts with overridden Alice as if it still had the old ABI. Maybe software crashes, maybe it's subtle changes to a value in RAM that causes an algorithm to not perform as well or to perform better than expected. How does the group avoid that?

hoffmann-stefan commented 2 years ago

Also getting this warning since the last update, was using a seperate terminal for building a workspace since I use ROS2.

There is another use case for which you need to override packages from an underlay: If you use a client library for another language like ros2_dotnet you need to rebuild the interface packages to gernerate the source code for the .msg files in the used language, see this paragraph from the ros2_dotnet README.

For running on Linux or Windows Desktop, one can build ros2_dotnet (along with all desired packages containing interface definitions) as an overlay on top of an existing ROS2 installation. The ros2_dotnet.repos contains all necessary repositories to build the core ros2_dotnet project along with all standard ROS2 interface packages. If you are using other packages which provide interface definitions, those must also be included in the ros2_dotnet workspace in order for .NET bindings to be generated. (NOTE: if you wish to build the core of ROS2 from source, everything through the rcl layer is required.)

I think in this case this is more of a necessary hack, but for now there is no other way to generate the source code for other programming languages to access the messages.

vmayoral commented 2 years ago

@vmayoral These instructions? I followed these instructions and I didn't see the warning. Where does /opt/ros/.../ get sourced?

INFO:Docker Client:+ colcon build --mixin aarch64-docker --build-base build_aarch64 --install-base install_aarch64 INFO:Docker Client:Topological order INFO:Docker Client:- demo_nodes_cpp (ros.ament_cmake) INFO:Docker Client:Starting >>> demo_nodes_cpp

@sloretz that tool relies on Docker for cross-compilation and as indicated, relies on emulated builds. That's a valid approach for certain use cases but slows down significantly the production of complex artifacts for hardware acceleration (i.e. you're not calling directly the cross-compiler but using multi-level virtualization (an emulation within a simulation)). Overall, the project you hint, though fantastic, does not generalize on many cross-compilation use cases (this is what's raised at https://github.com/colcon/colcon-core/issues/469). It's just one of the approaches.

The more common approach for cross-compilation is to simply extend the CMake project by setting the appropriate environment. This can be done manually in the CMakeLists.txt or for the whole workspace (e.g. by using mixins). To give you an example, here's my mixins setup:

colcon mixin show
build:
- kv260
  cmake-args: ['-DCMAKE_SYSTEM_NAME=Linux', '-DCMAKE_SYSTEM_VERSION=1', '-DCMAKE_SYSTEM_PROCESSOR=aarch64', '-DCMAKE_C_COMPILER=/media/xilinx/hd3/tools/Xilinx/Vitis/2021.2/gnu/aarch64/lin/aarch64-linux/bin/aarch64-linux-gnu-gcc', '-DCMAKE_CXX_COMPILER=/media/xilinx/hd3/tools/Xilinx/Vitis/2021.2/gnu/aarch64/lin/aarch64-linux/bin/aarch64-linux-gnu-g++', '-DCMAKE_SYSROOT=/home/xilinx/ros2_ws/install/../acceleration/firmware/kv260/sysroots/cortexa72-cortexa53-xilinx-linux', '-DCMAKE_FIND_ROOT_PATH=/home/xilinx/ros2_ws/install-kv260', '-DCMAKE_INSTALL_RPATH=/home/xilinx/ros2_ws/install/../acceleration/firmware/kv260/sysroots/cortexa72-cortexa53-xilinx-linux/usr/lib', '-DCMAKE_INSTALL_RPATH_USE_LINK_PATH=TRUE', '-DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER', '-DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ONLY', '-DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY', '-DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=ONLY', '-DPYTHON_SOABI=cpython-37m-aarch64-linux-gnu', '-DTHREADS_PTHREAD_ARG=0', '--no-warn-unused-cli']
...

The Hardware Acceleration Working Group employs precisely this approach for cross-compilation and it does so by extending both ament and colcon to automate the setup of overlays for cross-compilation (see https://github.com/ros-acceleration). Finding a way to address this ticket's issue and the one at https://github.com/colcon/colcon-core/issues/469 would be great.

(Note I'm open to propose an implementation for addressing https://github.com/colcon/colcon-core/issues/469, so it'd be great to get feedback on the proposed method)

sloretz commented 2 years ago

473 Prevents the override warning from triggering when a workspace is built, its install space sourced, and then built again in the same terminal. I think that resolves that issue.

As explained above, sourcing a workspace and then building it breaks the isolation between packages, so it still a practice to be avoided (Edit: when using isolated workspaces).

jacobperron commented 2 years ago

There is another use case for which you need to override packages from an underlay: If you use a client library for another language like ros2_dotnet you need to rebuild the interface packages to gernerate the source code for the .msg files in the used language, see this paragraph from the ros2_dotnet README.

Indeed. This affects ROS 2 Java as well, where we need to overlay interface packages (or build ROS 2 from source). Unless we're modifying the interface packages in the overlay, I suppose it's safe to ignore/suppress the warning. A bit annoying, but I'm not sure what else we can do given the situation.