colcon / colcon-core

Command line tool to build sets of software packages
http://colcon.readthedocs.io
Apache License 2.0

colcon integration test suite #160

Open murphm8 opened 5 years ago

murphm8 commented 5 years ago

Hi Dirk,

As a follow-up to the colcon-bundle breakage last week, we discussed that it would be valuable to have a shared integration test suite across all the colcon packages. My idea would be to add something like the following to the Travis CI run of each colcon package:

git clone https://github.com/colcon/colcon-integration-tests
./colcon-integration-tests/run-tests.sh <package name>

The shell script would then execute all the tests in the package to exercise different parts of the colcon ecosystem, using the local workspace of the package under test and the master branches of all other colcon packages.
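
For illustration, a minimal sketch of what that run-tests.sh could look like, assuming the shared repository installs the other colcon packages with pip from their master branches and then runs its own pytest suite. The repository, the script, and the package list are all hypothetical at this point.

```sh
#!/bin/sh
# Hypothetical run-tests.sh living in colcon-integration-tests.
set -e

PACKAGE_UNDER_TEST=$1  # name of the package whose CI job invoked this script

# Install the other colcon packages from their master branches
# (the actual list would be maintained in this repository).
for pkg in colcon-core colcon-cmake colcon-ros; do
    if [ "$pkg" != "$PACKAGE_UNDER_TEST" ]; then
        pip install -U "git+https://github.com/colcon/${pkg}.git"
    fi
done

# Install the package under test from the local workspace the CI job checked out.
pip install -U .

# Run the shared integration tests against this combination.
pytest "$(dirname "$0")/tests"
```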

We currently have some integration tests in colcon-bundle that run by invoking run_integration_test.sh. These tests use Docker to build and bundle a workspace containing the different types of packages we support, then run tests on the generated bundles inside an empty container to ensure the bundles are correct.

I think having a shared integration test suite could greatly reduce the cognitive burden of making changes to core packages. Off the top of my head, here are some other tests I think could be added:

What do you think of this? Would you be willing to create a shared colcon-integration-tests repository that any colcon package could use to test itself with all upstream and downstream dependencies? I'd appreciate any ideas/suggestions/feedback on how this could best be structured.

Thanks! Matt

dirk-thomas commented 5 years ago

I understand the desire to add additional tests with the goal of catching possible future regressions early. For some use cases integration tests certainly make sense, e.g. ensuring that the documented instructions actually work.

On the other hand I am not convinced that these should all be run on each PR in every repository. A very important aspect of CI is that it returns results fast. E.g. the CI on the colcon-bundle repo is taking ~16 minutes which I would consider on the upper end of (if not even beyond) the desired time for CI builds.

Such integration tests could also be triggered on a regular basis (daily / weekly). E.g. most colcon repos already run their CI builds once a week (besides on each PR) to catch regressions caused either by changes in other colcon packages or by upstream dependencies. At the end of the day each maintainer needs to balance the desire for more coverage against faster turnaround time - so I think this will be different for each repository and not a one-approach-fits-all decision.

Off the top of my head, here are some other tests I think could be added:

Regarding the mentioned bullet points, I would expect that several of them would be perfectly testable within one specific package / repository. Unfortunately, packages besides colcon-core don't have any test coverage (besides linting).

E.g. testing that CMake packages can be processed correctly and that the generated environment enables other packages to discover previously installed ones would be a great test. But I would argue that those should best be placed in colcon-cmake itself (rather than in an integration tests repository).

The same goes for the colcon-ros use cases.

In general the challenge is to detect regressions in downstream usage when one of the goals is that packages are modular and don't necessarily know how they are being used by other packages.

Backwards compatibility tests - install some pip freeze set of versions then upgrade the package under test and validate.

While this sounds like a great goal, it also feels infinitely complex since the number of possible version combinations grows combinatorially. On a per-package level we can only do our best to check that changes don't break API and behavior. The latter should be covered by unit tests (e.g. colcon-core has pretty high coverage for that very reason). The former would benefit from an automated check for API compatibility - for Python I am not aware of such a checker, although they are pretty common in other languages (e.g. ABI checkers for C/C++, API checkers for Java).

I think the best step forward on this topic would be to collect a list of desired tests (of whatever kind) with a little bit of context to clarify their scope and to be able to roughly estimate their runtime. Based on that initial list we can try to determine where they could be placed and whether they should be exposed in a way that other packages can easily invoke them too if desired.

dirk-thomas commented 5 years ago

@murphm8 What are your thoughts on this? Is there anything we should do in the context of this ticket or just close it?

murphm8 commented 5 years ago

Hi Dirk, apologies for not replying sooner, I will collect the list of desired tests and reply soon.

murphm8 commented 5 years ago

I understand the reluctance to have CI builds take a large amount of time. We've discussed a bit of an implementation and would like to run it past you to see what you think before we move forward with more in-depth work. We want to create a solution that provides comprehensive integration testing while keeping existing PR CI times short and keeping relevant tests within their respective packages.

Our plan is to create a package in the colcon organization which runs the integration tests of any packages that opt in by containing a specific entry point such as run_integration_tests.sh. Ideally a package's integration tests install all colcon dependencies from their respective master branches.
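
To make the proposal concrete, here is a minimal sketch of what such an opt-in run_integration_tests.sh could look like in a package's repository root. The dependency names and the pytest marker are assumptions for illustration only.

```sh
#!/bin/sh
# Hypothetical run_integration_tests.sh in a package's repository root.
set -e

# Install this package's colcon dependencies from their master branches
# (the names below are placeholders for the package's actual dependencies).
for dep in colcon-core colcon-cmake; do
    pip install -U "git+https://github.com/colcon/${dep}.git"
done

# Install the package itself from the local checkout.
pip install -U .

# Run only the tests marked as integration tests (the marker name is an assumption).
pytest -m integration test/
```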

We will set up infrastructure to trigger Travis CI builds on the integration test framework package whenever any package in the colcon organization has a push to its mainline. We can then provide notifications if there are failures or any problems, if desired. This way integration tests live in the package that they test and we have visibility into the health of the entire plugin ecosystem. We would then request that before pushing to PyPI you check the most recent test run to see that downstream packages are healthy.

As far as what to test, it really depends on the package. I think we would like to implement the following, but I'd appreciate your input on what you think would be most valuable to test:

dirk-thomas commented 5 years ago

Our plan is to create a package in the colcon organization which runs the integration tests of any packages that opt in by containing a specific entry point such as run_integration_tests.sh.

Do I understand this correctly that each repository which wants to run additional / external tests would trigger a script in its Travis CI / AppVeyor file (without waiting for its result) which essentially triggers a CI build on a to-be-created repository?

We will set up infrastructure to trigger Travis CI builds on the integration test framework package whenever any package in the colcon organization has a push to its mainline.

Can you elaborate on what that infrastructure would look like?

This way integration tests live in the package that they test and we have visibility into the health of the entire plugin ecosystem.

I was under the impression that the integration tests would live in the to-be-created new repository? I guess I misunderstood this part. Can you please clarify the location of new tests?

We would then request that before pushing to PyPI you check the most recent test run to see that downstream packages are healthy.

:+1:

As far as what to test, it really depends on the package. I think we would like to implement the following, but I'd appreciate your input on what you think would be most valuable to test:

Regarding the enumerated tests, am I right that they would live in the repository with the name of the top-level bullet point? E.g. for the item "build ROS package with catkin", I am wondering whether that couldn't just be a "normal" test in the colcon-ros repository which gets executed together with the existing tests. It shouldn't take much time (less than a minute?). I would expect the same for the other items under colcon-ros and colcon-cmake - I can't speak for the bundle repos.

So, maybe taking a step back, would any new tests live in the to-be-created repository? My impression was that it would be the home for new long-running integration tests. Which test(s) would fall into that category (e.g. building all of ROS 1 up to ros_tutorials)?

murphm8 commented 5 years ago

I'm going to give the to-be-created package a name so that it's easier to write about: I'll call it integration-tests-runner.

Do I understand this correctly that each repository which wants to run additional / external tests would trigger a script in its Travis CI / AppVeyor file (without waiting for its result) which essentially triggers a CI build on a to-be-created repository?

The integration-tests-runner would be invoked via GitHub hooks from the colcon repositories. Whenever a colcon package's mainline is updated, the integration-tests-runner would execute. It's possible other packages could also kick off a run of integration-tests-runner.

Can you elaborate on what that infrastructure would look like?

A brief dive into this led us to something like API Gateway backed by a Lambda. The GitHub hook would call into the API Gateway, which would then execute the Lambda. The Lambda would kick off a Travis run of integration-tests-runner if the hook corresponds to an event we should start a new run for. I'm not too familiar with what existing infrastructure you have; we'd be happy to work with you on this.
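
For illustration, this is roughly the request such a Lambda would end up sending to the Travis CI v3 API to trigger a build of integration-tests-runner once the incoming push event passes its filter; the token, repository slug, and commit message are placeholders.

```sh
# Sketch of a Travis CI v3 "trigger a build" request for integration-tests-runner;
# $TRAVIS_API_TOKEN and the repository slug are placeholders.
curl -s -X POST \
  -H "Content-Type: application/json" \
  -H "Travis-API-Version: 3" \
  -H "Authorization: token $TRAVIS_API_TOKEN" \
  -d '{"request": {"branch": "master", "message": "push to colcon-core"}}' \
  "https://api.travis-ci.org/repo/colcon%2Fintegration-tests-runner/requests"
```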

I was under the impression that the integration tests would live in the to-be-created new repository? I guess I misunderstood this part. Can you please clarify the location of new tests?

My first posting leaned towards having a repository containing tests, but after your reply we did some more thinking and agree that it is better organization to have the tests located in the packages they test.

Regarding the enumerated tests, am I right that they would live in the repository with the name of the top-level bullet point?

Yep! My second posting is a deviation from the first in that integration tests for a given colcon package will live in that package's repository. If the integration tests have a short run time, they absolutely could run as part of the Travis CI build. If we find that the run time of some of the integration tests is too long for PR builds, we can separate out their invocation and only run them via the integration-tests-runner.
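
One possible way to separate the long-running tests without moving them out of the repository, assuming Travis CI's TRAVIS_EVENT_TYPE variable and the hypothetical run_integration_tests.sh entry point:

```sh
# Sketch: only run the long-running integration tests when the build was
# triggered via the Travis API (e.g. by integration-tests-runner) or by cron,
# not on regular push / pull request builds.
case "$TRAVIS_EVENT_TYPE" in
  api|cron)
    ./run_integration_tests.sh
    ;;
  *)
    echo "Skipping long-running integration tests (event type: $TRAVIS_EVENT_TYPE)"
    ;;
esac
```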

So, maybe taking a step back, would any new tests live in the to-be-created repository?

As of now the integration-tests-runner package would not contain any tests. That's not to say it couldn't contain tests, but ideally integration tests would live with the package that they are testing. Using "building all of ROS 1 up to ros_tutorials" as an example, that test could live in colcon-ros but only be invoked by integration-tests-runner and not on normal Travis CI builds of colcon-ros.
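
To give a rough idea of the scale of such a test, here is a sketch of building ROS 1 up to a ros_tutorials package with the colcon version under test; the distro name and the availability of rosinstall_generator / vcs / rosdep in the environment are assumptions.

```sh
# Sketch of a long-running integration test: build ROS 1 up to roscpp_tutorials
# (part of the ros_tutorials repository) with the colcon version under test.
# Assumes the ROS bootstrap tools are installed; the distro is just an example.
set -e
mkdir -p ros1_ws/src
cd ros1_ws

# Fetch the sources of roscpp_tutorials and everything it depends on.
rosinstall_generator roscpp_tutorials --rosdistro melodic --deps --tar > ros1.repos
vcs import src < ros1.repos

# Install system dependencies.
rosdep install --from-paths src --ignore-src -y

# Build the workspace up to the tutorial package.
colcon build --packages-up-to roscpp_tutorials
```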

Our view is that integration-tests-runner would contain the logic to invoke tests in all the other packages in the colcon ecosystem and have the ability to report ecosystem health. That test invocation logic could be as simple as executing a specific shell script in the root of each package (e.g. run_integration_tests.sh).
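
As a sketch of that invocation logic (the repository list and the failure reporting are purely illustrative):

```sh
# Hypothetical core loop of integration-tests-runner: clone every colcon
# repository and run its run_integration_tests.sh if it opted in.
set -e
FAILED=""
for repo in colcon-core colcon-cmake colcon-ros colcon-bundle; do
    git clone --depth 1 "https://github.com/colcon/${repo}.git"
    if [ -x "${repo}/run_integration_tests.sh" ]; then
        (cd "$repo" && ./run_integration_tests.sh) || FAILED="$FAILED $repo"
    fi
done

# Report ecosystem health at the end of the run.
if [ -n "$FAILED" ]; then
    echo "Integration test failures in:$FAILED"
    exit 1
fi
```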