conan-io / conan

Conan - The open-source C and C++ package manager
https://conan.io
MIT License

[question] How to integrate build-adjacent concerns into CI workflow using conan? #8364

Open PengolodhGoedath opened 3 years ago

PengolodhGoedath commented 3 years ago

I do hope this is the right place for this kind of question.

I am currently designing a CI workflow for a collection of projects using conan as their package manager. Aside from just building the binaries, though, the CI workflow should also create other artifacts (depending on circumstances, any or all of: static analysis reports, unit test results, documentation, ...) and deploy them to several different targets.

I'm currently stuck on finding the most maintainable approach to reach all these goals. My biggest concern isn't achieving each of the parts individually, but how to compose them in a maintainable fashion (ideally allowing different artifacts to be enabled at will) while allowing the reuse of pipelines for different recipes.

I've come up with several approaches, each with distinct pros and cons, but I'm not quite happy with any of them.

Approach #1

Use a different conanfile for each artifact.

Pros:

Cons:

Approach #2

Use options or settings to enable/disable features on request. No removal of options from package_id.

Pros:

Cons:

Can work, but would require tight discipline among different users. Might be viable for packages that change only infrequently. Another way to make it work would be the ability to match any value for a given option ("I don't care whether the option enable_static_analysis was set, I just want the binaries").

Approach #3

Use options or settings to enable/disable features on request. Options get removed from package_id.

Pros: (as approach #2)

Cons:

Can work, but would require strict controls on which packages can be uploaded to the remote. The impact of bloated packages could perhaps be reduced if conan were able to download only individual components instead of whole packages.
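
For illustration, a minimal sketch of approach #3 in Conan 1.x, assuming a hypothetical enable_static_analysis option; deleting the option in package_id() makes both variants share a single package ID:

    from conans import ConanFile

    class Recipe(ConanFile):
        options = {"enable_static_analysis": [True, False]}
        default_options = {"enable_static_analysis": False}

        def package_id(self):
            # Both option values map to the same package ID, so a (bloated)
            # package built with reports also satisfies a plain binary request.
            del self.info.options.enable_static_analysis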

Approach #4

Use custom generators for the alternative artifacts. The results are effectively the same as approach #3, as the generators used aren't part of the package_id calculation. (Otherwise, the results would be similar to approach #2 when combining generators. It might be feasible if only one specific generator were used per package_id, e.g. generating a "documentation-package", "test-report-package" etc. per recipe.)

This might be more interesting if generators could be told a specific location to put the artifacts, outside of the build_folder (or was it install_folder?).

Approach #5

Instrument the build via environment variables. Artifacts besides binaries/headers are not packaged at all.

Pros:

Cons:

It works, unless you need reproducible builds; then it can be hard to verify that those environment variables don't subtly change things. CI can be somewhat streamlined unless some recipes need special attention (regarding instrumentation or the later deployment of artifacts).
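
For illustration, a minimal sketch of approach #5, assuming a hypothetical CI_STATIC_ANALYSIS variable exported by the CI job; since the environment is not part of the package_id, reproducibility depends entirely on the environment being controlled:

    import os
    from conans import ConanFile

    class Recipe(ConanFile):

        def build(self):
            # Hypothetical variable set by the CI job; invisible to package_id.
            if os.getenv("CI_STATIC_ANALYSIS"):
                self.run("scan-build make")  # wrap the real build command
            else:
                self.run("make")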

Approach #6

The conanfile gets instructed (via option or environment variable) to directly deploy the other artifacts. Said artifacts are not included in the resulting conan package.

Pros:

Cons:

Just no.

Question

Have I missed any substantially different approaches? What are the recommended approaches for integrating conan into a CI workflow? conan seems to excel at building and distributing binaries, but there doesn't seem to be a clear path regarding how to treat other build-adjacent artifacts.

ytimenkov commented 3 years ago

What I've converged to in my setup is that the build stage/step should be separate from the others.

First, you build the binaries. Make sure that conanfile.py is self-sufficient and your binaries can be built with --build missing. This helps, especially if you want to move to a new compiler / OS. No side effects at this stage.
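
For illustration, such a self-sufficient build in Conan 1.x (the user/channel is a placeholder):

    conan create . demo/testing --build missing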

You should also grab any artifacts that are produced anyway but will help you in later stages. For example: compile_commands.json if your static analyzer supports it, or coverage notes if you plan to run tests and measure coverage (yes, for coverage I have an option since it affects what's produced; see also https://docs.conan.io/en/latest/howtos/sanitizers.html on a similar subject).

The rest of the work you do in separate stages. For example, to run the tests you deploy the freshly built package (https://docs.conan.io/en/latest/devtools/running_packages.html) and then just run the binaries, as sketched below. Doing so allows publishing any report in any form you need (JUnit, code coverage), and as a bonus you can rerun tests in case of transient issues.
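
A sketch of such a test stage using the deploy generator from the linked docs (the package reference and test binary are placeholders):

    conan install mypkg/1.0@user/channel -g deploy
    # hypothetical test binary from the deployed package, emitting a report:
    ./mypkg/bin/unit_tests --gtest_output=xml:report.xml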

The problem is that your packages will contain a bit more stuff than consumers need. However, this may not be a problem (so there is some stuff lying around unused...).

If this is a problem, you may try to repackage in a later step (https://docs.conan.io/en/latest/devtools/running_packages.html#runtime-packages-and-re-packaging) or (what I'm experimenting with now) build two binary packages: one with test code and one without. See https://docs.conan.io/en/latest/reference/conanfile/methods.html#build-id. The idea is to add a testing option and tune build_id so that all variants use the same build directory; Conan will then skip the build() method if it finds an existing build directory:

    def build_id(self):
        # Set to something different from True and False so both variants will reuse build directory
        self.info_build.options.testing = "NoneOfAbove"
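
For context, a minimal sketch of how that build_id might sit in a full recipe, with an assumed testing option (details elided):

    from conans import ConanFile

    class Recipe(ConanFile):
        options = {"testing": [True, False]}
        default_options = {"testing": False}

        def build_id(self):
            # Collapse both option values into one shared build directory.
            self.info_build.options.testing = "NoneOfAbove"

        def build(self):
            ...  # runs once; both binary packages reuse its output

        def package(self):
            ...  # package_id still differs: copy test code only if testing=True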

Hope this gives some insights.

jgsogo commented 3 years ago

Hi! This is related to https://github.com/conan-io/conan/issues/4694; so far we haven't decided what the first alternative would be (a workaround using current features, or a new feature).

I would like to add something else to these alternatives. There is a Conan feature, compatible_packages, that you can use to fall back to other package IDs when consuming a package. If you are requesting a package without the extra artifacts (you just want headers and libraries), you can probably fall back to a package that also contains the extra artifacts:

from conans import ConanFile

class Recipe(ConanFile):
    options = {'build_docs': [True, False]}

    def build(self):
        ...
        if self.options.build_docs:
            self.run('make docs')

    def package(self):
        self.copy(...)
        if self.options.build_docs:
            self.copy('generated_docs', dst='docs')

    def package_id(self):
        if not self.options.build_docs:
            # If you request a package without docs, you can use a package with docs if it is available
            compatible_pkg = self.info.clone()
            compatible_pkg.options.build_docs = True
            self.compatible_packages.append(compatible_pkg)

It can help with some of your scenarios where you want to remove options in order not to have so much package-ID variability. With compatible_packages you will have different packages (no Pandora's (or Heisenberg's) packages), but you don't need to build all of them if you don't want to; you can use:

conan install package/version@ -o package:build_docs=False

and if this package is not available, thanks to the fallback, Conan will retrieve and use the package that was generated with build_docs=True.


Hope it can help even though it is not a full answer. This is one of the issues (thanks for describing it in so much detail) we would like to revisit when implementing Conan v2.0 (not only for documentation or performance reports; it's also been requested for PDB files).

PengolodhGoedath commented 3 years ago

@ytimenkov Just so I understand you correctly: you want two recipes per package, one building it and packaging all build artifacts and binaries, and one taking the first package and generating all the other artifacts (as required)? Possibly a third package, to repackage the results of the first?

Seems like an interesting approach. It doesn't suffer from the synchronization problem (as the package version gets captured), though the second package still needs intimate knowledge of the first package's build process, so it likely needs to be specialized almost every time :/

Still, interesting approach, especially if the second stage recipe could be made generic to accept any other package...


@jgsogo Using compatible_packages seems tempting, but it's also vulnerable to combinatorial explosion. One option is nice and simple, but multiple options less so. (Related question: are compatible_packages transitive? Otherwise I'd need to define 19 compatible_packages for just three options, and still 7 if they are indeed transitive...) Still, it might make approach #2 more feasible.
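
For illustration, a sketch of enumerating the fallbacks for several hypothetical boolean options, which shows how quickly the combinations grow; it accepts any package that enables at least the requested artifacts:

    import itertools
    from conans import ConanFile

    class Recipe(ConanFile):
        options = {"build_docs": [True, False],
                   "run_tests": [True, False],
                   "static_analysis": [True, False]}

        def package_id(self):
            names = ["build_docs", "run_tests", "static_analysis"]
            current = tuple(str(getattr(self.info.options, n)) == "True"
                            for n in names)
            for combo in itertools.product([True, False], repeat=len(names)):
                # A build with extra artifacts can serve a request without them:
                # accept every combination enabling at least what was requested.
                if combo != current and all(c or not r
                                            for c, r in zip(combo, current)):
                    compatible = self.info.clone()
                    for name, value in zip(names, combo):
                        setattr(compatible.options, name, value)
                    self.compatible_packages.append(compatible)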


Nice to hear you're still looking for feedback on this issue.

My thoughts on the issue: It would be really nice if components were packaged separately (and could be separately deployed/required). That would seem to solve the problem on my end (just package each different category of artifacts in a different component), and it would be a nice boon for other partial consumers of multi-component packages (only using some of the components).

Alternatively, it could be interesting to have generic recipes, akin to test_packages, that could accept any (*) package and generate some artifacts on their own, packaged separately. But that would be hard to fit into the current name/version@user/channel model...

ytimenkov commented 3 years ago

You want two recipes per package, one building it and packaging all build artifacts and binaries, and one taking the first package and generating all other artifacts (as required)?

This is really just an optimization. My main point was that a Conan package should contain binaries. Other artifacts relevant for CI (like test / coverage reports) are not really part of the Conan package and should be produced separately. (Well, PDBs and coverage notes, for example, are packaged, since they are what the compiler produces.)

A binary from a package can be consumed for multiple purposes. For example, we have smoke / functional tests and performance tests running in parallel, producing different kinds of reports but consuming the same Conan package and running the same executable.

To emphasize it once more: testing should not be part of a recipe or a Conan package. If you want to test, deploy the package and carry on with whatever testing is needed. At least that's how our CI works.

PengolodhGoedath commented 3 years ago

Maybe this gets clearer if I specify what my expectations of a general CI workflow are.

Generally, I can see three main requirements:

  1. In case of a pull request, functional tests and static analysis should be run, so I can see what breaks and/or what could be improved before merging. The artifacts of those only have to survive until the PR gets merged, afterwards they aren't strictly needed anymore (plus I could reconstruct them via commit ID and/or lockfile).
  2. Upon merge into our major development branch, unit tests should be rerun in addition to more exhaustive integration tests, and the results recorded for statistics/regression analysis. Static analysis should be run, but mostly so I have a new baseline to compare PRs against. Documentation should be generated and deployed onto internal servers for consumption by developers.
  3. For a release, an installer of the application is created and the documentation retrieved. Those artifacts need to be deployed for further QA testing before finally getting released to the public. At this stage, unit testing (and static analysis) become much less of a concern, so likely won't be needed (after all, I'm only creating an installer for an already tested binary).

(This is a simplified description, but the part that is mostly concerning me right now.)

Ideally, every package deployed onto the internal artifactory repository should be ready for consumption and - more importantly - not break any other package depending on it.

I'm not 100% sure whether Conan is the correct place for doing all of this work, but there aren't many other places that could. There are only up to three layers that have access to all the information required to actually build, test and analyse the package: CI, Conan and the build system used.

On the bottom, there's the build system. It might actually be able to produce some artifacts (e.g. test reports, some cases of static analysis reports, maybe even test coverage). Still, some tools operate above the build system level (e.g. scan-build from the clang static analyzer, or Facebook's infer tool), so the build system cannot be solely responsible for generating all the information I require. Also, it lacks access to the package ID to correctly attribute the reports (though the commit ID could make do if really necessary).

On the top, there is the CI itself. Of course, it can access anything the lower layers can, as it controls the whole process. Still, that doesn't mean it is the best place to explicitly generate all the artifacts needed. For one, it would need more specific information about each package than I feel such a generic process should have, starting with the build folder location itself (use conan create or build manually?), the build system and language used, and possibly other issues (incompatibilities with certain tools, package restrictions on os/arch, ...). While this could be done, I'd at some point be reimplementing most of the capabilities Conan already has. Also, while the binaries can be reliably reproduced using conan, changes in the CI environment might be harder to roll back.

Ideally, I think the CI should only deal in packages (and how to deploy their contents) and their meta information (lockfiles, deployment targets, ...).

This leaves me with conan in the middle. Unlike the build system, it has all the required information (or can at least easily be told). Unlike CI, it is its business to know the details of the package, which build system and files are used, and thus which tools can generate the reports we require. There are only two or three relatively minor issues:

For the latter problem, I had hoped other people would have run into it before and found a better solution, one that wouldn't require instructing the conanfile's build() or package() methods specifically for the CI environment, or packaging the whole build_folder to be consumed by multiple other packages.


That's how I view this situation. Of course, some of my ideas might be awfully idealistic, but that's what they are. I am aware that in practice there will likely be some compromise required.

Your perspective might of course disagree with any or all of the above.

ytimenkov commented 3 years ago

Your workflow is not unique.

Just compare with how you would have created a multi-job CI pipeline without Conan:

  1. Build job.
    1. CI checks out sources,
    2. runs configure/make (usually called a "script step")
    3. then you tell CI to grab certain paths and publish them as artifacts.
  2. Test job.
    1. Tell CI to grab artifacts from build job.
    2. Run script step which executes the tests and produces a report.
    3. Then CI takes this report, processes it (e.g. coverage greater than X%).

With Conan, the only changes you make are (sketched concretely after the list):

  1. Instead of calling configure/make you call conan create
  2. Instead of publishing artifacts you call conan upload
  3. Instead of telling CI to download artifacts from the previous job you call conan install <built ref>.
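
For illustration, the same two jobs sketched as concrete steps (the reference and remote names are placeholders):

    # Build job
    conan create . mypkg/1.0@user/channel --build missing
    conan upload mypkg/1.0@user/channel -r my-remote --all

    # Test job
    conan install mypkg/1.0@user/channel -r my-remote -g deploy
    ./mypkg/bin/unit_tests   # hypothetical test binary from the package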

Then it depends on your particular CI how to produce a reference and pass it between jobs / stages.

And as you pointed out, it is better to go with Conan as far as you can and extract artifacts only at the latest possible stage. Having a Conan reference, you can extend this workflow to multi-component pipelines which do real integration rather than just builds. The Conan team provided a really good example and training on the subject.