[question] Best practices for handling test data in a CI pipeline

AndreasAckermannTSystems commented 4 days ago

What is your question?

Hi everyone,

I'm working on establishing Conan packages and a CI pipeline for a product line of applications built from a shared set of modules, each contained in its own repository. These modules (e.g. modA and modB) all access a common database via classes contained in a module called orm.

The orm repository also contains database dumps for a test database, an import script, and a database configuration file used by our applications' unit tests. The tests of modA and modB expect an initialized test database, and they expect this configuration file in a well-known location relative to their test binaries.

The CI pipeline runs in ephemeral Docker containers for each repository, and as such, the test database needs to be recreated by importing the dumps on each run.

My current intended approach is the following:

- In orm, package the test database dumps, the config file, and an import script into the orm package.
  - Potential issue: wasted space due to increased package sizes, even though the test database rarely changes.
- In modA, initialize the test database in the conanfile.py's build method if a CI=1 environment variable is detected, by copying the config file out of orm and executing the data import script that is also contained there.

Are there best practices / better ways to handle test data with Conan in a CI pipeline setting?

Have you read the CONTRIBUTING guide?

[X] I've read the CONTRIBUTING guide

memsharded commented 4 days ago

Hi @AndreasAckermannTSystems

Thanks for your question

As a general guideline for CI at scale, the ongoing work in https://github.com/conan-io/docs/pull/3799 might be useful. Hopefully it can be published soon, but in the meantime you might be able to generate the docs locally. It does not address your question directly, but it might help with the general issues of defining a CI pipeline.

Regarding your question: you could indeed put more artifacts inside the orm package, but as you pointed out, the size of the dump and the other files might be relevant, especially if you often use the orm library artifacts without those test artifacts.

If, on balance, this looks like a real problem, there are some alternatives to consider, such as storing the test artifacts in a separate package that can be used as test_requires, or maybe using the "package metadata files" feature. But I'd probably start by putting things in the orm package and learn from there (unless you tell me the DB test dump is GBs in size).

> if a CI=1 environment variable is detected

In general, it is better for Conan to model things more explicitly, for example by using the Conan conf mechanism. The idea is that things can be easily reproduced locally, and developers can run the tests on their own machines just with conan install ... -c user.myorg:build_tests=True or something like that. It also makes it easy to run fast CI jobs that skip those heavy tests but still run other tests. Note that there are also some built-in confs, like tools.build:skip_test, that could already be used in recipes.
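To make the conf-based idea a bit more concrete, here is a minimal sketch of how modA's recipe might gate the database setup on a conf instead of an environment variable. The package versions, the testdata folder layout, and the file names db_config.ini and import_db.py are assumptions for illustration only, as is the helper name should_prepare_test_db. The try/except around the import is only there so that the sketch can be exercised without Conan installed.

```python
import os
import shutil

try:
    from conan import ConanFile
except ImportError:
    # Fallback so this sketch can be read and exercised without Conan installed.
    ConanFile = object


def should_prepare_test_db(conf):
    """Return True when tests are explicitly requested and not globally skipped."""
    skip = conf.get("tools.build:skip_test", default=False, check_type=bool)
    wanted = conf.get("user.myorg:build_tests", default=False, check_type=bool)
    return bool(wanted) and not skip


class ModAConan(ConanFile):
    name = "moda"
    version = "1.0"        # hypothetical version
    settings = "os", "arch", "compiler", "build_type"
    requires = "orm/1.0"   # hypothetical version

    def build(self):
        # ... the regular build steps of modA would go here ...
        if should_prepare_test_db(self.conf):
            # Assumed layout: orm packages its test data under "testdata/".
            testdata = os.path.join(self.dependencies["orm"].package_folder, "testdata")
            # Put the DB config next to the test binaries (hypothetical file name).
            shutil.copy2(os.path.join(testdata, "db_config.ini"), self.build_folder)
            # Run the import script shipped by orm (hypothetical file name).
            self.run(f'python "{os.path.join(testdata, "import_db.py")}"')
```

With something like this, both CI and developers would opt in with -c user.myorg:build_tests=True, and passing -c tools.build:skip_test=True would switch the database setup off again.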

So a bit more information about the sizes of the test artifacts and their patterns/frequency of usage could help in deciding one way or the other.

memsharded commented 3 days ago

Another important aspect to take into account is how long orm takes to build. If it is fast enough, then it wouldn't be a concern to just re-build things in order to create a separate orm_data package, distinct from the orm one containing the actual libraries. That orm_data package could then be used, for example, as test_requires.
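As a rough sketch of that split, the two recipes could look something like the following. In practice they would live in separate conanfile.py files in their own repositories; they are shown together here only for brevity. The versions and the testdata folder name are assumptions, and the sketch uses shutil.copytree to stay standard-library only, where a real recipe would more likely use Conan's own copy helper from conan.tools.files.

```python
import os
import shutil

try:
    from conan import ConanFile
except ImportError:
    # Fallback so this sketch can be read and exercised without Conan installed.
    ConanFile = object


class OrmDataConan(ConanFile):
    """Data-only package holding the dumps, the config file and the import script."""
    name = "orm_data"
    version = "1.0"                  # hypothetical version
    exports_sources = "testdata/*"   # hypothetical folder name

    def package(self):
        # Copy the test data as-is into the package; nothing is compiled.
        src = os.path.join(self.source_folder, "testdata")
        dst = os.path.join(self.package_folder, "testdata")
        shutil.copytree(src, dst, dirs_exist_ok=True)


class ModAConan(ConanFile):
    name = "moda"
    version = "1.0"        # hypothetical version
    requires = "orm/1.0"   # the actual libraries, hypothetical version

    def build_requirements(self):
        # The test data is declared as a test requirement, separate from the orm libraries.
        self.test_requires("orm_data/1.0")
```

Consumers that only need the orm libraries would then never download the dumps, and orm_data would only need a new version when the test database itself changes.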

conan-io / conan

[question] Best practices for handling test data in a CI pipeline #17133
