Split package into chunks to avoid 6h timout

h-vetinari commented 1 year ago

Most recently encountered #140, where some features that needed protobuf 4 got (re-)enabled, pushing the compilation time from just below 6h to being killed by azure. Moving the discussion there into an issue:

@h-vetinari: We're getting into problematic spheres here unfortunately. Windows on protobuf 3.21 (with the disablement patch) already took 5:48h, which is pushing against the 6h hard cutoff enforced by azure. Without that feature disablement, we go from building 6917 build steps to 7027, which is not such a big change, but seemingly enough to push it over the edge (also not all agents are equally spec'd, so there's some randomness involved).

Still, the OSX builds for protobuf 4 are not far behind (way over 5h), so we're probably going to have to start thinking about how to slice up this package somehow. Is there a way to define a core library upon which we can build, or tranches that can be built independently?

@coryan:

Is there a way to define a core library upon which we can build, or tranches that can be built independently?

We should probably move this discussion to a bug? In any case, yes we can do this. The current behavior is to build all the GA libraries:

https://github.com/conda-forge/google-cloud-cpp-feedstock/blob/21a8aa42ca12938326d4cc3e425f1e24b8166a95/recipe/build.sh#L14

The list is defined here:

https://github.com/googleapis/google-cloud-cpp/blob/dcc823e8dd34b61fe26aab5bf69f7438ca1801a8/cmake/GoogleCloudCppFeatures.cmake#L36

And it grows over time (maybe one or two features a month, though some "features" can be quite large). There are a set of (implicit) common features that we could build first or separately. Then the features are largely independent from each other, with a small number of caveats:

https://github.com/googleapis/google-cloud-cpp/blob/dcc823e8dd34b61fe26aab5bf69f7438ca1801a8/cmake/GoogleCloudCppFeatures.cmake#L166-L183

I am not sure how to translate this into the conda packaging model. Should we create different packages for the core components and then a package for each feature? That seems a bit overwhelming. Any other thoughts?

h-vetinari commented 1 year ago

In any case, yes we can do this.

That's great news!

There are a set of (implicit) common features that we could build first or separately

That's exactly what I was hoping for. If you had to guess, what kind of percentage of the current library would be the common parts? 10%, 50%, 80%?

Then the features are largely independent from each other, with a small number of caveats. I am not sure how to translate this into the conda packaging model. Should we create different packages for the core components and then a package for each feature? That seems a bit overwhelming. Any other thoughts?

First off, we wouldn't normally split things unless there's a benefit, like a use-case where only one part is needed and we could avoid pulling in 90% dead weight. That's not necessarily the case here a priori, but the way you're describing the features sounds like it could be. In any case, we're being forced to consider splitting simply due to the fact that it's becoming impossible to build the monolith in public CI.

I see at least two big options (with many subvariants):

Build core separately (e.g. [lib]google-cloud-cpp-core), then everything else on top (i.e. google-cloud-cpp).
Split off independent features into separate outputs or even feedstocks. For example the AWS SDK was split up into separate components (example), which can be specified as independent dependencies, or pulled in as a bunch through the SDK. These features could depend on each other, as long as the dependencies form a DAG.

The first one is probably the path of least resistance for now, but long-term we might want to think how we want to partition things.

Pinging @xhochy for thoughts since he split up the AWS stack. Note that this setup can become quite labour-intensive if all the different components get migrated separately and the subversions get updated often (causing the migrators to pile up) and/or the SDK releases often, c.f. https://github.com/conda-forge/aws-sdk-cpp-feedstock/issues/662

CC @conda-forge/core

coryan commented 1 year ago

In any case, yes we can do this.

That's great news!

There are a set of (implicit) common features that we could build first or separately

That's exactly what I was hoping for. If you had to guess, what kind of percentage of the current library would be the common parts? 10%, 50%, 80%?

Less than 10%, closer to 1%. The common components build 232 targets in about 4.5 minutes of CPU time[^1]. As you can see below the full build is about 7,000 targets and takes about 450 minutes of CPU time.

[^1]: The elapsed time is much smaller: my workstation has a lot of CPUs):

time cmake --build cmake-out --target google/cloud/all
... ...
[232/232] Creating library symlink google/cloud/libgoogle_cloud_cpp_rest_protobuf_internal.so.2 google/cloud/libgoogle_cloud_cpp_rest_protobuf_internal.so

real    0m17.473s
user    3m40.731s
sys     0m42.000s

Any other thoughts?

First off, we wouldn't normally split things unless there's a benefit, like a use-case where only one part is needed and we could avoid pulling in 90% dead weight.

I think there is a benefit. A lot of developers use only one feature at a time. The Google Cloud Storage library is very popular (and really the only thing used by Apache Arrow), and compiles relatively quickly:

time cmake --build cmake-out  --target google/cloud/storage/all
[191/191] Creating library symlink google/cloud/storage/libgoogle_cloud_cpp_storage.so.2 google/cloud/storage/libgoogle_cloud_cpp_storage.so

real    0m11.070s
user    7m3.113s
sys     1m17.989s

That includes the build time for the common dependencies[^2]. Something like Bigtable is a bit larger:

[^2]: You may wonder why it has fewer targets than the common components. It is because it does not compile all the common components. It does not use gRPC and some common components are there to support the 90+ features that do use gRPC.

time cmake --build cmake-out  --target google/cloud/bigtable/all
[26/318] Performing download step (download, verify and extract) for 'googleapis_download'
-- verifying file...
       file='/workspace/cmake-out/external/googleapis/src/a3f983b38c357a1e7a7810d9ad795756b77d4332.tar.gz'
-- File already exists and hash match (skip download):
  file='/workspace/cmake-out/external/googleapis/src/a3f983b38c357a1e7a7810d9ad795756b77d4332.tar.gz'
  SHA256='edc901180a3ebdd4b3b3086e7df2ca71f947433ebeb827796447c57491fb334e'
-- extracting...
     src='/workspace/cmake-out/external/googleapis/src/a3f983b38c357a1e7a7810d9ad795756b77d4332.tar.gz'
     dst='/workspace/cmake-out/external/googleapis/src/googleapis_download'
-- extracting... [tar xfz]
-- extracting... [analysis]
-- extracting... [rename]
-- extracting... [clean up]
-- extracting... done
[318/318] Linking CXX shared library google/cloud/bigtable/libgoogle_cloud_cpp_bigtable_mocks.so

real    0m22.748s
user    10m36.804s
sys     2m14.435s

Contrast this against the full build:

time cmake --build cmake-out  --target all
... ...
[7415/7415] Creating library symlink google/cloud/vmmigration/libgoogle_cloud_cpp_vmmigration.so.2 google/cloud/vmmigration/libgoogle_cloud_cpp_vmmigration.so

real    3m59.020s
user    403m57.331s
sys     47m27.301s

That's not necessarily the case here a priori, but the way you're describing the features sounds like it could be. In any case, we're being forced to consider splitting simply due to the fact that it's becoming impossible to build the monolith in public CI.

Ack.

I see at least two big options (with many subvariants):

Build core separately (e.g. [lib]google-cloud-cpp-core), then everything else on top (i.e. google-cloud-cpp).

I think the core is too small for that to work, or for that to continue to work for a long time.

Split off independent features into separate outputs

Do you have an example of using "separate outputs"?

or even feedstocks.

It seems that using different feedstocks would require different source repositories? That is unlikely to happen. We tried keeping separate repos earlier in the project and we cannot sustain them.

For example the AWS SDK was split up into separate components (example), which can be specified as independent dependencies, or pulled in as a bunch through the SDK. These features could depend on each other, as long as the dependencies form a DAG.

I can see how the packaging would work, but the development requires more effort than we can sustain at this time.

The first one is probably the path of least resistance for now, but long-term we might want to think how we want to partition things.

Some random thoughts:

Just split "common" vs. "everything else" won't work or at least won't work for long.
Would it be possible to have multiple feedstocks with a single source tree? I think we can upstream whatever changes are needed to get such a setup working (e.g. adding helper build targets, adding configuration to use pre-built libraries for the common components).

Note that this setup

"this setup"? You mean something analogous to the AWS stack?

can become quite labour-intensive if all the different components get migrated separately and the subversions get updated often (causing the migrators to pile up) and/or the SDK releases often, c.f. conda-forge/aws-sdk-cpp-feedstock#662

I am not sure this is applicable. All the components get the same version. We treat all the code as a single unit, and release accordingly. Our release cadence is pretty consistent (once a month, with a few exceptions). We release patch versions a few times a year. But this is all moot if we cannot figure out a way to get multiple feedstocks from the same source.

Separately: we have changed our stance with respect to patch versions.

h-vetinari commented 1 year ago

It seems that using different feedstocks would require different source repositories? That is unlikely to happen.

No that's not the case. For example the LLVM stack (llvmdev, clangdev, openmp, lld, compiler-rt, libcxx, mlir) are all built from the same upstream.

Note that [the AWS setup in conda-forge] can become quite labour-intensive if all the different components get migrated separately and the subversions get updated often (causing the migrators to pile up) and/or the SDK releases often, c.f. conda-forge/aws-sdk-cpp-feedstock#662

I am not sure this is applicable. All the components get the same version. We treat all the code as a single unit, and release accordingly.

That's great!

Separately: we have changed our stance with respect to patch versions.

Even better! Means we can pin less hard and don't have to rebuild between .0 & .1

The Google Cloud Storage library is very popular (and really the only thing used by Apache Arrow), and compiles relatively quickly:

That sounds like a definite candidate for splitting off, because right now arrow ships with the whole shebang, and it's already a very big baby.

To recap, we don't need changes upstream to slice this up into smaller chunks here. Is there some sort of natural grouping of features, or maybe "generations" in terms of interdependence[^1]? I feel like the whole list is way too fine-grained (though we could split individual ones like storage off of the meta-package on an as-needed basis)

[^1]: e.g. all the stand-alone ones, all the ones depending only on the stand-alone ones (1 level of interdependence), the rest (2+ levels of interdependence)

coryan commented 1 year ago

It seems that using different feedstocks would require different source repositories? That is unlikely to happen.

No that's not the case. For example the LLVM stack (llvmdev, clangdev, openmp, lld, compiler-rt, libcxx, mlir) are all built from the same upstream.

Ah, that sounds promising.

The Google Cloud Storage library is very popular (and really the only thing used by Apache Arrow), and compiles relatively quickly:

That sounds like a definite candidate for splitting off, because right now arrow ships with the whole shebang, and it's already a very big baby.

Agreed.

To recap, we don't need changes upstream to slice this up into smaller chunks here. Is there some sort of natural grouping of features, or maybe "generations" in terms of interdependence?

Unfortunately the generations idea may not yield as much as we want. Broadly speaking there are currently 3 generations, 4 if I include any current plans:

Generation 3: pubsublite, it depends on things from generation 2, 1, and 0. Probably about 1% of the build time. Currently not a concern because it is still incomplete and thus labeled "experimental" and not built in the Conda packages.
Generation 2: asset, binaryauthorization, containeranalysis, contentwarehouse, and pubsub. Each of these depends on at least one component from generation 1 and all depend on components from generation 0.
Generation 1: everything else. There are 100+ components here and the bulk of the build time.
Generation 0: google-cloud-cpp::common, google-cloud-cpp::rest_internal, google-cloud-cpp::grpc_utils, and google-cloud-cpp::rest_protobuf_internal. This is maybe less than 5% of the code. Probably worth splitting, but not enough to make a difference.

I feel like the whole list is way too fine-grained (though we could split individual ones like storage off of the meta-package on an as-needed basis)

I agree. I think we may want to strike a balance:

Define the core components. I think we should create one of each bullet here:
- google-cloud-cpp::common: used by everything
- google-cloud-cpp::rest_internal: used by storage and any REST-based services (compute "soon")
- google-cloud-cpp::grpc_utils, and google-cloud-cpp::rest_protobuf_internal: used by everything else.
Define some ad-hoc splits, the most popular ones should get their own split:
- bigtable
- bigquery
- iam
- logging
- pubsub
- spanner
- storage
- trace
The bulk of the splits should be alphabetical. So all the features starting with a would go into a single group. It is arbitrary, but easy to reason about.

We need to think about the evolution of such a split. That is, what to do if a component gains new dependencies, or what to do if a component like "all the a's" gets too large.

The second problem first: if I component gets too large like "all the a's" and split it by:

Creating a new split for "only aiplatform"
Have "all the a's" depend on aiplatform and skip aiplatform in its own build.

The first problem becomes easy: if a component in "all the b's" gets a new dependency we just need to apply the same treatment to the new dependency. For example, if "foo" now depends on "bar", we would split bar from "all the b's" using the previous procedure and make "all the f's" depend on bar too.

I think these ideas may need a document for further refinement, but it is getting close.

h-vetinari commented 1 year ago

Define some ad-hoc splits, the most popular ones should get their own split:

bigtable, bigquery, iam, logging, pubsub, spanner, storage, trace

Sorry for the many estimation questions, but how much of the total would that (+core) be approximately? Because one option would just be "everything that doesn't have a dedicated subpackage is in the google-cloud-cpp omnibus package" (which contains everything, even though some library artefacts might come in only as a dependency on google-cloud-cpp-subX).

This has the advantage that it doesn't need much adaptation for new features (by default, they just start out in the omnibus), and popular and/or large features can be split off on an as-needed basis (and then turned into a dependency of google-cloud-cpp so it remains present in the omnibus).

I don't think the alphabetical approach is a good ordering mechanism (even if it's simple to explain); the above would also imply less less outputs & less work overall.

The one thing that would be nice to have for that would be a way to subtract features from __ga_libraries__. Because what we'd be doing would be something like:

feedstock: google-cloud-cpp-bigtable  # cmake --target google/cloud/bigtable/all
feedstock: google-cloud-cpp-bigquery  # cmake --target google/cloud/bigquery/all
feedstock: google-cloud-cpp-iam       # cmake --target google/cloud/iam/all
feedstock: google-cloud-cpp-logging   # cmake --target google/cloud/logging/all
feedstock: google-cloud-cpp-pubsub    # cmake --target google/cloud/pubsub/all
feedstock: google-cloud-cpp-spanner   # cmake --target google/cloud/spanner/all
feedstock: google-cloud-cpp-storage   # cmake --target google/cloud/storage/all
feedstock: google-cloud-cpp-trace     # cmake --target google/cloud/trace/all
feedstock: google-cloud-cpp
    script: # cmake --not-yet-existent-switch-to-exclude bigtable,bigquery,iam,logging,pubsub,spanner,storage,trace
    requirements:
      - google-cloud-cpp-bigtable
      - google-cloud-cpp-bigquery
      - google-cloud-cpp-iam
      - google-cloud-cpp-logging
      - google-cloud-cpp-pubsub
      - google-cloud-cpp-spanner
      - google-cloud-cpp-storage
      - google-cloud-cpp-trace

But even without that, it would still be possible to achieve that effect manually (e.g. through patching).

Note, depending on the combined size of the individual outputs, we could also just have one extra feedstock that does:


feedstock: google-cloud-cpp-individual-components
  - output: google-cloud-cpp-bigtable  # cmake --target google/cloud/bigtable/all
  - output: google-cloud-cpp-bigquery  # cmake --target google/cloud/bigquery/all
  - output: google-cloud-cpp-iam       # cmake --target google/cloud/iam/all
  - output: google-cloud-cpp-logging   # cmake --target google/cloud/logging/all
  - output: google-cloud-cpp-pubsub    # cmake --target google/cloud/pubsub/all
  - output: google-cloud-cpp-spanner   # cmake --target google/cloud/spanner/all
  - output: google-cloud-cpp-storage   # cmake --target google/cloud/storage/all
  - output: google-cloud-cpp-trace     # cmake --target google/cloud/trace/all
feedstock: google-cloud-cpp
# as above

This would avoid duplicating the build scripts in so many places.

coryan commented 1 year ago

Define some ad-hoc splits, the most popular ones should get their own split:

bigtable, bigquery, iam, logging, pubsub, spanner, storage, trace

Sorry for the many estimation questions, but how much of the total would that (+core) be approximately?

About 1000 out of 7000 targets.

Because one option would just be "everything that doesn't have a dedicated subpackage is in the google-cloud-cpp omnibus package" (which contains everything, even though some library artefacts might come in only as a dependency on google-cloud-cpp-subX).

This has the advantage that it doesn't need much adaptation for new features (by default, they just start out in the omnibus), and popular and/or large features can be split off on an as-needed basis (and then turned into a dependency of google-cloud-cpp so it remains present in the omnibus).

That works too.

The one thing that would be nice to have for that would be a way to subtract features from __ga_libraries__. Because what we'd be doing would be something like:

We can change the CMake code to handle something like:

cmake -S . -B ... \
  -DGOOGLE_CLOUD_CPP_FEATURES=__ga_libraries__,-bigquery,-bigtable,-iam,-logging,-pubsub,-spanner,-storage,-trace

The -foo syntax would remove the feature from the list. I need to think about the effect of removing features that depend on other features.

This would avoid duplicating the build scripts in so many places.

That seems like a good idea.

h-vetinari commented 1 year ago

I need to think about the effect of removing features that depend on other features.

This doesn't have to be a full-fledged general solution IMO. All the following would be fine IMO:

error hard if inconsistent
warn but build everything required by anything not deselected (i.e. lower precedence for removals than __ga_libraries__)
warn but don't build anything that depends on something deselected (i.e. higher precedence for removals)
hide it behind a scary flag like --thou-shalt-not-use-this-unless-you-know-exactly-what-you're-doing

As long as we have something basic to avoid rebuilding already-existing artefacts here, things would be fine (note that conda would deduplicate the installation here with what's in $PREFIX already, i.e. all the artefacts from already-installed individual google-cloud-cpp-* components, but here we explicitly cannot afford to rebuild redundantly because of the CI timeout).

I suspect that -- independently of how we switch off certain targets here -- there'd be some work to do either in the feedstock or upstream to ensure that something depending on google-cloud-cpp::storage can actually point to libgoogle_cloud_cpp_storage.so and not have to rebuild it.

coryan commented 1 year ago

I need to think about the effect of removing features that depend on other features.

This doesn't have to be a full-fledged general solution IMO. All the following would be fine IMO:

error hard if inconsistent

That sounds good enough, but the other solutions work too.

warn but build everything required by anything not deselected (i.e. lower precedence for removals than __ga_libraries__)

warn but don't build anything that depends on something deselected (i.e. higher precedence for removals)

hide it behind a scary flag like --thou-shalt-not-use-this-unless-you-know-exactly-what-you're-doing

[snip]

I suspect that -- independently of how we switch off certain targets here -- there'd be some work to do either in the feedstock or upstream to ensure that something depending on google-cloud-cpp::storage can actually point to libgoogle_cloud_cpp_storage.so and not have to rebuild it.

That is the trickiest bit of this I think. I think we would need to disable the code to build common components and replace it with find_package(google_cloud_cpp_common CONFIG REQUIRED). That is, this line:

https://github.com/googleapis/google-cloud-cpp/blob/70ef64ec3ea814a2df74eb9322ac8088df7bb5b4/CMakeLists.txt#LL240C31-L240C31

Would need to become something like:

if (GOOGLE_CLOUD_CPP_USE_PRE_INSTALLED_COMMON)
  find_package(google_cloud_cpp_common CONFIG REQURIED)
else ()
  add_subdirectory(google/cloud)
endif ()

I suspect we will find a number of places that link against google_cloud_cpp_${foo} instead of linking against google-cloud-cpp::${foo} too. We should fix those things upstream.

h-vetinari commented 1 year ago

Interestingly, even without aiplatform, the creeping size increases of libgoogle-cloud-cpp seem to play a role in running out of disk space when cross-compiling arrow: https://github.com/conda-forge/arrow-cpp-feedstock/pull/1092

Of course, the lib artefact is not that big (~40MB currently) and CI shouldn't be so tightly disk-constrained, but it's still emblematic of the increasing urgency here.

I think the next step would be a POC PR here that proves (say, only linux-64) that we can get libgoogle-cloud built on top of components that have been split off, and then move that to a new feedstock (through staged-recipes).

coryan commented 1 year ago

Of course, the lib artefact is not that big (~40MB currently) and CI shouldn't be so tightly disk-constrained, but it's still emblematic of the increasing urgency here.

I hear you. I still need to sit down and figure out how to compile the common protos in external/googleapis, e.g.:

https://github.com/googleapis/google-cloud-cpp/blob/e9dbdf74b3a4f840ce406375cd394da3228624ab/external/googleapis/CMakeLists.txt#L269

while allowing per-feature protos to be compiled in each feature directory, e.g. this:

https://github.com/googleapis/google-cloud-cpp/blob/e9dbdf74b3a4f840ce406375cd394da3228624ab/google/cloud/kms/CMakeLists.txt#L47-L56

needs to have the *.proto files listed here:

https://github.com/googleapis/google-cloud-cpp/blob/main/external/googleapis/protolists/kms.list

available at compile time.

coryan commented 1 year ago

I just created https://github.com/googleapis/google-cloud-cpp/pull/12049 to support splitting things. We are planning to add a CI build to make sure the configuration is not broken by accident.

coryan commented 1 year ago

I think the next step would be a POC PR here that proves (say, only linux-64) that we can get libgoogle-cloud built on top of components that have been split off,

Maybe this:

https://github.com/googleapis/google-cloud-cpp/blob/main/ci/cloudbuild/builds/cmake-split-install.sh

answers the same questions as that PR would?

and then move that to a new feedstock (through staged-recipes).

Where can I learn more about creating feedstocks and what is a staged recipe?

h-vetinari commented 1 year ago

Maybe this answers the same questions as that PR would?

I think we should still do this on a PR to this feedstock; if you have it working upstream it should be a relatively simple exercise (and I should be able to help or do it myself in a couple days).

Where can I learn more about creating feedstocks and what is a staged recipe?

It's the gateway + automation for creating new feedstocks: https://github.com/conda-forge/staged-recipes/ You basically open a PR with the full recipe, and once CI passes there, the feedstock gets created.

coryan commented 1 year ago

FWIW, I think #148 is looking good.

coryan commented 11 months ago

Okay. I finally spent some time preparing a PR for the conda-forge/staged-recipes repo.

I have a number of questions that maybe @h-vetinari would be kind enough to answer:

What happens to the `google-cloud-cpp` feedstock?

Should we use the current google-cloud-cpp feedstock to play the role of google-cloud-cpp-all? If so, I assume I would send a PR to change its role once the other feedstocks are created?

Tactics

Should I send one PR to staged-recipes with all the changes? That is unlikely to succeed because it will take more than 6h to build all the feedstocks. It also seems that the feedstocks are built in alphabetical order? At least with build-locally.py they seem to be.

It seems I should send google-cloud-cpp-core first and then other PRs. We won't know for sure if google-cloud-cpp-ai is too big or google-cloud-cpp-all requires more shards until those PRs are being processed, I hope that is Okay.

Feedstock names

Are the names (google-cloud-cpp-core, google-cloud-cpp-ai, and google-cloud-cpp-all) acceptable?

Should I shard even more?

We can create even smaller feedstocks if that is useful (e.g. if that would benefit other packages). I tried to create only a few feedstocks with several subpackages in them as that seems like a good tradeoff between cognitive load and staying within the desired build times, but we can keep on sharding if that helps.

h-vetinari commented 11 months ago

Great news, thanks a lot! I'll take a look at this soon! (please ping me if I don't)

coryan commented 11 months ago

I had forgotten about compute. This is a (fairly large) new feature that will definitely need its own feedstock. The new sequence is looking more like this:

https://github.com/conda-forge/staged-recipes/compare/main...coryan:staged-recipes:feat-shard-google-cloud-cpp-pr4

coryan commented 11 months ago

Ping. I can send the PR for google-cloud-cpp-core if that is a better way to start the conversation.

xhochy commented 11 months ago

Send the PR and also add me as a reviewer, happy to help!

h-vetinari commented 9 months ago

Alright, https://github.com/conda-forge/google-cloud-cpp-core-feedstock is live and caught up with arches and open migrations. 🥳

I think we should tryto build the full 2.17 here before we release any other versions? Would you be able to open a PR @coryan?

coryan commented 9 months ago

I think we should try to build the full 2.17 here before we release any other versions?

That makes sense to me.

Would you be able to open a PR @coryan?

Done, see #156.

Separately, I will create:

google-cloud-cpp-compute-feedstock. The compute component is very large. Once that is working we can update this feedstock.
google-cloud-cpp-ai-feedstock. As you may recall aiplatform is disabled here because it is very large. There are several other AI-related features that are somewhat bulky and I think it is worthwhile splitting to give us headroom (w.r.t. CI timeouts) in all the feedstocks.

coryan commented 9 months ago

I think we should try to build the full 2.17 here before we release any other versions?

That makes sense to me.

Would you be able to open a PR @coryan?

Done, see #156.

Argh... I got confused, the local tests failed. Will update the PR later today.

h-vetinari commented 9 months ago

Done, see #156.

Thanks a lot!

Separately, I will create [...]

Even better!

coryan commented 9 months ago

I created https://github.com/conda-forge/staged-recipes/pull/24843 and https://github.com/conda-forge/staged-recipes/pull/24841 . I am planning to leave them running overnight and see the results. I am hoping for successful builds, but I won't be shocked if they fail.

The build time after #156 is still pretty high. I had to stop the last build after 4h45m. I do not think we are out of the woods.

I think there are two ways to speed up the build (there may be others):

Merge https://github.com/conda-forge/staged-recipes/pull/24843 and then remove features compiled by that feedstock from google-cloud-cpp-all.
Change #156 to remove these features right away, in anticipation of https://github.com/conda-forge/staged-recipes/pull/24843.

I am happy to do either.

coryan commented 9 months ago

The last build for #156 finished in 5h51m... That is only 9m to spare. I think we will need to split this even more. I am going to use "number of files" as a proxy for the build time (obviously large / complex files take longer, but I said "proxy").

I prototyped a larger split in #157 that brings the build time down to 5h15m works, but requires 4 shards that are completely unrelated to the other proposed shards (*-ai-feedstock and *-compute-feedstock). These are bigquery, monitoring, retail and appengine.

If we cut even further we can bring the time down to 4h20m with 3 more unrelated shards (sql, resourcemanager, dataproc). These are also unrelated to any existing shards or proposed shards.

I am open to any approach for handling these, including one feedstock per library or one feedstock for all these larger components.

For reference, this is the "size" of each feature, based on the number of files in each (a decent, though not perfect proxy)

2540 google/cloud/compute: will live in google-cloud-cpp-compute-feedstock.
683 google/cloud/storage: already part of google-cloud-cpp-core-feedstock.
634 google/cloud/aiplatform: will live in google-cloud-cpp-ai-feedstock.
540 google/cloud/bigquery:
496 google/cloud/dialogflow_es: will live in google-cloud-cpp-ai-feedstock.
468 google/cloud/dialogflow_cx: will live in google-cloud-cpp-ai-feedstock.
411 google/cloud/bigtable: already part of google-cloud-cpp-core-feedstock.
368 google/cloud/pubsub: already part of google-cloud-cpp-core-feedstock.
361 google/cloud/spanner: already part of google-cloud-cpp-core-feedstock.
344 google/cloud/monitoring:
299 google/cloud/retail:
281 google/cloud/appengine:
255 google/cloud/sql:
228 google/cloud/resourcemanager:
210 google/cloud/dataproc:

coryan commented 9 months ago

Any thoughts?

h-vetinari commented 9 months ago

I think if builds finish under 6h on average, we have a first iteration that we can at least build by restarting once or twice.

We can of course increase the sharding wherever we feel that's beneficial, and overall your approach sounds good to me. I think the 5h15 is pretty good already from the POV of our pipelines, but given the rapid growth of the upstream codebase, we'll probably have to revisit this later anyway.

In this sense, I think this will very likely be an ongoing effort, and we shouldn't let perfect be the enemy of the good for the first iteration. I'll leave it up to you how far you want to push the sharding - for now I'm happy already if we get green CI on average.

coryan commented 9 months ago

In this sense, I think this will very likely be an ongoing effort, and we shouldn't let perfect be the enemy of the good for the first iteration.

Sounds good.

I'll leave it up to you how far you want to push the sharding - for now I'm happy already if we get green CI on average.

Thanks. We will need at least the following shards. I could use your help to get them created:

Meanwhile I will get #156 to run a couple more times and see if the average / mean is good enough.

coryan commented 8 months ago

Keeping in mind that we don't know the shape of the distribution, I just the following results:

265m
311m
316m
314m

That is with 3 shards: https://github.com/conda-forge/staged-recipes/pull/24970 https://github.com/conda-forge/staged-recipes/pull/24841 and https://github.com/conda-forge/staged-recipes/pull/24843

I call that good enough to start, though I am sure we will have to shard even further as the upstream keeps growing.

coryan commented 8 months ago

@h-vetinari I hope I have not exhausted your patience yet. Any thoughts on next steps?

h-vetinari commented 8 months ago

@h-vetinari I hope I have not exhausted your patience yet. Any thoughts on next steps?

Sorry for the late response, and my patience is not exhausted at all (though I am sometimes 😅). The approach with core + 3 shards + this feedstock sounds great to me (I had thought that my previous statement "I'm happy already if we get green CI on average" was unambiguous... 🙈). I'll try to have a look at the staged recipes PRs soon (please ping me in a few days if not).

h-vetinari commented 7 months ago

So AFAIU we should now have the initial set of necessary feedstocks. Once we have all the feedstocks building all the arches on the corresponding state of migration as this feedstock (chiefly protobuf), we should be able to finally build a "full" 2.17 here, and then move on to the current version?

coryan commented 7 months ago

I think we can close this bug. The actual split is done, we are planning to perform upgrades now, but that is normal maintenance of feedstocks.

conda-forge / google-cloud-cpp-feedstock