E3SM-Project / scream

Fork of E3SM used to develop exascale global atmosphere model written in C++
https://e3sm-project.github.io/scream/
Other
80 stars 55 forks source link

test-all-scream: build baselines in different directory than the branch to test #408

Closed bartgol closed 4 years ago

bartgol commented 4 years ago

This would avoid some errors (like file present in master but not in the branch) to go under the testing script radar.

bartgol commented 4 years ago

I'm a bit debated on this.

On one hand, building branch in the same dir of baselines can save some compilation time. The branch compilation is lightspeed fast compared to baselines compilation, I'm guessing because of object files recycling (although I'm not sure why, given that we switch branch, and time stamps should change). Changin (or wiping) the build dir might bump our testing times up by a decent factor.

On the other hand, it happened with the ekat PR that a missedd change in a file (such as headers) might not trigger compiler errors. For instance, say b.h includes build-dir/subdir1/a.h in master, but the branch moves a.h in build-dir/subdir2. The developer, however, forgets to update b.h. Without changin (or wiping) the build tree, the line #include "build-dir/subdir1/a.h" will be fine, since a.h is still there from when we built baselines in master. Changin (or wiping) the build dir would cause a compiler error, and prevent the bug from going under the radar.

My opinion: I don't like to risk bugs getting in master, but this kind of bugs are rare.

@jgfouca @jeff-cohere do you have an opinion about this?

jeff-cohere commented 4 years ago

Well, this is a very natural source of tension for a code base that takes a long time to build.

Building from a clean start is kind of the rule with automatic testing systems, not the exception. It's the only way to prevent mysterious errors that result from intermediate products and byproducts. In my view, you have to consciously decide not to build from scratch.

Of course, C++ build times take forever and a day, so many C++ projects don't even bother to have a conversation about clean builds vs incremental builds. There are lots of ways for accelerating builds (write your code in a new language that's been designed to build faster, like Go or Rust! Buy a giant cluster for building the code, and take advantage of modern dependency tracking! Avoid template meta-meta-meta-metaprogramming!) But I don't think we're in a position to cut our build times by a large amount. I'm just pointing out that this is such a big problem that everyone on large projects thinks about it.

@bartgol , do you have numbers for build times for incremental and clean builds? What's the factor we're talking about, and how much in absolute time do clean builds cost us? I ask because, as you've observed, there is no substitute for a clean build in order to catch these kinds of bugs, as rare as they may be.

bartgol commented 4 years ago

I don't have concrete numbers. I want to wait for #410 to go in first, cause it has a couple of fixes for GPU testbeds too, so we can see what kind of numbers we're dealing with (a 100% penalty on a 4m exec time is fine, a 30% penalty on a 60m runtime is worse).

When I manually run test-all, however, I notice that there is barely any compilation time in the branch. Obviously, this varies from branch to branch (if you change scream_types.hpp, the whole library gets rebuilt, more or less). Edit: with PR #410, on waterman, a single build type (say dbg) takes 3m50s for both baselines generation and branch testing.

I will make an experiment for a few dummy branch that change, say, something in p3 only, vs share, vs control. I will try to get a ballpark of compilation time penalty when using two different directories.

For now, I'm leaning toward using two separate directories (and have a clean branch build), since our build times are still manageable. What concerns me is what happens in 6 months, when we also build homme, shoc, and whatnot, and maybe in 2yy eagles too. But I guess we can always revisit the two-folders solution when we cross that bridge.

bartgol commented 4 years ago

This has been done in #424 . Closing.