chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Test arbitrary branch/SHA - Initial Design Choices #9159

Open awallace-cray opened 6 years ago

awallace-cray commented 6 years ago

Test arbitrary branch/SHA x configuration (in Cray-internal Jenkins)

Prepare a high-level functional scope & design summary for the initial increment. To begin with, what kinds of Jenkins jobs do Developers need most?
For example, should the initial version handle Cray module tests? Performance tests of any kind? Commit-triggered tests like release-tarball and chapel-code-smoketest?

Review this with interested parties.

Outcome: at least part of the team has a rough idea of the initial direction, and agrees with it.

awallace-cray commented 6 years ago

One way to do this: prepare some alternatives, possibly with pros + cons, then take a vote.

awallace-cray commented 6 years ago

2018-04-13: TBD. Wait, then collect survey responses.

2018-04-16 11:00am: 7 responses received, probably good enough.

2018-04-18: 9.5 responses.

Top three "I would use" answers:

  1. TESTS_TO_RUN option
  2. correctness-tests non-standard build slaves (winner: Cygwin)
  3. Cray module builds

All questions below take the form: If Chapel Jenkins offered XXX, would you be interested in using that?

  • 0 = No, I would not
  • 1 = Not sure / need more info
  • 2 = Maybe
  • 3 = Yes, I would

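Survey questions (column keys for the table below):

  • Q1: Linux correctness-tests on chapNN?
  • Q2: Linux correctness-tests on chapcs?
  • Q3: correctness-tests on non-standard build slaves?
  • Q3.1: What kinds? Ubuntu-RHEL-Cygwin-Mac
  • Q4: Cray module builds and tests?
  • Q5: Chapel smoke-tests?
  • Q6: Whitebox correctness-tests?
  • Q7: multi-node/multi-locale correctness-tests on chapcs?
  • Q8: Flexible Chapel perf test jobs of any kind?
  • Q11: TESTS_TO_RUN option if available?
  • Q12: correctness-test-any job, if available?

| Timestamp | Q1 | Q2 | Q3 | Q3.1 | Q4 | Q5 | Q6 | Q7 | Q8 | Q11 | Q12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4/13/18 10:02 | 2 | 2 | 3 | Ubuntu RHEL Cygwin | 3 | 2 | 3 | 2 | 1 | 3 | 0 |
| 4/13/18 10:04 | 0 | 0 | 2 | Ubuntu Cygwin Mac | 3 | 2 | 0 | 2 | 2 | 3 | 2 |
| 4/13/18 10:09 | 2 | 2 | 2 | Cygwin | 2 | 0 | 2 | 2 | 0 | 3 | 1 |
| 4/13/18 13:20 | 3 | 3 | 3 | Cygwin Mac | 3 | 3 | 2 | 3 | 2 | 3 | 1 |
| 4/13/18 14:29 | 0 | 0 | 3 | Ubuntu RHEL Cygwin Mac | 2 | 2 | 2 | 2 | 3 | 3 | 0 |
| 4/13/18 21:59 | 2 | 0 | 2 | | 3 | 2 | 3 | 3 | 1 | 3 | 2 |
| 4/16/18 9:41 | 2 | 2 | 2 | Ubuntu RHEL Cygwin | 2 | 2 | 0 | 2 | 3 | 3 | 3 |
| 4/16/18 17:53 | 2 | 0 | 3 | Ubuntu RHEL Cygwin | 1 | 3 | 2 | 1 | 2 | 3 | 1 |
| 4/17/18 9:36 | 0 | 0 | 2 | Cygwin Mac | 2 | 2 | 3 | 2 | 2 | 3 | 2 |
| Autosum | 13 | 9 | 22 | 5-4-8-4 | 21 | 18 | 17 | 19 | 16 | 27 | 12 |
| Score | 13 | 9 | 22 | | 20 | 18 | 17 | 18 | 14 | 27 | 9 |
| Top 3 | | | #2 | Cygwin | #3 | | | | | #1 | |
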
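
For anyone who wants to re-check the Autosum row, here is a minimal Python sketch that tallies the nine responses listed above. The data is copied from the table; the names `RESPONSES` and `tally` are made up for this illustration, and the sketch reproduces only the Autosum row and the platform counts, not the Score row, whose adjustments are not explained in the thread.

```python
from collections import Counter

# Question numbers as used in the survey (9 and 10 were free-text questions).
QUESTIONS = ["1", "2", "3", "4", "5", "6", "7", "8", "11", "12"]

# One entry per response: (scores for QUESTIONS, platforms named under 3.1).
RESPONSES = [
    ([2, 2, 3, 3, 2, 3, 2, 1, 3, 0], ["Ubuntu", "RHEL", "Cygwin"]),
    ([0, 0, 2, 3, 2, 0, 2, 2, 3, 2], ["Ubuntu", "Cygwin", "Mac"]),
    ([2, 2, 2, 2, 0, 2, 2, 0, 3, 1], ["Cygwin"]),
    ([3, 3, 3, 3, 3, 2, 3, 2, 3, 1], ["Cygwin", "Mac"]),
    ([0, 0, 3, 2, 2, 2, 2, 3, 3, 0], ["Ubuntu", "RHEL", "Cygwin", "Mac"]),
    ([2, 0, 2, 3, 2, 3, 3, 1, 3, 2], []),
    ([2, 2, 2, 2, 2, 0, 2, 3, 3, 3], ["Ubuntu", "RHEL", "Cygwin"]),
    ([2, 0, 3, 1, 3, 2, 1, 2, 3, 1], ["Ubuntu", "RHEL", "Cygwin"]),
    ([0, 0, 2, 2, 2, 3, 2, 2, 3, 2], ["Cygwin", "Mac"]),
]


def tally(responses):
    """Sum each question's scores and count how often each platform was named."""
    columns = zip(*(scores for scores, _ in responses))
    totals = dict(zip(QUESTIONS, (sum(col) for col in columns)))
    platforms = Counter(p for _, named in responses for p in named)
    return totals, platforms


totals, platforms = tally(RESPONSES)
# Rank questions by total score, highest first.
ranked = sorted(totals, key=totals.get, reverse=True)
print(totals)
print(platforms.most_common())
print("Top three questions:", ranked[:3])
```

Ranking by the raw sums gives the same top three as the summary above (questions 11, 3, and 4), with Cygwin as the most requested platform.
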
  1. Any other comments on "what kinds of jobs"?

I don't think it's necessarily necessary to involve Jenkins or the web UI. Reproducing nightly testing on Cray / weird systems is really important. I would like the ability to simultaneously request tests in many configurations.

It would be really convenient if we could fire off paratests from jenkins of common configs like:

  • linux64
  • gasnet-everything
  • memleaks
  • numa
  • llvm

This seems most helpful to me as a tool for testing big changes which might have effects across all or many configurations, for example branches or PRs which introduce major features or change existing ones in a major way. For these we'd want testing (often including performance) across many configs. A secondary use would be to make it easier for developers to test changes on secondary configs like RHEL etc. which are harder for them to build themselves. (Cray XE/XC might even count as "harder to build" for some devs, though certainly not all.)

I'm very encouraged by the fact that this is being discussed. It would be great to have an easy, automated way to run tests in quirky configurations that are currently too much trouble to test manually.

For many configurations, it seems useful to me to be able to dial in a subset of tests. Often running on hellos or examples would be sufficient to gain confidence; i.e., there's no need to literally run a full nightly configuration in many cases. Being able to select a test, directory, or list of both could be useful to get results back faster and sanity check things.

In addition to running the testing, it seems like it would be useful to have a mode in which Jenkins dropped you into an interactive shell with the environment set up properly, if that's possible / makes sense. What I'm envisioning is that I try running one of these flexible jobs, something fails, and now I need to go reproduce it to understand why and dig into fixing it. If I have to reproduce the environment manually we're almost back to square one. If there were a way to get dropped into an environment equivalent to the one where the test failed, that'd be ideal.

  1. Any other comments on "how flexible"

The questionnaire seems to be presuming an implementation in which developers choose one or more of the existing jenkins jobs to run. But that presumes there's an existing jenkins job for every config they want to test. In the past we've found there sometimes isn't. How hard would it be to turn the question on its head, as it were, and have the developer specify a set of Chapel configuration settings (the CHPL_* env var values) to be tested, perhaps with some additional non-standard env var values, with the flexible testing framework then driving the jenkins interface such that all those configs got tested? Basically the flexible testing framework would be a jenkins meta-tester. We could even repurpose it to drive the nightlies, in fact.
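
As a rough illustration of the "specify a set of CHPL_* settings" idea, the sketch below expands a requested matrix of environment-variable values into one environment per combination. It is only a sketch under assumptions: `expand_matrix`, `EXAMPLE_EXTRA_VAR`, and the particular values chosen are hypothetical, and nothing here talks to Jenkins.

```python
from itertools import product


def expand_matrix(matrix, extra_env=None):
    """Return one environment dict per combination of the requested values.

    matrix maps a variable name to the list of values to test;
    extra_env holds additional settings applied to every combination.
    """
    names = sorted(matrix)
    configs = []
    for values in product(*(matrix[name] for name in names)):
        env = dict(zip(names, values))
        env.update(extra_env or {})
        configs.append(env)
    return configs


# Illustrative request: the values are examples, not a list of supported configs.
requested = {
    "CHPL_COMM": ["none", "gasnet"],
    "CHPL_TARGET_PLATFORM": ["linux64", "cray-xc"],
}

# EXAMPLE_EXTRA_VAR is a made-up stand-in for a non-standard env var.
for env in expand_matrix(requested, extra_env={"EXAMPLE_EXTRA_VAR": "1"}):
    print(env)
```

Each resulting dict could then be handed to whatever actually launches a test run, which is the part this sketch leaves out.
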

Tony: The first part is almost exactly what "correctness-test-any" does, by letting you enter any shell commands you want into a text box, and by NOT forcing you to use an existing util/cron shell script. The second part sounds like a user-friendly, matrix-config version of correctness-test-any. Or a wrapper to call correctness-test-any, multiple times. What flexible testing framework? Is there an example?

Greg: That's my response. Sorry for the confusing nomenclature -- by "flexible testing framework" I was referring to precisely the thing the questionnaire was about. I should have said "flexible jenkins", or maybe "really flexible jenkins". What I was pointing out was that if we had a way to specify one or more of the config variables (for example, CHPL_HOST_PLATFORM and/or CHPL_TARGET_PLATFORM, CHPL_COMM, etc.), plus some optional other environment settings, plus the tests to run, plus maybe a few checkboxes to refine things (correctness vs. performance, e.g.), that would be enough for the underlying software to figure out which jenkins job(s) to run, and where. It sounds like the correctness-test-any setup does a similar thing, but its interface (typing shell commands into a text box) is lower-level than what I was imagining.
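
One possible shape for the "figure out which jenkins job(s) to run" step is a lookup against a catalog of job settings. The sketch below is purely hypothetical: the job names, the `kind` field, and `select_jobs` are invented for illustration and do not describe the existing Jenkins setup.

```python
# Hypothetical catalog: job names and settings are invented for illustration.
JOB_CATALOG = {
    "example-linux64": {
        "CHPL_TARGET_PLATFORM": "linux64", "CHPL_COMM": "none", "kind": "correctness",
    },
    "example-gasnet": {
        "CHPL_TARGET_PLATFORM": "linux64", "CHPL_COMM": "gasnet", "kind": "correctness",
    },
    "example-gasnet-perf": {
        "CHPL_TARGET_PLATFORM": "linux64", "CHPL_COMM": "gasnet", "kind": "performance",
    },
    "example-cray-xc": {
        "CHPL_TARGET_PLATFORM": "cray-xc", "CHPL_COMM": "none", "kind": "correctness",
    },
}


def select_jobs(catalog, wanted):
    """Return the names of jobs whose settings agree with every requested key."""
    return sorted(
        name
        for name, settings in catalog.items()
        if all(settings.get(key) == value for key, value in wanted.items())
    )


# A developer asks for gasnet correctness testing without naming a job.
print(select_jobs(JOB_CATALOG, {"CHPL_COMM": "gasnet", "kind": "correctness"}))
```

A request that matches no catalog entry returns an empty list, which is the case raised above where no existing job covers the config a developer wants to test.
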

awallace-cray commented 6 years ago

2018-04-23 10:00 Little progress since 2018-04-16 due to competing priorities. Worked <= 0.5 story-point days, at intervals.