ExaWorks / SDK

Initial test coverage needed for Flux #33

Closed dongahn closed 1 year ago

dongahn commented 3 years ago

Following up on one of my TODO items from the 4/23 team meeting.

Tagging @SteVwonder. It would be nice if you could capture some of the key initial test coverage for Flux in the SDK context here.

SteVwonder commented 3 years ago

Two ideas:

These are of course not mutually exclusive; we can do both. If so, we should prioritize them and tackle them in that order.

My proposal would be to start with the in-tree testsuite and then move on to the MPI examples/mini-apps/benchmarks, since I believe the latter needs some coordination with the other components (i.e., any component interested in testing MPI apps should use the same set; there is no need to duplicate the work of collecting the mini-apps).

Thoughts @dongahn?

dongahn commented 3 years ago

I like the multi-level approach.

cons: it does take quite a long time to run on the wimpy cloud VMs/containers

Well... so far MS Azure seems to give good machine hours for GitHub Actions, so I wouldn't worry too much.

But when our target WF integrates our CI into theirs, this can be a problem. We may want to make it possible for them to use only the lighter-weight tests (the second set). Another parameterization, I guess?
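A minimal sketch of what such a parameterization could look like, assuming the lighter-weight "second set" is the MPI examples and the heavier set is the in-tree testsuite. The environment variable `SDK_FLUX_TEST_LEVEL`, the level names, and the suite names are all hypothetical and are not part of the SDK:

```python
import os

# Hypothetical tiers; the suite names are placeholders, not real SDK targets.
TEST_LEVELS = {
    "light": ["mpi-examples"],
    "full": ["mpi-examples", "in-tree-testsuite"],
}


def select_suites(environ=None, default="full"):
    """Return the list of suites to run for the requested test level.

    The level is read from the (hypothetical) SDK_FLUX_TEST_LEVEL variable,
    so a downstream CI can opt into the cheaper tier without code changes.
    """
    environ = os.environ if environ is None else environ
    level = environ.get("SDK_FLUX_TEST_LEVEL", default).strip().lower()
    if level not in TEST_LEVELS:
        raise ValueError(
            f"unknown test level {level!r}; expected one of {sorted(TEST_LEVELS)}"
        )
    return list(TEST_LEVELS[level])
```

A downstream workflow would then set `SDK_FLUX_TEST_LEVEL=light` in its CI job, while the SDK's own CI keeps the default.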

SteVwonder commented 3 years ago

Spun out the second idea into #46, and now limiting this issue to just:

SteVwonder commented 3 years ago

On the Friday ☎️ call, we discussed dropping the full flux-core CI test run and only running the MPI tests within flux-core. We are doing this because flux-core's CI is never run against flux-core w/ flux-sched installed. With enough effort, we believe it would be possible to make it work, but since flux-core is already running CI on its own repo, the benefit is probably minimal.

SteVwonder commented 3 years ago

I tracked down the issue with the Python virtualenv not being respected in the Flux testsuite. It turns out it was a bug in the flux-python helper command. PR posted: https://github.com/flux-framework/flux-core/pull/3713
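For anyone wanting to check for this kind of virtualenv "escape" in their own environment, here is a rough sketch. It is not how the flux-core fix works; it only compares the `sys.prefix` of the current interpreter with the one reported by whatever a wrapper command launches. The wrapper's argv (for example `["flux", "python"]`) is an assumption about how the helper is invoked:

```python
import subprocess
import sys


def interpreter_prefix(command):
    """Return sys.prefix as reported by the interpreter that `command` launches."""
    result = subprocess.run(
        list(command) + ["-c", "import sys; print(sys.prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def in_virtualenv():
    """True when the current interpreter is running inside a venv/virtualenv."""
    return sys.prefix != sys.base_prefix


def escapes_virtualenv(command):
    """True when we are in a virtualenv but `command` runs a different interpreter prefix."""
    return in_virtualenv() and interpreter_prefix(command) != sys.prefix
```

Run from inside the activated virtualenv, `escapes_virtualenv(["flux", "python"])` returning `True` would suggest the wrapper is falling back to another interpreter.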

This now raises the question: what do we want to do w.r.t. Flux testing in the SDK? We cannot pull down any tagged version and have it work in the current Docker environment. I see two potential paths forward:

  1. Install the Python yaml package into the system interpreter for now to work around the virtualenv "escape".
  2. Only build and test the current "master" branch version of flux-core until a new tag is released with the flux-python fix, then switch back to testing the latest tagged release.

My vote is for the latter, but I'm happy to go with the majority opinion on this.
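The second path could be sketched roughly as follows, assuming release tags of the form `vMAJOR.MINOR.PATCH`. The function names and the `first_fixed_tag` parameter are hypothetical; which tag will first contain the flux-python fix is not known here, so it is left for the caller to supply once it exists:

```python
import re

_TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag):
    """Parse 'vX.Y.Z' into a tuple of ints, or return None for other tag shapes."""
    match = _TAG_RE.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def choose_ref(tags, first_fixed_tag=None, fallback="master"):
    """Pick the git ref to build and test.

    Until a release containing the fix exists (first_fixed_tag is None, or no
    tag is at least that version), build the fallback branch. Afterwards,
    return the latest tagged release.
    """
    if first_fixed_tag is None:
        return fallback
    minimum = parse_tag(first_fixed_tag)
    if minimum is None:
        raise ValueError(f"unrecognized tag format: {first_fixed_tag!r}")
    versions = [(parse_tag(tag), tag) for tag in tags]
    fixed = [(version, tag) for version, tag in versions
             if version is not None and version >= minimum]
    return max(fixed)[1] if fixed else fallback
```

With placeholder version numbers, `choose_ref(["v0.1.0", "v0.2.0"])` stays on `master`, and once a caller passes a `first_fixed_tag` that appears in the tag list, the newest qualifying tag is chosen instead.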