flux-framework / flux-workflow-examples

Collection of simple examples of flux as it is used in modern workflows

Example for running on Summit #9

Closed kshitij-v-mehta closed 5 years ago

kshitij-v-mehta commented 5 years ago

Hello Flux team, Does Flux run on Summit? If so, would you have some examples for it?

Thanks, Kshitij Mehta ORNL

dongahn commented 5 years ago

Yes it should run there.

You can either build and install flux-core and flux-sched manually or install them through Spack, and then invoke a Flux instance using jsrun on a batch allocation. LLNL has a short readme on our Confluence wiki for Sierra; I will attach a PDF here when I get to my computer.

Can I ask what your use case is?

kshitij-v-mehta commented 5 years ago
dongahn commented 5 years ago

SIERRA-UsingFluxtoscheduleandrunjobs-280119-0939-400.pdf

dongahn commented 5 years ago

@kshitij-v-mehta: I attached the pdf above. Now that I think about this, installing Flux using our Spack recipe would be the best practice on Summit as you have an additional dependency: "correctly" configured hwloc. @SteVwonder?

Feel free to use this ticket to ask any follow-up questions you might have.

I am curious to know what the workflow definition will look like for Summit.

If you have an example workflow you want to try on Flux, I am interested in seeing the resource requirements and requests of your workflow as well.

kshitij-v-mehta commented 5 years ago

Thanks. Specifically, my use case is this: use a workflow composition tool (Flux) to create a set of experiments. Each experiment consists of a workflow (simulation + analysis) in which its component applications are launched together. Then, submit a bunch of these experiments to Flux and let it manage the job submission.

For example, I may have an allocation of 100 nodes and a bunch of experiments with varying node requirements. I believe Flux can schedule experiments, monitor them, and keep submitting experiments as experiments complete and resources become available.
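To make the scenario above concrete, here is a minimal sketch of the behavior being described: a fixed pool of nodes, a queue of experiments with different node counts, and new experiments starting as soon as running ones finish and free enough nodes. This is only a toy simulation with a simple greedy first-fit policy. It is not Flux's scheduler or API, and the experiment names, node counts, and runtimes are made up for illustration.

```python
import heapq
from collections import deque


def simulate(total_nodes, experiments):
    """Toy first-fit simulation (NOT Flux's actual scheduling policy).

    experiments: list of (name, nodes, runtime) tuples, in submission order.
    Returns a dict mapping name -> (start_time, end_time).
    """
    for name, nodes, _ in experiments:
        if nodes > total_nodes:
            raise ValueError(f"{name} needs {nodes} nodes, allocation has {total_nodes}")

    pending = deque(experiments)
    running = []  # min-heap of (end_time, name, nodes)
    free = total_nodes
    now = 0
    schedule = {}

    while pending or running:
        # Start every queued experiment that fits in the free nodes right now.
        still_waiting = deque()
        while pending:
            name, nodes, runtime = pending.popleft()
            if nodes <= free:
                free -= nodes
                schedule[name] = (now, now + runtime)
                heapq.heappush(running, (now + runtime, name, nodes))
            else:
                still_waiting.append((name, nodes, runtime))
        pending = still_waiting

        # Advance time to the next completion and release its nodes.
        if running:
            end_time, _, nodes = heapq.heappop(running)
            now = end_time
            free += nodes

    return schedule


# Hypothetical experiments on a 100-node allocation: (name, nodes, runtime).
demo = [("exp-a", 60, 10), ("exp-b", 50, 5), ("exp-c", 40, 5), ("exp-d", 100, 3)]
for name, (start, end) in simulate(100, demo).items():
    print(f"{name}: start={start} end={end}")
```

In this sketch exp-a and exp-c start immediately because together they fill the 100 nodes, while exp-b and exp-d wait until enough nodes are released.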

Hope that helps. Let me know if you have questions.

dongahn commented 5 years ago

@kshitij-v-mehta: Yes. I believe your use cases should be covered well with the current capability. Thank you for the additional explanation!

SteVwonder commented 5 years ago

> Installing Flux using our Spack recipe would be the best practice on Summit as you have an additional dependency: "correctly" configured hwloc.

Yep! spack install flux-sched@0.6.0+cuda should get you a Summit-compatible version. The +cuda variant builds hwloc with CUDA support so that Flux can auto-discover the GPUs on the nodes. The latest release of flux-sched (0.7.0) isn't available via Spack yet because our open PR hasn't been merged (https://github.com/spack/spack/pull/10447). @kshitij-v-mehta, let me know if you have any issues installing via Spack.

EDIT: spack install flux-sched@0.7.0+cuda should now work

dongahn commented 5 years ago

@tgamblin and @lee218llnl: any help on our latest Spack PR will be appreciated!

tgamblin commented 5 years ago

merged

kshitij-v-mehta commented 5 years ago

Not sure if this is the right place to discuss this, but I am curious about how Flux works on supercomputers such as Summit. Do you internally submit jsrun commands, or are you bypassing the system scheduler in some way?

dongahn commented 5 years ago

This is one way to use Flux. But on such a system, you will get your allocation with LSF and use jsrun to launch your Flux instance. Then you will submit or launch the jobs in your workflow to that instance. Please look at the PDF I attached above. It should give you some idea of how Flux can be used in your workflow on Summit.
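As a rough illustration of that launch step, the sketch below only assembles a jsrun command line that would start one Flux broker per node inside an existing batch allocation. The flag choices (one resource set per node covering all CPUs and GPUs, binding disabled) are an assumption based on common jsrun usage, not something taken from the attached PDF, and the helper name and script path are hypothetical. Treat the PDF as the authoritative recipe.

```python
import shlex


def flux_launch_command(num_nodes, initial_program=None):
    """Build (but do not run) a jsrun command that starts a Flux instance.

    Assumed layout: one resource set per node (-n), one task per resource
    set (-a 1), each set owning all CPUs and GPUs of its node, no binding.
    initial_program: optional argv list run by `flux start` as the
    instance's initial program (hypothetical workflow driver script).
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    argv = [
        "jsrun",
        "-n", str(num_nodes),  # number of resource sets, one per node here
        "-a", "1",             # one task (one Flux broker) per resource set
        "-c", "ALL_CPUS",      # give each resource set every CPU on its node
        "-g", "ALL_GPUS",      # and every GPU on its node
        "--bind=none",         # leave CPU binding to Flux (assumption)
        "flux", "start",
    ]
    if initial_program:
        argv.extend(initial_program)
    return argv


# Example: a 4-node allocation running a hypothetical driver script.
print(shlex.join(flux_launch_command(4, ["./run_experiments.sh"])))
```

The workflow's own jobs are then submitted from inside that instance, for example by the initial program, rather than through further jsrun calls.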

dongahn commented 5 years ago

@kshitij-v-mehta: do you need more from this issue ticket or can I close it?

kshitij-v-mehta commented 5 years ago

I don't need more. Please feel free to close it. Thanks for your help.

dongahn commented 5 years ago

Hopefully you got what you wanted to achieve. Some of us will visit ORNL in March so if you want to meet us, please let us know.