datacarpentry / hpc-carpentry

Information on an 'HPC' Carpentry that includes information about lessons in high-throughput and high-performance computing
Accommodating differences in HPC clusters #4

Open dbrunson opened 8 years ago

dbrunson commented 8 years ago

Any ideas on how to manage differences in HPC systems in the lessons? Here's a start for things to consider:

- hostname of login machine
- queueing system (and queue names, walltime limits, etc.)
- use of environment modules or not (we do)
- filesystem differences (home/scratch, quotas, purge policies)

gvwilson commented 8 years ago

This was part of why previous efforts to put together an HPC lesson didn't get critical mass: everyone wanted a lesson that introduced their system's tools, and it seemed like every group had different ones. Skinning the lesson (i.e., having plug-and-play of some kind to swap in different tools) didn't seem workable because some of the differences are pretty fundamental...

ghost commented 8 years ago

I'd hoped to do that with our Moodle materials. They are under a Creative Commons license, so they could provide the starting point for custom lessons. I'm not aware of any way to allow on-the-fly substitution in a Moodle course, which is what would be needed to provide automated customization. And that would require a backing database. Maintenance would be a nightmare.

ghost commented 8 years ago

A generic hostname would work, but it has to be used everywhere: examples, prompts, etc. The queueing system can be generic too. The hard thing is learning what they are; the second hardest is finding which queues are supported on which system under a given job manager.

In general, all four of Dana's points have to assume some facility with the shell environment. Module and Softenv are hopelessly confusing to people who barely understand what a shell is, let alone an environment variable, especially ones as critical as PATH or LD_LIBRARY_PATH. Branching into many sophisticated concepts occurs very rapidly as one tries to explain the environment.

So, what to do? Treat HPC like a tool rather than a body of technical knowledge. Get people to run applications with real data. That develops experience with the command line and some environment features. Then move up to running scripts. That layers on more shell concepts and gets to solutions of practical problems with repetitive analysis, file handling, etc. Then move up to an HPC system and relate what was learned before to the more complex environment. Introduce job scripts from the perspective of having to give someone else instructions to run your script for you. Talk about sharing resources with many users, which leads to queueing in the job system, fairness issues, etc. Storage is a limited resource, which introduces quotas, the idea of an underlying file system (parallel or other), and then finally the policies that are layered on by the operators, those being design decisions based on the target user population.

The closer to the start you are in this scenario, the more generic you can afford to be. Branching based on the type of job management likely has the most options. Others are close to binary in practice (at least at major centers): Lustre or GPFS? Module or Softenv? If the basics are nailed down, then the rest almost boils down to just remapping names to concepts.
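
To make the "remapping names to concepts" idea concrete, here is a minimal sketch of how a lesson could keep generic concepts in its text and swap in site-specific command names. The table name `SCHEDULER_COMMANDS`, the `render` helper, and the placeholder names are hypothetical and not part of any existing lesson tooling. Only the command names themselves (Slurm, PBS, LSF) are real, and a real site would need more than three concepts.

```python
# Hypothetical lookup table: generic lesson concept -> scheduler-specific command.
# The command names are the standard ones for each scheduler; everything else
# (table layout, placeholder names) is an assumption made for illustration.
SCHEDULER_COMMANDS = {
    "slurm": {"submit": "sbatch", "status": "squeue", "cancel": "scancel"},
    "pbs": {"submit": "qsub", "status": "qstat", "cancel": "qdel"},
    "lsf": {"submit": "bsub", "status": "bjobs", "cancel": "bkill"},
}


def render(template: str, scheduler: str) -> str:
    """Fill {submit}, {status}, and {cancel} placeholders for one scheduler."""
    return template.format(**SCHEDULER_COMMANDS[scheduler])


if __name__ == "__main__":
    # One generic sentence from a lesson, rendered once per scheduler.
    text = "Submit the job script with `{submit}`, then check on it with `{status}`."
    for name in SCHEDULER_COMMANDS:
        print(f"{name}: {render(text, name)}")
```

This only covers the easy part. Differences in how a job script is passed to the scheduler, or in queue names and walltime limits, would not fit in a simple name-for-name table.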

ChristinaLK commented 8 years ago

I would like to see lessons that are half concepts/half example, where the example is always the responsibility of the relevant compute center/organization to create for their individualized training. The conceptual pieces (logging in to remote machines, using a scheduler, a checklist of good etiquette) are universal, and could provide an outline to pace a nice through-example.
Could be a nice marriage of the SWC "give-novices-mental-models" philosophy and the DC "work-with-domain-examples" philosophy. :)

lexnederbragt commented 8 years ago

@ChristinaLK +1

dbrunson commented 8 years ago

Yes to concepts! And a great version of teaching these concepts is Henry Neeman's Supercomputing in Plain English series: http://www.oscer.ou.edu/education.php (scroll down... scroll down a lot to see old videos.) The 2015 version has videos, but they can't be posted until they have good captioning.

ChristinaLK commented 8 years ago

Henry linked those in another thread. Some of the material looked at least one or two levels above novice, so it might be good for more advanced lessons?

dbrunson commented 8 years ago

True, but the first overview talk has some novice conceptual stuff we can borrow. There is also a concession stand metaphor for how a scheduler works in the first "homework" exercise. Between the overview talk and the first written document there's quite a bit of material and good analogies. Do you have specific content in mind already?