Closed ChristinaLK closed 4 years ago
I have given the following intro-to-scheduling talk; that presentation contains many diagrams that could simplify and illustrate concepts. I can also edit them into specific examples for addition to HPC Carpentry.
Scheduler: in our current training material we depict the scheduler as a "bouncer" managing a queue for a crowded club (Slide 17 of https://www.uio.no/english/services/it/research/events/2018b/abel_intro_march2018.pdf). If this makes sense, I can create a diagram under CC-BY (we do not have a citation for the current diagram).
I love it! I've definitely compared a scheduler to the host/hostess at a restaurant, which is the same idea.
OK, I take that positive response as encouragement and will make an SVG (easier to modify and version-control friendly). I think the restaurant idea is a closer fit, as we have tables with a fixed number of seats. When you said host/hostess, I guess you were thinking of arriving at the door and someone taking you in when a table is empty.
@Sabryr yes, that's what I meant. I also really like that analogy because (at least on our systems), jobs that request fewer resources will start sooner, just like smaller parties get seated faster at a busy restaurant. ;)
It may be worth a footnote that this is common, but it does depend on each site's scheduling policies and is not universal.
At our facility, queue policies are set up to encourage and favor large jobs. While the smallest jobs often quickly run as backfill, there is a middle ground that can lose out to larger jobs, depending on a variety of factors.
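To make the backfill effect concrete, here is a minimal sketch of a toy first-come-first-served scheduler with backfill. Everything in it is an assumption for illustration: the `Job` fields, the `simulate` function, the 8-node cluster, and the rule that a job may backfill only if it finishes before the blocked head job could start. It is not how any particular site or workload manager is configured.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int    # nodes requested (hypothetical units)
    runtime: int  # requested wall time, in arbitrary ticks


def simulate(total_nodes, queue):
    """Return {job name: start time} for a toy FCFS + backfill policy."""
    for job in queue:
        if job.nodes > total_nodes:
            raise ValueError(f"{job.name} can never fit on this cluster")
    pending = list(queue)  # priority order: front of the list goes first
    running = []           # (end_time, nodes) for jobs that have started
    starts = {}
    now = 0
    while pending:
        # Release nodes held by jobs that have finished by `now`.
        running = [(end, n) for end, n in running if end > now]
        free = total_nodes - sum(n for _, n in running)

        # Start jobs from the head of the queue while they fit.
        while pending and pending[0].nodes <= free:
            job = pending.pop(0)
            starts[job.name] = now
            running.append((now + job.runtime, job.nodes))
            free -= job.nodes

        if pending:
            # The head job is blocked: find when enough nodes will be free.
            head = pending[0]
            avail, reserve_at = free, now
            for end, n in sorted(running):
                avail += n
                reserve_at = end
                if avail >= head.nodes:
                    break
            # Backfill: start later jobs that fit now and finish in time.
            for job in list(pending[1:]):
                if job.nodes <= free and now + job.runtime <= reserve_at:
                    pending.remove(job)
                    starts[job.name] = now
                    running.append((now + job.runtime, job.nodes))
                    free -= job.nodes
            # Jump to the next job completion.
            now = min(end for end, _ in running)
    return starts


queue = [
    Job("A", nodes=6, runtime=10),
    Job("B", nodes=8, runtime=5),   # large job, ahead in the queue
    Job("C", nodes=4, runtime=8),   # medium job
    Job("D", nodes=2, runtime=3),   # small job, submitted last
]
print(simulate(8, queue))
```

In this toy run the small job D slips into the two idle nodes at time 0, the large job B starts at time 10, and the medium job C, although submitted before D, has to wait until time 15.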
@bernhold I agree about the footnote, our site is the same.
Yes, site-specific configurations and SLURM configuration options for fair usage are important. When users know about these, they have a better understanding of, for example, "why I had to wait longer today". While I support the footnote idea, I suggest elaborating on this further in an "optional section" or similar (I do not want to complicate things at this stage, though).
@Sabryr and @ChristinaLK, I like the analogy of the host/head waiter/maître d' leading you to an appropriately sized table, once one becomes available.
@bernhold, I think the analogy holds: your facility would be like a restaurant with several very large tables, and few small ones. The medium-sized jobs just have to wait until a suitable table opens up, or until the maître d' can find a complementary group to add so that the composite fills a large table.
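As a rough illustration of the "complementary group" idea, the sketch below looks for one or two waiting parties that together fill a large table. The `fill_table` function, the party names, and the requirement that the table be filled exactly are all assumptions made for the analogy; real schedulers pack jobs onto nodes under site-specific rules.

```python
from itertools import combinations


def fill_table(capacity, waiting):
    """Return names of one or two waiting parties that exactly fill a table.

    `waiting` is a list of (name, size) tuples in arrival order.
    Returns None if no single party or pair fills the table.
    """
    # A party that fills the table on its own is seated first.
    for name, size in waiting:
        if size == capacity:
            return [name]
    # Otherwise look for two complementary parties, in arrival order.
    for (a, size_a), (b, size_b) in combinations(waiting, 2):
        if size_a + size_b == capacity:
            return [a, b]
    return None


waiting = [("medium", 5), ("small", 2), ("other", 3)]
print(fill_table(8, waiting))      # ['medium', 'other']
print(fill_table(8, waiting[:2]))  # None: the medium party keeps waiting
```

The medium party is seated only once a complementary party of three turns up; with only the party of two in line, it keeps waiting.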
edited for spelling, jargon thesaurus, word choice
Cross-posted from #84
The metaphor seems to break down the further it stretches. In a restaurant, raw material is converted to finished results by the back-of-house staff, usually hidden in the kitchen: this is the parallel workforce. The front-of-house staff carry the results from the workers to the clients, more like an interconnect or intranet linking the HPC facility to the campus or Internet.
Perhaps a better analogy could be drawn with a shared office space, where the workers are the professionals occupying each office. Reservations and access are managed through the front desk (workload manager). Different offices serve different purposes (architectures/accelerators): accounting jobs go to the accountant, legal to the lawyer, et cetera. A conference room (interconnect) permits efficient collaboration by temporary associations (communicators) of different professionals (nodes). A linear workflow can be crafted...
All that being said, explaining this extended metaphor in detail would be tantamount to describing the real HPC system in detail. I doubt this abstraction helps the learner to understand; it would take a couple of walk-throughs in the class to get the facts straight; and it doesn't help anyone actually understand and use an HPC resource. The time would be better spent, in my opinion, in describing increasingly complex computational frameworks:
@tkphd I still think it's useful to present a metaphor (maybe more than one!)
It sounds like, to be helpful, we should keep it fairly simple, so that we avoid pushing it to the point where it breaks down.
@ChristinaLK, sure, I don't disagree. My argument is that the restaurant metaphor is best suited for explaining the scheduler as the head waiter, only. It has the added benefit that most people are familiar with the concept of a restaurant, so an illustration is not strictly necessary.
Finding additional, better-suited metaphors for workers and resources would be great.
I'm shocked to see #84 closed, which means I've failed to communicate constructively. @Sabryr, please accept my sincere apology for turning discussion of your work to a discouraging or hostile direction. My goal was to encourage further discussion, and eventually to have an adjusted version of your illustration for reference. I hope that you will consider re-engaging, and re-opening your pull request. I will certainly take this exchange as an opportunity to revise my tone and try harder to foster collaboration on this developing curriculum.
I had a couple of fruitful discussions with @guyer and @reid-a about the restaurant metaphor. While it's not the best fit for describing an entire HPC ecosystem, @guyer in particular came up with some useful features of a workload/queue manager that could be discussed:
Again, @Sabryr and @ChristinaLK, thanks for engaging in this discussion, and please accept my humble apology for derailing it. I was wrong.
@tkphd, no apology required; the pull request was closed so that I could submit a new one. The diff was too large to continue with that one.
That's a relief, @Sabryr, and I look forward to seeing the new PR. I still stand by the apology, though, since I need to work on effectively communicating and dialing back dismissive comments. In particular, I fall into the common expertise trap of assuming things are obvious when they are, in fact, very much not.
I still don't see the need for the apology; thank you for the reviews. I will try to reopen the same pull request, to keep the discussion intact.