An increase of 50% in the shared queue has already been approved, i.e. to 3% of the Haswell partition, from 2% currently. We'll have 1,900 Haswell nodes after integration, so 3% is 57 nodes, or 1,824 cores at 32 cores per node. To motivate a further increase you'll need to make some slides that I can take to management - Cori will be under a lot of pressure when it's returned to users, so we'll need to have a very strong case for dedicating more resources to the shared queue. Fair warning: I think it's very unlikely we'll get 300 nodes in the shared partition, but let's make the case. @TomGlanzman I think the slides you showed last week had most of this material - that's where you should start. You'll need to address the following questions:
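For reference, a quick back-of-envelope check of the numbers above (a sketch, assuming 32 cores per Haswell node, which is Cori's Haswell configuration):

```python
# Back-of-envelope check of the shared-queue sizes quoted above,
# assuming 1,900 Haswell nodes and 32 cores per node.
total_nodes = 1900
cores_per_node = 32

for fraction in (0.02, 0.03):
    nodes = round(total_nodes * fraction)
    cores = nodes * cores_per_node
    print(f"{fraction:.0%} of Haswell: {nodes} nodes = {cores} cores")

# 2% of Haswell: 38 nodes = 1216 cores
# 3% of Haswell: 57 nodes = 1824 cores
```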
Thanks Debbie. Understood that slides will be needed. What follows continues the discussion eventually leading to those slides.
It seems fair to ask NERSC a question about their system and how it is managed. I have heard various rumors about increasing support for data-intensive computing. Is this term a proxy for jobs that run in the shared/serial queue and, if so, why the limit of 3%? Once Cori comes fully online, won't the lion's share of processing be on the KNL partition? How about managing the entire Haswell partition more flexibly, adjusting the queue/partition boundaries dynamically according to need and demand, up to and including 100% of Haswell for single-core jobs?
The time scale for beginning the Deep DC1 project is November (??) with a 1-2 month phoSim generation. Chris Walters or others may wish to clarify this point.
Our original schedule (see these milestones): https://github.com/DarkEnergyScienceCollaboration/SSim_DC1_Roadmap/milestones
So our plan was to set things up and do the validation this month, and then start the production next month, with the production itself taking one or two months.
Clearly this is not going to be done in the next two days, so we need to reset the timeline. I would hope one month would be a maximum.
The memory footprint issue is still being understood. Until the development team recognizes this as a problem, no work on it is likely to happen. From the Bitbucket chat I've seen, my guess is that this is probably a solvable problem with modest effort, but I could be wrong. On the other hand, the development team is working on multi-threading the phoSim code. We users do not have any technical details of how this will be implemented or whether it will be completed (and tested, validated, etc.) on a time frame of interest to Deep DC1. In the long term, this is the way the phoSim developers are headed, and multi-threading would directly address the issue of utilizing many cores on a single host - if the developers' promise holds true.
From what I have heard, we should not be planning on relying on multi-threaded phoSim for DC1. I don't think it has been made publicly available yet.
Any chance of getting these slides together soon? We've already started the queue discussion for Cori as a whole, and it's hard for me to make the case for your needs without something substantial to show the group (but rest assured I am arguing the case).
From @TomGlanzman:
It seems fair to ask NERSC a question about their system and how it is managed. I have heard various rumors about increasing support for data-intensive computing. Is this term a proxy for jobs that run in the shared/serial queue and, if so, why the limit of 3%? Once Cori comes fully online, won't the lion's share of processing be on the KNL partition? How about managing the entire Haswell partition more flexibly, adjusting the queue/partition boundaries dynamically according to need and demand, up to and including 100% of Haswell for single-core jobs?
There is zero chance of the entire Haswell partition going to the shared configuration - it has to serve a large user base - but if you can make the case then we can try to press for a larger proportion than currently planned. Data-intensive computing does include "high-throughput" single-core jobs, but also encompasses large multi-node machine-learning codes, high-IO multi-node codes and real-time computing for running experiments. If you're really interested, we have a paper on the topic here.
We're working on that today. The basic issue is getting signoff to cut out bright stars; that could make a 4x difference in the request. If that pans out, and we say we want a one-month turnaround, the request could be that we get 3k cores for that month...
Interesting! I'll join the Twinkles meeting today and catch myself up.
Actually it is an 8x difference, not 4x... I'll be at physio during the Twinkles meeting, though.
We wound up using about 20M hrs for DC1. The CI group is now making estimates of our DC2 needs.
At 2500 cores, Tom estimates 3/4 of a year to run DC1. Gulp. The least-work mode for DESC currently is to use the shared partition. To be useful we would need several times more than 2500 cores. @djbard was asked to investigate upwards of 10k cores in the shared partition.
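As a rough sanity check (just a sketch, reading the ~20M hrs quoted above as core-hours and assuming the allocated cores stay continuously busy), the wall-clock time for a fixed budget scales inversely with the number of cores, which is what makes the 10k-core question interesting:

```python
# Wall-clock time for a fixed core-hour budget, assuming the allocated
# cores are kept continuously busy (an idealization; real utilization is lower).
HOURS_PER_MONTH = 730  # ~24 * 365 / 12

def months_to_complete(total_core_hours, cores):
    return total_core_hours / cores / HOURS_PER_MONTH

budget = 20e6  # ~20M core-hours, the DC1 figure quoted earlier in the thread
for cores in (1824, 2500, 3000, 10000):
    print(f"{cores:>6} cores -> ~{months_to_complete(budget, cores):.1f} months")

#   1824 cores -> ~15.0 months
#   2500 cores -> ~11.0 months
#   3000 cores -> ~9.1 months
#  10000 cores -> ~2.7 months
```

Under this simple model the 3/4-yr-at-2500-cores estimate and the ~20M-hr total land in the same ballpark, and something like 10k shared cores would bring a DC1-scale production down to roughly a quarter of a year.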