LSSTDESC / ComputingInfrastructure

Gathering place for CI - Computing and Infrastructure - issues

Test PhoSim on NERSC KNL partition #38

richardxdubois closed this issue 3 years ago

richardxdubois commented 7 years ago

In order to get a noticeable increase in NERSC allocation for 2017/DC2, we will need to demonstrate we can use KNL. We need to benchmark PhoSim on KNL.

This should be done before we submit a KNL time request, which is due by 2017-04-28.

TomGlanzman commented 7 years ago

Update 1: After some flailing, I managed to build a version of phoSim v3.6.1 that runs on Cori-KNL. For this minimal test I made no changes to the compiler or its options, and only the environment changes needed to build successfully. Once built, the initial test was to simulate a single mag=20 star with no background. The bottom line is that this took 3.3 min on Cori-Haswell and 17 min on Cori-KNL, a factor of 5.1 difference.

I have requested a KNL "reservation" of one node (68 physical cores) for 96 hours in order to expedite testing. This should be available soon.
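As a rough sanity check on what these numbers could mean at the node level, here is a minimal Python sketch of the arithmetic. It uses the rounded timings quoted above and the 68-core KNL node size. The 32 cores per Haswell node, the one-instance-per-core layout, and perfect scaling are all assumptions for illustration, not measurements.

```python
# Timings quoted above for a single mag=20 star with no background.
HASWELL_MINUTES = 3.3
KNL_MINUTES = 17.0

# 68 physical cores per KNL node is from the thread.
KNL_CORES_PER_NODE = 68
# ASSUMPTION: 32 physical cores per Cori-Haswell node (2 x 16-core sockets).
HASWELL_CORES_PER_NODE = 32


def slowdown(knl_minutes, haswell_minutes):
    """Per-run slowdown of KNL relative to Haswell."""
    return knl_minutes / haswell_minutes


def node_throughput_ratio(slowdown_factor, knl_cores, haswell_cores):
    """Throughput of one KNL node relative to one Haswell node.

    ASSUMPTION: one independent instance per physical core and perfect
    scaling, which the scaling test proposed in this thread has not yet
    verified.
    """
    return (knl_cores / slowdown_factor) / haswell_cores


if __name__ == "__main__":
    s = slowdown(KNL_MINUTES, HASWELL_MINUTES)
    r = node_throughput_ratio(s, KNL_CORES_PER_NODE, HASWELL_CORES_PER_NODE)
    print(f"per-run slowdown on KNL: {s:.2f}x")
    print(f"KNL node throughput relative to a Haswell node: {r:.2f}")
```

Under these assumptions the extra cores on a KNL node do not make up for the per-run slowdown, so the compiler-option and scaling work would matter for any time request.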

Next steps will include using a more realistic sky catalog and adding in background, so that the jobs spend most of their time in the raytracing code rather than in start-up activities. Following that, I will look at the recommended compiler options for gcc to see if they might squeeze more performance out of KNL (and Haswell). It has been stated that phoSim has been successfully built with the Intel compiler, and if that continues to be straightforward to do, it seems a reasonable path toward better performance. A scaling test would also seem in order, to verify that one can properly "fill" an entire KNL node with phoSim instances. Suggestions welcome.
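One possible shape for the scaling test is sketched below in Python: launch n concurrent copies of a command for increasing n and compare per-instance wall time. The function names and the placeholder workload are hypothetical. The actual phoSim command line is not given in this thread, so it is deliberately left out.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_instance(cmd):
    """Run one instance to completion; return (return code, wall seconds)."""
    start = time.perf_counter()
    proc = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return proc.returncode, time.perf_counter() - start


def scaling_test(cmd, instance_counts):
    """For each n, launch n concurrent copies of cmd and record timings."""
    results = {}
    for n in instance_counts:
        batch_start = time.perf_counter()
        # Threads only wait on child processes; the work happens in the children.
        with ThreadPoolExecutor(max_workers=n) as pool:
            outcomes = list(pool.map(run_instance, [cmd] * n))
        results[n] = {
            "failed": sum(1 for rc, _ in outcomes if rc != 0),
            "mean_instance_seconds": sum(t for _, t in outcomes) / n,
            "batch_seconds": time.perf_counter() - batch_start,
        }
    return results


if __name__ == "__main__":
    # PLACEHOLDER workload (hypothetical): substitute the real phoSim
    # command line here. On a KNL node the counts might go up to 68.
    placeholder_cmd = [sys.executable, "-c", "sum(range(10**6))"]
    for n, stats in scaling_test(placeholder_cmd, [1, 2, 4]).items():
        print(n, stats)
```

If the node fills properly, mean per-instance time should stay roughly flat as n grows toward the core count. A steady rise would point to contention for memory, I/O, or some other shared resource.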

brianv0 commented 7 years ago

That's with the same number of threads? If so, that's to be expected for CPU-bound applications: the Atom-based KNL cores are known to deliver roughly 1/5th the performance of the Haswell cores.

TomGlanzman commented 7 years ago

Update 2: As reported at yesterday's DESC-CI meeting, there seems to be a problem running phoSim on KNL with backgrounds enabled. The symptom is that at some point the process drops to a single thread (if running in multi-threaded mode), which then runs longer than the maximum KNL queue limit of 24 hours. We need the help of @johnrpeterson and his team to understand where and why this is happening. Further testing is paused until this is resolved.
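To help pin down when the drop to a single thread happens, one option is to sample the thread count of the running process from outside. The Python sketch below assumes a Linux node and reads the `Threads:` field of `/proc/<pid>/status`. The helper names are hypothetical, and how the phoSim process ID is obtained is left to the job script.

```python
import os


def parse_thread_count(status_text):
    """Extract the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")


def thread_count(pid):
    """Current number of threads in a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_thread_count(f.read())


def first_single_thread_sample(samples):
    """Index of the first sample where a process that had been
    multi-threaded is down to one thread, or None if that never happens."""
    seen_multi = False
    for i, n in enumerate(samples):
        if n > 1:
            seen_multi = True
        elif n == 1 and seen_multi:
            return i
    return None


if __name__ == "__main__":
    # Demonstration on this interpreter's own PID; a real run would poll
    # the phoSim PID periodically and log each sample with a timestamp.
    if os.path.exists(f"/proc/{os.getpid()}/status"):
        print("threads in this process:", thread_count(os.getpid()))
```

Logging the samples with timestamps next to the phoSim output would show which processing stage is active when the thread count falls to one.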

heather999 commented 3 years ago

Moved on from phoSim