Open drphilmarshall opened 7 years ago
Yes, it would seem a validation is in order. There was nothing in the announcement regarding the implementation of multi-threading, so it will be interesting to see how the total workload is divided between threads (per photon? per source? something else?) and how the total execution time changes for some of our lengthiest visits. Heather has already offered to install the new code at SLAC & NERSC so we should be able to start working with v3.6 soon.
The work on checkpointing is coming along and I would like to get that project to a stage that it could be integrated into a workflow quickly. Note that dmtcp advertises full support for multi-threaded applications so this work will hopefully still apply with phoSim v3.6.
Excellent! I got the impression that the parallelization was per source, but it'd be good to check this in the v3.6 documentation. I had a quick look but there's no PIN about multithreading on the wiki, and the walkthrough has not been updated yet. Which source files should we be reading to understand how things work, John?
It's multithreaded on a per-source basis (the photon level doesn’t work as there is too much thread divergence and inefficiency). this is fine because, for example, the background is made up of thousands of sources.
all you have to do is have “-t N” where N is the number of threads. i will update the wiki documentation about this in a bit.
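As a minimal sketch of how a wrapper script might add the "-t N" option described above: the executable name `phosim`, the positional instance-catalog argument, and the file name `my_instance_catalog.txt` are assumptions for illustration, not taken from the v3.6 documentation. Only the "-t N" flag itself comes from this thread.

```python
import shlex


def phosim_command(instance_catalog, threads=None, executable="phosim"):
    """Build an argv list for a phoSim run.

    `executable` and `instance_catalog` are hypothetical placeholders;
    only the "-t N" option is taken from the discussion above.
    """
    argv = [executable, instance_catalog]
    if threads is not None:
        if threads < 1:
            raise ValueError("threads must be >= 1")
        # Request N threads for the raytrace step.
        argv += ["-t", str(threads)]
    return argv


cmd = phosim_command("my_instance_catalog.txt", threads=4)
print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting problems.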
Hi John,
Have you tested it with 48 or more threads? Cori Phase II at NERSC supports up to 272 hardware threads (4 per core), so it'd be interesting to see if we can leverage that.
phosim v3.6 is now available on Cori: /global/common/cori/contrib/lsst/phosim/v3.6 To use it, you will want to "source /global/common/cori/contrib/lsst/phosim/setupPhosim.sh" to adjust the modules loaded on Cori, and then carry on to run phosim as you typically would.
yeah, i think en-hsin did either 24 or 48 tests on a cluster here at Purdue. personally, i usually just do 4 or 8 on my laptop. at some point there will be diminishing returns from the non-threaded setup, but that might be around 48 anyways, is my guess.
john
Hi John,
If it is per source, do you mean they are done one-by-one and then (in principle) added later at nearby positions? Let's say there was a galaxy and a star: do you do anything to make sure that BF still works? If you added all the light from the galaxy and then the star afterwards, the effect would be lost, right? Just trying to understand exactly what you mean.
-Chris
chris-
it's even better than that. so say you have two bright sources that are overlapping like you are imagining and then you do 2 threads. what will happen is that it will be simulating photons from both at the same time on two different cores, but whenever an electron is collected it will update the collected electron image while it's going. the other thread will then get to see the e-field from those new electrons during its simulation. so it really shouldn’t make any difference whatsoever, even in the case of brighter-fatter.
we have redone all the thousands of integration tests with 4 threads instead of the usual 1 and i haven’t noticed any changes in results, so we should all be ok. (in fact, given the speed ups, we probably will always run multi-threaded validation runs from now on). but if anyone notices anything strange please let me know.
john
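The scheme John describes (one source per thread, all threads depositing into one shared electron image) can be illustrated with a toy sketch. This is not phoSim code: the sensor size, the Gaussian scatter, and all function names are hypothetical, and Python threads only illustrate the structure rather than real multi-core speedup.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 16, 16  # toy sensor size (hypothetical)


def clamp(value, upper):
    """Keep a pixel index inside [0, upper]."""
    return min(max(value, 0), upper)


def simulate_source(source, image, lock, seed):
    """Toy 'raytrace' for one source: scatter photons around (x0, y0)."""
    x0, y0, n_photons = source
    rng = random.Random(seed)
    for _ in range(n_photons):
        x = clamp(int(round(rng.gauss(x0, 1.0))), WIDTH - 1)
        y = clamp(int(round(rng.gauss(y0, 1.0))), HEIGHT - 1)
        with lock:
            # A real simulator would consult the charge already stored in
            # `image` here (deposited by any thread) before placing the
            # new electron; this toy version only accumulates counts.
            image[y][x] += 1


def run(sources, n_threads):
    """Simulate all sources, one task per source, on n_threads workers."""
    image = [[0] * WIDTH for _ in range(HEIGHT)]  # shared electron image
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [
            pool.submit(simulate_source, src, image, lock, seed)
            for seed, src in enumerate(sources)
        ]
        for future in futures:
            future.result()  # re-raise any worker exception
    return image


# Two overlapping bright sources, as in Chris's example.
image = run([(8, 8, 500), (9, 8, 300)], n_threads=2)
print(sum(map(sum, image)))  # 800: every photon is counted exactly once
```

Because both workers write into the same `image` as they go, each one sees the other's deposits during its own run, which is the property that matters for brighter-fatter.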
That's great. Thanks.
Hi John,
I am interested in learning a bit about the control flow of the new phoSim. Is there a document or flow chart that might provide an overview?
Tom, i don’t have a document, but basically the multithreading happens only in the core raytrace calculation and doesn’t have anything to do with the overall phosim workflow.
you will want to still run phosim with the condor option and then use Glenn’s script to convert it to NERSC or SLAC job submission commands. the only difference is you use the “-t N” option for the phosim invocation to send the signal to the jobs to be threaded. Glenn was just testing to see if it still works with his script, but if not we will let you know and update with the new version.
john
Some Twinkles validation of the new phoSim version 3.6.0 has been produced. Using exactly the same configuration as for the main Twinkles workflow task (TW-phoSim-r3), I have created two new workflow tasks: TW-phoSim-r3-MT (using phoSim v3.6.0 and running with four (4) threads); and, TW-phoSim-r3-noMT (using phoSim v3.6.0 with no multi-threading). A total of ten (10) visits were processed with TW-phoSim-r3-MT and five (5) visits with TW-phoSim-r3-noMT. Links to the workflow:
TW-phoSim-r3 TW-phoSim-r3-MT TW-phoSim-r3-noMT
While the multi-threaded phoSims were running, I was able to confirm the ongoing creation of up to four extra execution threads using a combination of tools (ps and top via lsrun, and farmrtmweb). I have not attempted to test different numbers of threads.
Timing results:
The multi-threaded phoSim appears to live up to its marketing! To summarize the ten+ten runs:
Average wall-clock time ratio (v3.5.3/v3.6.0) = 4.1
Average CPU time ratio (v3.5.3/v3.6.0) = 1.4
Average job efficiency v3.5.3 (CPU/wall-clock) = 88%
Average job efficiency v3.6.0 (CPU/wall-clock) = 66%
Note that there are situations when running large productions in which seemingly random jobs will exhibit unusual CPU and/or wall-clock times. This can be due to various reasons, such as transient I/O bottlenecks to a needed storage server; competing jobs on the batch host hogging critical resources; or other transient outages.
Part of the reason the v3.6.0 job efficiency took a hit is that during the phoSim execution, threads are continually being created and killed. Sometimes, not all four execution threads are fully utilized for short periods of time. Part of this is likely due to the overhead of thread management, and part may be due to phoSim design. This loss of efficiency is offset by the reduction in total CPU time -- which I find slightly mysterious. In any event, the net savings in wall-clock time is significant. Congratulations to the phoSim team!
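As a rough consistency check on the four averages quoted above, the arithmetic can be sketched as follows. It assumes that the 66% figure for v3.6.0 is CPU time divided by wall-clock time and then by the four threads; that normalization is not stated in the summary and is an assumption here. Averages of per-visit ratios also need not combine exactly, so only approximate agreement is expected.

```python
# Averages reported in the timing summary.
wall_ratio = 4.1   # wall-clock time, v3.5.3 / v3.6.0
cpu_ratio = 1.4    # CPU time, v3.5.3 / v3.6.0
eff_old = 0.88     # v3.5.3 CPU / wall-clock
n_threads = 4      # threads used by TW-phoSim-r3-MT

# CPU_new / wall_new = (CPU_old / cpu_ratio) / (wall_old / wall_ratio)
cpu_over_wall_new = eff_old * wall_ratio / cpu_ratio

# Assumed normalization: divide by the number of threads.
per_thread_eff_new = cpu_over_wall_new / n_threads

print(f"implied v3.6.0 CPU/wall-clock: {cpu_over_wall_new:.2f}")
print(f"implied per-thread efficiency: {per_thread_eff_new:.0%}")
```

Under that assumption the implied per-thread efficiency comes out at about 64%, close to the reported 66%.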
(Some raw timing data comparing these 20 runs appear in this Google sheet)
Data Product Comparison: Each of these production and test runs produces only two output files: centroid (text) and image (FITS). A three-way comparison (using 'diff' for the text files and 'fdiff' for the FITS files) indicated that none of the file combinations were identical in any sense of the term. The v3.6.0 FITS files appear to have had some changes to the headers, but there are also significant differences in the body. The centroid files are quite different -- even with a few extra lines appearing.
Could these differences be attributed simply to random number seeds? Or other changes/features in the v3.6.0 release? Others with an interest in the difference details are invited to take a look for themselves. The files are at SLAC, e.g., for visit "000000":
TW-phoSim-r3: /nfs/farm/g/desc/u1/Pipeline-tasks/TW-phoSim-r3/phosim_output/000000/R22_S11/output
TW-phoSim-r3-MT: /nfs/farm/g/desc/u1/Pipeline-tasks/TW-phoSim-r3-MT/phosim_output/000000/R22_S11/output
TW-phoSim-r3-noMT: /nfs/farm/g/desc/u1/Pipeline-tasks/TW-phoSim-r3-noMT/phosim_output/000000/R22_S11/output
Note that the visit index, "000000", may be replaced by "000001" through "000009" for TW-phoSim-r3-MT, and through "000004" for TW-phoSim-r3-noMT.
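For anyone wanting to quantify how different two centroid files are, rather than just seeing that 'diff' reports changes, a minimal sketch using Python's standard `difflib` is given below. The function names are hypothetical, and the sample lines are made up for illustration; they are not real centroid file contents. FITS images would need a FITS-aware tool and are not covered here.

```python
import difflib
from pathlib import Path


def summarize_line_diff(lines_a, lines_b):
    """Count lines that appear only on one side of a line-by-line diff."""
    matcher = difflib.SequenceMatcher(a=lines_a, b=lines_b, autojunk=False)
    only_a = only_b = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            only_a += i2 - i1  # lines removed or replaced on side A
            only_b += j2 - j1  # lines inserted or replaced on side B
    return {
        "lines_a": len(lines_a),
        "lines_b": len(lines_b),
        "only_in_a": only_a,
        "only_in_b": only_b,
    }


def summarize_file_diff(path_a, path_b):
    """Same summary, reading two text files from disk."""
    lines_a = Path(path_a).read_text().splitlines()
    lines_b = Path(path_b).read_text().splitlines()
    return summarize_line_diff(lines_a, lines_b)


# Made-up example lines, not real centroid data.
old = ["id 1", "id 2", "id 3"]
new = ["id 1", "id 2", "id 4", "id 5"]
print(summarize_line_diff(old, new))
```

A summary like this makes it easier to tell a handful of extra lines apart from a file in which every row has changed.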
There was one configuration hiccup associated with v3.6.0. A new dependency on the phoSim installation's data/sky directory suddenly appeared and required the 'sky' directory to be placed adjacent to the (staged) copy of the SEDs. Perhaps John and Co. could comment on whether this is a bug or a feature?
Please feel free to add comments to this issue thread.
@LSSTDESC/twinkles PhoSim v3.6 is out! The release email from the PhoSim team is pasted in below for our records. Congratulations, and a big thank you, to @johnrpeterson et al :-)
@TomGlanzman what timetable do you suggest we follow for running the remaining Twinkles 1 Run 3 "long jobs" with PhoSim 3.6, with the same commands and configuration as you used in Run 3.1, 3.2, and 3.3? I guess we'll need to check that the results from v3.6 match those from v3.5 in some of the short visits...