The numbers are in seconds. The lines are organized as an inverted tree.## Indicates an item is a child of a later item.
For example:
## A
## B
#### C
#### D
## E
F
A, B, and E are F's children, and C and D are E's children. In other words, A + B + E = F, and C + D = E.
To see which section corresponds to what code, the branch that produced the following output can be found on the branch stdstar-timings.
Single Node
All of my runs ended up taking ~29 seconds, give or take a second, so here's one of the runs:
(time srun -n 32 -c 2 --cpu_bind=cores desi_fit_stdstars --mpi --frames /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/frame-b5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/frame-r5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/frame-z5-00067765.fits --skymodels /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/sky-b5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/sky-r5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/sky-z5-00067765.fits --fiberflats /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/spec/sm9/fiberflatnight-b5-20201214.fits /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/spec/sm9/fiberflatnight-r5-20201214.fits /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/spec/sm9/fiberflatnight-z5-20201214.fits --starmodels /global/cfs/cdirs/desi/spectro/templates/basis_templates/v3.2/stdstar_templates_v2.2.fits --outfile /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/stdstars-5-00067765.fits.test --delta-color 0.1) |& tee -i logs
INFO:stdstars.py:125:log_time: Init: 0.22324848175048828
INFO:stdstars.py:125:log_time: Read data: 1.557213544845581
INFO:stdstars.py:125:log_time: Post-read data: 0.00014400482177734375
INFO:stdstars.py:125:log_time: Sky subtraction and trim data: 2.2630176544189453
INFO:stdstars.py:125:log_time: Check S/N: 0.0027837753295898438
INFO:stdstars.py:125:log_time: Filter stars: 0.010168313980102539
INFO:stdstars.py:125:log_time: Mask throughput dip: 0.00020265579223632812
INFO:stdstars.py:125:log_time: Read models: 7.2276670932769775
INFO:stdstars.py:125:log_time: ## Checks: 0.00024509429931640625
INFO:stdstars.py:125:log_time: ## Fetch filter curves: 0.7263596057891846
INFO:stdstars.py:125:log_time: #### Compute for loop of 6 items: 8.696992635726929
INFO:stdstars.py:125:log_time: ## Compute: 8.697783470153809
INFO:stdstars.py:125:log_time: Compute mags of models for each std star mag: 9.424587726593018
INFO:stdstars.py:125:log_time: Pre parallel: 0.014340877532958984
INFO:stdstars.py:125:log_time: Parallel section: 8.203190088272095
INFO:stdstars.py:125:log_time: Post parallel: 0.3902108669281006
INFO:stdstars.py:125:log_time: Total main: 29.317718744277954
The three items that take most time are (in chronological order):
Read models
Compute mags of models for each std star mag
Parallel section
In read models, the most time is spent on broadcasting stdflux to all the ranks. One solution is to pre-determine which rank needs which items in stdflux and only broadcast those.
In compute mags, the most time is spent in a for loop with 6 items that can probably be distributed to 6 ranks, but I don't yet know how to optimize beyond that.
The "parallel section" is actually a for loop over nstars, and it does contain some serial portions. Probably requires more scrutiny.
Two Nodes
The run time for 2 nodes is not drastically different compared to single node as of now, so the results are here just for reference.
(time srun -n 32 -c 2 --cpu_bind=cores desi_fit_stdstars --mpi --frames /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/frame-b5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/frame-r5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/frame-z5-00067765.fits --skymodels /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/sky-b5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/sky-r5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/sky-z5-00067765.fits --fiberflats /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/spec/sm9/fiberflatnight-b5-20201214.fits /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/spec/sm9/fiberflatnight-r5-20201214.fits /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/spec/sm9/fiberflatnight-z5-20201214.fits --starmodels /global/cfs/cdirs/desi/spectro/templates/basis_templates/v3.2/stdstar_templates_v2.2.fits --outfile /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/stdstars-5-00067765.fits.test --delta-color 0.1) |& tee -i logs
INFO:stdstars.py:125:log_time: Init: 1.2593088150024414
INFO:stdstars.py:125:log_time: Read data: 1.657071828842163
INFO:stdstars.py:125:log_time: Post-read data: 0.00019073486328125
INFO:stdstars.py:125:log_time: Sky subtraction and trim data: 2.3670623302459717
INFO:stdstars.py:125:log_time: Check S/N: 0.0027475357055664062
INFO:stdstars.py:125:log_time: Filter stars: 0.010086774826049805
INFO:stdstars.py:125:log_time: Mask throughput dip: 0.00019621849060058594
INFO:stdstars.py:125:log_time: Read models: 7.9368696212768555
INFO:stdstars.py:125:log_time: ## Checks: 0.0002338886260986328
INFO:stdstars.py:125:log_time: ## Fetch filter curves: 0.690375566482544
INFO:stdstars.py:125:log_time: #### Compute for loop of 6 items: 8.39484190940857
INFO:stdstars.py:125:log_time: ## Compute: 8.480661630630493
INFO:stdstars.py:125:log_time: Compute mags of models for each std star mag: 9.171469688415527
INFO:stdstars.py:125:log_time: Pre parallel: 0.014284610748291016
INFO:stdstars.py:125:log_time: Parallel section: 5.3219077587127686
INFO:stdstars.py:125:log_time: Post parallel: 0.41874098777770996
INFO:stdstars.py:125:log_time: Total main: 28.16076350212097
How to read the results
The numbers are in seconds. The lines are organized as an inverted tree.
##
Indicates an item is a child of a later item. For example:A, B, and E are F's children, and C and D are E's children. In other words, A + B + E = F, and C + D = E. To see which section corresponds to what code, the branch that produced the following output can be found on the branch
stdstar-timings
.Single Node
All of my runs ended up taking ~29 seconds, give or take a second, so here's one of the runs:
MPI bcast times:
The three items that take most time are (in chronological order):
In read models, the most time is spent on broadcasting
stdflux
to all the ranks. One solution is to pre-determine which rank needs which items instdflux
and only broadcast those.In compute mags, the most time is spent in a for loop with 6 items that can probably be distributed to 6 ranks, but I don't yet know how to optimize beyond that.
The "parallel section" is actually a for loop over
nstars
, and it does contain some serial portions. Probably requires more scrutiny.Two Nodes
The run time for 2 nodes is not drastically different compared to single node as of now, so the results are here just for reference.
MPI bcast times: