desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License
36 stars 24 forks source link

desi_fit_stdstars profiling results and next steps #1287

Open ziyaointl opened 3 years ago

ziyaointl commented 3 years ago

How to read the results

The numbers are in seconds. The lines are organized as an inverted tree.## Indicates an item is a child of a later item. For example:

## A
## B
#### C
#### D
## E
F

A, B, and E are F's children, and C and D are E's children. In other words, A + B + E = F, and C + D = E. To see which section corresponds to what code, the branch that produced the following output can be found on the branch stdstar-timings.

Single Node

All of my runs ended up taking ~29 seconds, give or take a second, so here's one of the runs:

(time srun -n 32 -c 2 --cpu_bind=cores desi_fit_stdstars --mpi --frames /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/frame-b5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/frame-r5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/frame-z5-00067765.fits --skymodels /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/sky-b5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/sky-r5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/sky-z5-00067765.fits --fiberflats /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/spec/sm9/fiberflatnight-b5-20201214.fits /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/spec/sm9/fiberflatnight-r5-20201214.fits /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/spec/sm9/fiberflatnight-z5-20201214.fits --starmodels /global/cfs/cdirs/desi/spectro/templates/basis_templates/v3.2/stdstar_templates_v2.2.fits --outfile /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/stdstars-5-00067765.fits.test --delta-color 0.1) |& tee -i logs

INFO:stdstars.py:125:log_time: Init: 0.22324848175048828
INFO:stdstars.py:125:log_time: Read data: 1.557213544845581
INFO:stdstars.py:125:log_time: Post-read data: 0.00014400482177734375
INFO:stdstars.py:125:log_time: Sky subtraction and trim data: 2.2630176544189453
INFO:stdstars.py:125:log_time: Check S/N: 0.0027837753295898438
INFO:stdstars.py:125:log_time: Filter stars: 0.010168313980102539
INFO:stdstars.py:125:log_time: Mask throughput dip: 0.00020265579223632812
INFO:stdstars.py:125:log_time: Read models: 7.2276670932769775
INFO:stdstars.py:125:log_time: ## Checks: 0.00024509429931640625
INFO:stdstars.py:125:log_time: ## Fetch filter curves: 0.7263596057891846
INFO:stdstars.py:125:log_time: #### Compute for loop of 6 items: 8.696992635726929
INFO:stdstars.py:125:log_time: ## Compute: 8.697783470153809
INFO:stdstars.py:125:log_time: Compute mags of models for each std star mag: 9.424587726593018
INFO:stdstars.py:125:log_time: Pre parallel: 0.014340877532958984
INFO:stdstars.py:125:log_time: Parallel section: 8.203190088272095
INFO:stdstars.py:125:log_time: Post parallel: 0.3902108669281006
INFO:stdstars.py:125:log_time: Total main: 29.317718744277954

MPI bcast times:

INFO:stdstars.py:279:main: 0th MPI bcast time: starfibers, starindices, fibermap: 0.009181499481201172
INFO:stdstars.py:286:main: 0th MPI bcast time: gaia_indices, legacy_indices: 0.0018088817596435547
INFO:stdstars.py:439:main: 0th MPI bcast time: frames: 2.257209300994873
INFO:stdstars.py:524:main: 0th MPI bcast time: stdwave: 0.001990079879760742
INFO:stdstars.py:528:main: 0th MPI bcast time: stdflux: 4.300034523010254
INFO:stdstars.py:532:main: 0th MPI bcast time: templateid: 0.0002276897430419922
INFO:stdstars.py:536:main: 0th MPI bcast time: teff: 9.274482727050781e-05
INFO:stdstars.py:540:main: 0th MPI bcast time: logg: 6.413459777832031e-05
INFO:stdstars.py:544:main: 0th MPI bcast time: feh: 1.7415003776550293
INFO:stdstars.py:599:main: 0th MPI bcast time: model_mags: 0.0004971027374267578

The three items that take most time are (in chronological order):

  1. Read models
  2. Compute mags of models for each std star mag
  3. Parallel section

In read models, the most time is spent on broadcasting stdflux to all the ranks. One solution is to pre-determine which rank needs which items in stdflux and only broadcast those.

In compute mags, the most time is spent in a for loop with 6 items that can probably be distributed to 6 ranks, but I don't yet know how to optimize beyond that.

The "parallel section" is actually a for loop over nstars, and it does contain some serial portions. Probably requires more scrutiny.

Two Nodes

The run time for 2 nodes is not drastically different compared to single node as of now, so the results are here just for reference.

(time srun -n 32 -c 2 --cpu_bind=cores desi_fit_stdstars --mpi --frames /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/frame-b5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/frame-r5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/frame-z5-00067765.fits --skymodels /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/sky-b5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/sky-r5-00067765.fits /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/sky-z5-00067765.fits --fiberflats /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/spec/sm9/fiberflatnight-b5-20201214.fits /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/spec/sm9/fiberflatnight-r5-20201214.fits /global/cfs/cdirs/desi/spectro/desi_spectro_calib/trunk/spec/sm9/fiberflatnight-z5-20201214.fits --starmodels /global/cfs/cdirs/desi/spectro/templates/basis_templates/v3.2/stdstar_templates_v2.2.fits --outfile /global/cfs/cdirs/desi/users/ziyaoz/skysub/exposures/20201214/00067765/stdstars-5-00067765.fits.test --delta-color 0.1) |& tee -i logs

INFO:stdstars.py:125:log_time: Init: 1.2593088150024414
INFO:stdstars.py:125:log_time: Read data: 1.657071828842163
INFO:stdstars.py:125:log_time: Post-read data: 0.00019073486328125
INFO:stdstars.py:125:log_time: Sky subtraction and trim data: 2.3670623302459717
INFO:stdstars.py:125:log_time: Check S/N: 0.0027475357055664062
INFO:stdstars.py:125:log_time: Filter stars: 0.010086774826049805
INFO:stdstars.py:125:log_time: Mask throughput dip: 0.00019621849060058594
INFO:stdstars.py:125:log_time: Read models: 7.9368696212768555
INFO:stdstars.py:125:log_time: ## Checks: 0.0002338886260986328
INFO:stdstars.py:125:log_time: ## Fetch filter curves: 0.690375566482544
INFO:stdstars.py:125:log_time: #### Compute for loop of 6 items: 8.39484190940857
INFO:stdstars.py:125:log_time: ## Compute: 8.480661630630493
INFO:stdstars.py:125:log_time: Compute mags of models for each std star mag: 9.171469688415527
INFO:stdstars.py:125:log_time: Pre parallel: 0.014284610748291016
INFO:stdstars.py:125:log_time: Parallel section: 5.3219077587127686
INFO:stdstars.py:125:log_time: Post parallel: 0.41874098777770996
INFO:stdstars.py:125:log_time: Total main: 28.16076350212097

MPI bcast times:

INFO:stdstars.py:279:main: 0th MPI bcast time: starfibers, starindices, fibermap: 0.030261754989624023
INFO:stdstars.py:286:main: 0th MPI bcast time: gaia_indices, legacy_indices: 0.0025217533111572266
INFO:stdstars.py:439:main: 0th MPI bcast time: frames: 2.3613641262054443
INFO:stdstars.py:524:main: 0th MPI bcast time: stdwave: 0.20648193359375
INFO:stdstars.py:528:main: 0th MPI bcast time: stdflux: 4.429826498031616
INFO:stdstars.py:532:main: 0th MPI bcast time: templateid: 0.0002586841583251953
INFO:stdstars.py:536:main: 0th MPI bcast time: teff: 0.00011229515075683594
INFO:stdstars.py:540:main: 0th MPI bcast time: logg: 0.0001087188720703125
INFO:stdstars.py:544:main: 0th MPI bcast time: feh: 2.174900531768799
INFO:stdstars.py:599:main: 0th MPI bcast time: model_mags: 0.08548212051391602
moustakas commented 3 years ago

Addressing #1251 should help.