B612-Asteroid-Institute / precovery

Fast precovery of small body observations at scale
BSD 3-Clause "New" or "Revised" License
6 stars 2 forks source link

Profiling #17

Closed moeyensj closed 2 years ago

moeyensj commented 2 years ago

On DiRAC's shared machine, a sample test orbit takes 95 seconds to return 27 matches in nearly 7 years of NSC observations (with profiling enabled). The profile indicates there are some low-hanging optimizations that can speed this time up:

Found 27 potential matches for orbit ID: 330784
         38862790 function calls (38779094 primitive calls) in 95.102 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   412778   33.377    0.000   38.044    0.000 orbit.py:192(compute_ephemeris)
   524266    5.920    0.000    6.474    0.000 spherical_geom.py:79(haversine_distance_deg)
  1650539    5.709    0.000    5.709    0.000 {method 'reduce' of 'numpy.ufunc' objects}
  1543349    5.030    0.000    5.030    0.000 {method 'fetchone' of 'sqlite3.Cursor' objects}
   412591    3.191    0.000    7.129    0.000 pixelfunc.py:153(check_theta_valid)
   412591    3.089    0.000    3.089    0.000 healpix_geom.py:17(radec_to_thetaphi)
   825327    2.596    0.000    7.628    0.000 fromnumeric.py:69(_wrapreduction)
   412591    2.419    0.000   22.506    0.000 pixelfunc.py:424(ang2pix)
   412115    2.159    0.000   31.309    0.000 orbit.py:240(approximately_propagate)
      503    2.046    0.004   90.738    0.180 precovery_db.py:146(_check_window)
   412778    1.978    0.000    1.978    0.000 orbit.py:13(_ensure_pyoorb_initialized)
   413897    1.680    0.000    1.680    0.000 {built-in method numpy.array}
  1954135    1.552    0.000   10.135    0.000 frame_db.py:197(<genexpr>)
  1543347    1.498    0.000    8.221    0.000 result.py:380(iterrows)
   825182    1.341    0.000    3.690    0.000 {method 'all' of 'numpy.generic' objects}
   825182    1.146    0.000    1.330    0.000 pixelfunc.py:1233(isnsideok)
   825185    1.121    0.000    8.747    0.000 fromnumeric.py:2367(all)
   412591    1.099    0.000   11.566    0.000 frame_db.py:168(propagation_targets)
   825182    0.993    0.000   12.958    0.000 pixelfunc.py:1279(check_nside)
   524266    0.954    0.000    7.771    0.000 frame_db.py:97(distance)

Functions where vectorization in third-party packages isn't fully utilized but could be implemented:

Note that these times were measured after creating a multi-level index on mjd and healpixel in the frames table. Code to do this on data ingests needs to be added to this PR as well.

stevenstetzler commented 2 years ago

I've made some changes to the code (it is not a large re-write) so that compute_ephemeris is called many fewer times. I've been able to get a ~2x speedup and it produces consistent results at the level of number of potential matches. It would be good if there is another test we can run to verify those matches are the same.

Here are the comparison of results from before optimizations:

Found 27 potential matches for orbit ID: 330784
         39042554 function calls (38957148 primitive calls) in 103.246 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   412778   36.080    0.000   40.942    0.000 orbit.py:192(compute_ephemeris)
  1650521    6.209    0.000    6.209    0.000 {method 'reduce' of 'numpy.ufunc' objects}
   524266    5.539    0.000    6.377    0.000 spherical_geom.py:79(haversine_distance_deg)
  1543349    4.908    0.000    4.908    0.000 {method 'fetchone' of 'sqlite3.Cursor' objects}
   412591    3.259    0.000    7.286    0.000 pixelfunc.py:153(check_theta_valid)
   412591    3.138    0.000    3.138    0.000 healpix_geom.py:17(radec_to_thetaphi)
   412591    2.750    0.000   24.312    0.000 pixelfunc.py:424(ang2pix)
   825324    2.667    0.000    8.679    0.000 fromnumeric.py:69(_wrapreduction)
   412115    2.355    0.000   33.676    0.000 orbit.py:247(approximately_propagate)
      667    2.166    0.003    2.166    0.003 {method 'execute' of 'sqlite3.Cursor' objects}
      503    2.159    0.004   96.365    0.192 precovery_db.py:146(_check_window)
   412778    2.034    0.000    2.034    0.000 orbit.py:13(_ensure_pyoorb_initialized)
  1954135    1.710    0.000   10.538    0.000 frame_db.py:197(<genexpr>)
  1543347    1.687    0.000    8.442    0.000 result.py:381(iterrows)
   413879    1.668    0.000    1.669    0.000 {built-in method numpy.array}
   825182    1.309    0.000    3.745    0.000 {method 'all' of 'numpy.generic' objects}
   825182    1.298    0.000    1.518    0.000 pixelfunc.py:1233(isnsideok)
   825185    1.229    0.000    9.905    0.000 fromnumeric.py:2367(all)
977303/977133    1.131    0.000    1.138    0.000 {built-in method builtins.getattr}
   825182    1.044    0.000   14.275    0.000 pixelfunc.py:1279(check_nside)

and after:

Found 27 potential matches for orbit ID: 330784
         23987457 function calls (23845885 primitive calls) in 48.158 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      491   16.565    0.034   18.389    0.037 orbit.py:225(compute_ephemeris_)
      476    7.967    0.017    8.095    0.017 orbit.py:192(compute_ephemeris)
  1543349    3.011    0.000    3.011    0.000 {method 'fetchone' of 'sqlite3.Cursor' objects}
   524266    1.696    0.000    2.504    0.000 frame_db.py:97(distance)
  1954135    1.530    0.000    7.656    0.000 frame_db.py:197(<genexpr>)
  1543347    1.216    0.000    5.808    0.000 result.py:381(iterrows)
      503    0.938    0.002   43.084    0.086 precovery_db.py:208(_check_window)
   524453    0.885    0.000    1.547    0.000 frame_db.py:506(iterate_observations)
      667    0.880    0.001    0.880    0.001 {method 'execute' of 'sqlite3.Cursor' objects}
  1543347    0.818    0.000    4.592    0.000 cursor.py:1791(_fetchiter_impl)
  1543348    0.758    0.000    3.774    0.000 cursor.py:953(fetchone)
   412591    0.739    0.000    8.764    0.000 frame_db.py:168(propagation_targets)
   412778    0.596    0.000    0.596    0.000 orbit.py:263(__init__)
       42    0.511    0.012    4.849    0.115 precovery_db.py:341(_check_frames_)
    36932    0.455    0.000    0.702    0.000 inspect.py:744(cleandoc)
      491    0.412    0.001    1.007    0.002 orbit.py:260(<listcomp>)
   939172    0.398    0.000    0.677    0.000 __init__.py:1455(debug)
     3651    0.370    0.000    0.436    0.000 ffi.py:149(__call__)
      491    0.343    0.001    0.575    0.001 orbit.py:239(<listcomp>)
     3496    0.336    0.000    0.337    0.000 {built-in method numpy.array}

You can see the number of calls to orbit.compute_ephemeris has decreased from 412778 to 476, replaced by 491 (hopefully) equivalent calls to my "vectorized" function orbit.compute_ephemeris_.

I think there is more speedup to be gained. I can add to your WIP pull request or start a new one so we can review the changes.

moeyensj commented 2 years ago

PR #18 has been merged into master: a 2x speed increase by making use of array-supported functions.

moeyensj commented 2 years ago

All initially identified items have been addressed in PRs #18 and #25.