Closed moeyensj closed 2 years ago
I've made some changes to the code (it is not a large re-write) so that compute_ephemeris
is called many fewer times. I've been able to get a ~2x speedup and it produces consistent results at the level of number of potential matches. It would be good if there is another test we can run to verify those matches are the same.
Here are the comparison of results from before optimizations:
Found 27 potential matches for orbit ID: 330784
39042554 function calls (38957148 primitive calls) in 103.246 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
412778 36.080 0.000 40.942 0.000 orbit.py:192(compute_ephemeris)
1650521 6.209 0.000 6.209 0.000 {method 'reduce' of 'numpy.ufunc' objects}
524266 5.539 0.000 6.377 0.000 spherical_geom.py:79(haversine_distance_deg)
1543349 4.908 0.000 4.908 0.000 {method 'fetchone' of 'sqlite3.Cursor' objects}
412591 3.259 0.000 7.286 0.000 pixelfunc.py:153(check_theta_valid)
412591 3.138 0.000 3.138 0.000 healpix_geom.py:17(radec_to_thetaphi)
412591 2.750 0.000 24.312 0.000 pixelfunc.py:424(ang2pix)
825324 2.667 0.000 8.679 0.000 fromnumeric.py:69(_wrapreduction)
412115 2.355 0.000 33.676 0.000 orbit.py:247(approximately_propagate)
667 2.166 0.003 2.166 0.003 {method 'execute' of 'sqlite3.Cursor' objects}
503 2.159 0.004 96.365 0.192 precovery_db.py:146(_check_window)
412778 2.034 0.000 2.034 0.000 orbit.py:13(_ensure_pyoorb_initialized)
1954135 1.710 0.000 10.538 0.000 frame_db.py:197(<genexpr>)
1543347 1.687 0.000 8.442 0.000 result.py:381(iterrows)
413879 1.668 0.000 1.669 0.000 {built-in method numpy.array}
825182 1.309 0.000 3.745 0.000 {method 'all' of 'numpy.generic' objects}
825182 1.298 0.000 1.518 0.000 pixelfunc.py:1233(isnsideok)
825185 1.229 0.000 9.905 0.000 fromnumeric.py:2367(all)
977303/977133 1.131 0.000 1.138 0.000 {built-in method builtins.getattr}
825182 1.044 0.000 14.275 0.000 pixelfunc.py:1279(check_nside)
and after:
Found 27 potential matches for orbit ID: 330784
23987457 function calls (23845885 primitive calls) in 48.158 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
491 16.565 0.034 18.389 0.037 orbit.py:225(compute_ephemeris_)
476 7.967 0.017 8.095 0.017 orbit.py:192(compute_ephemeris)
1543349 3.011 0.000 3.011 0.000 {method 'fetchone' of 'sqlite3.Cursor' objects}
524266 1.696 0.000 2.504 0.000 frame_db.py:97(distance)
1954135 1.530 0.000 7.656 0.000 frame_db.py:197(<genexpr>)
1543347 1.216 0.000 5.808 0.000 result.py:381(iterrows)
503 0.938 0.002 43.084 0.086 precovery_db.py:208(_check_window)
524453 0.885 0.000 1.547 0.000 frame_db.py:506(iterate_observations)
667 0.880 0.001 0.880 0.001 {method 'execute' of 'sqlite3.Cursor' objects}
1543347 0.818 0.000 4.592 0.000 cursor.py:1791(_fetchiter_impl)
1543348 0.758 0.000 3.774 0.000 cursor.py:953(fetchone)
412591 0.739 0.000 8.764 0.000 frame_db.py:168(propagation_targets)
412778 0.596 0.000 0.596 0.000 orbit.py:263(__init__)
42 0.511 0.012 4.849 0.115 precovery_db.py:341(_check_frames_)
36932 0.455 0.000 0.702 0.000 inspect.py:744(cleandoc)
491 0.412 0.001 1.007 0.002 orbit.py:260(<listcomp>)
939172 0.398 0.000 0.677 0.000 __init__.py:1455(debug)
3651 0.370 0.000 0.436 0.000 ffi.py:149(__call__)
491 0.343 0.001 0.575 0.001 orbit.py:239(<listcomp>)
3496 0.336 0.000 0.337 0.000 {built-in method numpy.array}
You can see the number of calls to orbit.compute_ephemeris
has decreased from 412778 to 476, replaced by 491 (hopefully) equivalent calls to my "vectorized" function orbit.compute_ephemeris_
.
I think there is more speedup to be gained. I can add to your WIP pull request or start a new one so we can review the changes.
PR #18 has been merged into master: a 2x speed increase by making use of array-supported functions.
All initially identified items have been addressed in PRs #18 and #25.
On DiRAC's shared machine, a sample test orbit takes 95 seconds to return 27 matches in nearly 7 years of NSC observations (with profiling enabled). The profile indicates there are some low-hanging optimizations that can speed this time up:
Functions where vectorization in third-party packages isn't fully utilized but could be implemented:
Note that these times were measured after creating a multi-level index on
mjd
andhealpixel
in theframes
table. Code to do this on data ingests needs to be added to this PR as well.