B612-Asteroid-Institute / precovery

Fast precovery of small body observations at scale
BSD 3-Clause "New" or "Revised" License
5 stars 2 forks source link

Use adam_core Propagator classes in Precovery #98

Closed ntellis closed 3 days ago

ntellis commented 2 weeks ago

Changes Orbit class to use adam_core propagator instead of in-baked pyoorb. Now defaults to assist, but that should be fungible. Will add an option to pass the propagator choice through so we can more easily bench assist vs pyoorb.

Precovery is running a bit slowly on my machine right now - I haven't been able to ascertain yet if that's due to using adam-assist or if I have something else slowing it down.

ntellis commented 2 weeks ago

This appears to be working fully now - added coveralls as well, but have not switched to pdm.

I'm still working on benching precovery with assist vs pyoorb - have yet to diagnose why it is so slow on my machine.

ntellis commented 1 week ago

This branch is stable, but the simple approach seems to be just too slow. At root is the repeated calls to propagation, each of which goes in and out of adam_core. There seems to be some significant overhead with the conversions to and from adam_core

Here are the top few function calls from a perf test of a single orbit precovery run, over ~300 days, which took 249s to run (far too long):

ncalls tottime percall cumtime percall filename:lineno(function) 870 128.886 0.148 128.894 0.148 {built-in method gc.collect} 403 37.873 0.094 37.980 0.094 compiler.py:242(backend_compile) 477461 12.368 0.000 21.119 0.000 tables.py:146(from_pyarrow) 3494348 7.113 0.000 7.113 0.000 compute.py:339(cast) 481915 6.451 0.000 28.030 0.000 columns.py:232(get) 442952 2.744 0.000 2.849 0.000 compute.py:239(wrapper) 67496 2.103 0.000 30.217 0.000 orbit.py:406(from_adam_core) 79870 1.683 0.000 3.243 0.000 mlir.py:368(_traceback_to_location) 253 1.577 0.006 6.802 0.027 state.py:19(get_observer_state) 1235 1.529 0.001 5.429 0.004 spice.py:96(get_perturber_state) 14422817 1.131 0.000 1.144 0.000 {built-in method builtins.isinstance} 870 0.991 0.001 0.992 0.001 ephem.py:45(init) 7061/403 0.905 0.000 12.081 0.030 mlir.py:1606(jaxpr_subcomp)

Notably, we seem to be spending a ton of time in garbage collection. The calls in orbit.py:406 and tables.py, columns.py in pyarrow both are likely initialization/conversion expense as well.

We were chatting on the huddle and decided it makes sense to write a parallel implementation of precovery_db.precover that uses all adam_core internals. We also discussed keeping the yielding behavior in this new implementation and were leaning towards dropping it and leaving the logic of managing/extending the time window to consumers (such as ipod), freeing up the precovery internals to leverage the quivr types more fully.

akoumjian commented 1 week ago

Let's make sure if possible we aren't making too many individual calls to propagate and instead passing in multiple times at once we do have a manual garbage collection step in the Adam assist propagate method but we could remove that. I'm not convinced it's actually helping.