B612-Asteroid-Institute / precovery

Fast precovery of small body observations at scale
BSD 3-Clause "New" or "Revised" License

Add mjd_start, mjd_mid, and exposure duration [ADAM-71] #54

Closed moeyensj closed 1 year ago

moeyensj commented 1 year ago

Adds the following quantities to the frames table: mjd_start, mjd_mid, and exposure_duration.

The indexing unit test has been updated to make sure the above columns are correctly reported back after a precovery search.

Note that the input test observations have been updated to use the latest version of oorb as the conda version of oorb is substantially out-of-date.
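
For context, here is a minimal pytest-style sketch of the kind of check the updated indexing test performs. The DataFrame column names are assumed for illustration and may not match the actual result schema in precovery.

import pandas as pd

def check_frame_columns(results: pd.DataFrame) -> None:
    # Illustrative only: verify that the new frame-level quantities come
    # back with precovery search results. Column names are assumed here;
    # the real schema is defined by the precovery package and its tests.
    expected = {"mjd_start", "mjd_mid", "exposure_duration"}
    missing = expected - set(results.columns)
    assert not missing, f"missing frame columns in results: {missing}"
    # Sanity check: the exposure midpoint should not precede the exposure start.
    assert (results["mjd_mid"] >= results["mjd_start"]).all()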

moeyensj commented 1 year ago

Interesting! The unit test passes on my computer but not in the docker container.

moeyensj commented 1 year ago

At least one difference is the version of oorb. I've been using conda to install oorb, and the Docker container pulls the latest and greatest. I'm wondering if oorb has swapped to using a different ephemeris file than the conda version.

moeyensj commented 1 year ago

The latest openorb has substantial enough changes compared to the version available on conda (with which the current test data was generated) that the indexing unit test fails.

Here is the error from the docker container:

root@6deab441d341:/code# pytest .
...
>           np.testing.assert_allclose(
                results[["pred_ra_deg", "pred_dec_deg"]].values,
                object_observations[["ra", "dec"]].values,
                atol=1e-12,
                rtol=1e-12,
            )
E           AssertionError: 
E           Not equal to tolerance rtol=1e-12, atol=1e-12
E           
E           Mismatched elements: 90 / 120 (75%)
E           Max absolute difference: 2.09302669e-07
E           Max relative difference: 3.16944324e-09
E            x: array([[143.643415,  17.771414],
E                  [143.675804,  17.755943],
E                  [143.708188,  17.740464],...
E            y: array([[143.643415,  17.771414],
E                  [143.675804,  17.755943],
E                  [143.708188,  17.740464],...

If I then update the test data and re-run, the tests pass.

root@6deab441d341:/code/tests# python3 make_observations.py  
root@6deab441d341:/code/tests# pytest . 
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.10.6, pytest-7.2.0, pluggy-1.0.0
rootdir: /code, configfile: pyproject.toml
plugins: cov-4.0.0
collected 8 items                                                                                                                                                                                                                            

test_frame_db.py .                                                                                                                                                                                                                     [ 12%]
test_orbit.py ...                                                                                                                                                                                                                      [ 50%]
test_precovery.py .                                                                                                                                                                                                                    [ 62%]
test_precovery_index.py .                                                                                                                                                                                                              [ 75%]
test_spherical_geom.py ..                                                                                                                                                                                                              [100%]

============================================================================================================= 8 passed in 5.53s ==============================================================================================================

I will update the synthetic test data to use the latest version of openorb.

moeyensj commented 1 year ago

Good question: ZTF and LSST report the amount of time that the shutter was open when the observations were made. For LSST simulations, that's about 30 seconds. Every source in the image will have its own unique observation time to account for the motion of the shutter across the focal plane, but the image itself (the exposure) has a start MJD and an exposure duration (this is the quantity we care about; it's a property of the image, not the observation, if that makes sense). So for observation-specific MJDs we will have: mjd (the time of the observation, taking shutter motion into account), mjd_start (the time the shutter opened, a property of the exposure), mjd_mid (the midpoint time of the exposure), and exposure_duration (how long the shutter was open). I think this should address the LSST use case completely.

Translated to precovery speak: mjd_start, mjd_mid, and exposure_duration will live in the frames table; mjd will live in the binary observation files.
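
As a quick illustration of how these quantities relate (assuming exposure_duration is stored in seconds and MJDs in days; the actual units and fields in precovery may differ):

def exposure_midpoint_mjd(mjd_start: float, exposure_duration_s: float) -> float:
    # Midpoint of the exposure: start MJD plus half the duration,
    # converting seconds to days (86400 s per day).
    return mjd_start + (exposure_duration_s / 2.0) / 86400.0

# Example: a 30 s LSST-like exposure starting at MJD 60000.0
mjd_mid = exposure_midpoint_mjd(60000.0, 30.0)  # ~60000.00017361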

If mjd != mjd_mid, then we do a small propagation to adjust the predicted position of the orbit to account for that discrepancy.

This will be implemented in a follow-up PR to this one.

akoumjian commented 1 year ago

If mjd != mjd_mid, then we do a small propagation to adjust the predicted position of the orbit to account for that discrepancy.

Not sure I follow this. If we have the real observation time, then we know the more accurate location. Is this a query-time alteration or an index-time alteration?

I'll have to review the ZTF alerts schema; I'm not sure whether we can pull in the separate exposure timestamp and duration.

moeyensj commented 1 year ago

It's an alteration that occurs once we have pulled in the observations for a particular set of HEALPix frames, but before we check the distance between the predicted location of the object and each observation.

What we are trying to avoid is the scenario where each individual observation with a unique observation time gets placed into its own HealpixFrame. We want all observations measured from a single image to fall within a set of HealpixFrames that are separated only by the actual footprint on the sky relative to the HEALPix grid. Imagine for LSST that a single image gets spread across ~4 HEALPixels (nside=32). Ideally, that would be represented as only 4 entries in the frames table. If we were instead to index on the unique observation times, then each observation would be entered as its own row in the frames table. For LSST that would mean 10,000s of frames for a single image.

Instead, we do all searches for potential HEALPixel overlap with the input orbits using the properties of the exposure. Once we identify those frames, we load in the actual observations, and for data from surveys like LSST where each observation has a more representative observation time, we propagate the orbit from the midpoint time of the exposure to the precise time of every observation before making the comparison. Does that make sense at all? This is something that is easier to draw than describe.
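
A rough sketch of that query-time flow, with the helpers passed in as callables since the real precovery internals aren't shown here; all names and signatures below are stand-ins for illustration only, not the precovery API.

from typing import Callable, Iterable, List

def match_observations_sketch(
    orbit,
    frames: Iterable,                # frames already selected via HEALPix overlap at exposure times
    load_observations: Callable,     # frame -> iterable of observations with .mjd, .ra, .dec
    propagate: Callable,             # (orbit, mjd) -> predicted (ra_deg, dec_deg)
    separation_deg: Callable,        # ((ra1, dec1), (ra2, dec2)) -> angular distance in degrees
    tolerance_deg: float,
) -> List:
    # Illustrative stand-in for the flow described above.
    matches = []
    for frame in frames:
        # Predicted position at the exposure midpoint (a frame-level time).
        pred_mid = propagate(orbit, frame.mjd_mid)
        for obs in load_observations(frame):
            # If the observation carries its own time (e.g. LSST shutter motion),
            # do the small extra propagation from mjd_mid to the observation mjd.
            pred = propagate(orbit, obs.mjd) if obs.mjd != frame.mjd_mid else pred_mid
            if separation_deg(pred, (obs.ra, obs.dec)) <= tolerance_deg:
                matches.append(obs)
    return matches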

akoumjian commented 1 year ago

That makes sense. The frames basically serve as an index. After the next few months I really want to experiment with storing observations in systems like Timescale or BigQuery and using the observations themselves as the primitives, instead of always having to look up via the frames, but we're a ways off from that. These systems are theoretically capable of running queries against billions or trillions of entries.