dirac-institute / kbmod

KBMOD (Kernel-Based Moving Object Detection)
BSD 2-Clause "Simplified" License

Add a KBMOD results filter for matching "known objects" #741

Closed wilsonbb closed 3 days ago

wilsonbb commented 1 week ago

Adds a filter for matching KBMOD results to "known objects", as defined by a user-provided astropy table specifying a catalog of objects we expect to find in the KBMOD data, as part of addressing https://github.com/dirac-institute/kbmod/issues/528. The catalog of known objects can be either cached information about real known objects from a service such as astroquery, or a catalog of synthetic fakes that a user has inserted into the data. The matching observations can then be used to identify which objects were recovered, which results should be filtered out, and/or which individual result observations should be marked as invalid due to a match.

Such a catalog must have columns representing an object's:

On klone/hyak, loading and filtering were tested with an approximately 750 MB catalog of cached astroquery results corresponding to cone searches around boresights in the DEEP search; together they took about 30 seconds. There is room for optimization, but this is unlikely to be a scaling bottleneck for now.

The filter can be called either in the KBMOD search workflow in src/kbmod/run_search.py, where it can be called multiple times for different data sources, or from any post-processing step with a saved KBMOD Results object and its saved WCS.

The matcher can match each result observation to potentially multiple objects, and the user can apply thresholds for how close they need to be both spatially and temporally. Observations that match known objects can then be set as invalid in the Results table's "obs_valid" column, and with remove_match_obs=True the filter applies Results table filtering to remove results that no longer have enough valid observations. This is useful for results filtering because a result trajectory can be constructed from a synthetic object on Day X that intersected with a known object on Day Y. Neither the matcher on the fakes catalog nor the one on an astroquery-generated catalog has enough information to completely invalidate the result, but because each marks its matching observations invalid, the entire result can be filtered out of the set of new objects to investigate.
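
To make the Day X / Day Y case concrete, here is a minimal, standard-library-only sketch of threshold matching followed by invalidation. It is not KBMOD's implementation: the helper names (`angular_sep_arcsec`, `match_observations`, `mark_invalid`), the tuple layout of the observations, the use of MJD for times, and the reading of `sep_thresh` as arcseconds are all assumptions made for illustration.

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds between two (RA, Dec) points in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays accurate for very small separations.
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600.0

def match_observations(result_obs, known_obs, sep_thresh=1.0, time_thresh_s=600.0):
    """Map each known-object name to the result observation indices it matched.

    result_obs: list of (ra_deg, dec_deg, mjd) for one result trajectory.
    known_obs:  list of (name, ra_deg, dec_deg, mjd) catalog rows.
    sep_thresh is assumed to be in arcseconds; time_thresh_s is in seconds.
    """
    matches = {}
    for i, (ra, dec, mjd) in enumerate(result_obs):
        for name, k_ra, k_dec, k_mjd in known_obs:
            close_in_time = abs(mjd - k_mjd) * 86400.0 <= time_thresh_s
            if close_in_time and angular_sep_arcsec(ra, dec, k_ra, k_dec) <= sep_thresh:
                matches.setdefault(name, set()).add(i)
    return {name: sorted(idx) for name, idx in matches.items()}

def mark_invalid(obs_valid, matches):
    """Return a copy of obs_valid with every matched observation set to False."""
    updated = list(obs_valid)
    for idx in matches.values():
        for i in idx:
            updated[i] = False
    return updated

# One result trajectory: two observations on "Day X", two on "Day Y".
result_obs = [
    (10.000, 5.0, 60000.0),
    (10.001, 5.0, 60000.1),
    (10.002, 5.0, 60001.0),
    (10.003, 5.0, 60001.1),
]
# The fake only explains Day X; the real object only explains Day Y.
fakes = [("fake_1", 10.000, 5.0, 60000.0), ("fake_1", 10.001, 5.0, 60000.1)]
real = [("real_7", 10.002, 5.0, 60001.0), ("real_7", 10.003, 5.0, 60001.1)]

obs_valid = [True] * len(result_obs)
fake_matches = match_observations(result_obs, fakes)
real_matches = match_observations(result_obs, real)
obs_valid = mark_invalid(obs_valid, fake_matches)
obs_valid = mark_invalid(obs_valid, real_matches)

# Hypothetical minimum number of valid observations needed to keep a result.
keep_result = sum(obs_valid) >= 2
print(fake_matches, real_matches, obs_valid, keep_result)
```

Neither catalog alone invalidates more than half of the trajectory, but after both passes no valid observations remain, so the result is dropped.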

The matcher stores within the KBMOD results table which known objects matched to which observation for each KBMOD result (regardless of how many observations matched and the truth value of remove_match_obs). This preserves as much matching information as possible for cases such as when multiple known objects intersect different parts of a result trajectory. While this PR does not provide a convenient list of which expected recovered objects were not in the KBMOD results as requested in https://github.com/dirac-institute/kbmod/issues/528, the caller of the filter has all of the information needed to construct that.
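
As a rough illustration of how a caller could build the list of missed objects from the stored per-result match information, the sketch below uses plain Python containers. The function name `recovered_and_missed`, the `min_obs` parameter, and the dict-per-result layout are hypothetical and do not reflect how the Results table actually stores matches.

```python
def recovered_and_missed(catalog_names, matched_per_result, min_obs=1):
    """Split expected object names into (recovered, missed) sets.

    catalog_names:      iterable of every object name expected in the data.
    matched_per_result: one dict per result, mapping object name to the list
                        of result observation indices that matched it.
    An object counts as recovered if any single result matched it in at
    least min_obs observations.
    """
    expected = set(catalog_names)
    recovered = set()
    for matches in matched_per_result:
        for name, obs_idx in matches.items():
            if name in expected and len(obs_idx) >= min_obs:
                recovered.add(name)
    return recovered, expected - recovered

catalog_names = ["fake_1", "fake_2", "fake_3"]
matched_per_result = [
    {"fake_1": [0, 1, 2, 3, 4]},      # strong match to fake_1
    {"fake_2": [7], "fake_1": [5]},   # only glancing matches
    {},                               # no matches at all
]

recovered, missed = recovered_and_missed(catalog_names, matched_per_result, min_obs=5)
print(sorted(recovered), sorted(missed))
```

Because every match is kept regardless of how many observations it covers, the threshold can be chosen or changed after the fact without rerunning the matcher.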

An example workflow for filtering out known objects, identifying recovered fakes, and then processing potentially real results is provided below:

from kbmod.filters.known_object_filters import KnownObjsMatcher
from kbmod.results import Results

from astropy.table import Table

# Example setup
res = Results.read_table("/path/to/results")
obstimes = range(10) # Dummy obstimes
wcs = res["wcs"][0] # Dummy WCS

# Remove all observations of real objects from the results
real_obj_table = Table.read("/path/to/real_obj_table/")
real_obj_matcher = KnownObjsMatcher(
    real_obj_table,
    obstimes,
    matcher_name="real_obj_matcher",
    sep_thresh=1.0,
    time_thresh_s=600.0,
    match_min_obs=5,
)
res = real_obj_matcher.match(
    res,
    wcs=wcs,
    update_obs_valid=True,
)
# Filter out the matching observations as invalid, dropping any results that
# no longer have valid observations
res = real_obj_matcher.mark_match_obs_invalid(res, drop_empty_rows=True)

# Identify all recovered fakes
fake_table = Table.read("/path/to/fake_table/")
fake_matcher = KnownObjsMatcher(
    fake_table,
    obstimes,
    matcher_name="fakes_matcher",
    sep_thresh=1.0,
    time_thresh_s=600.0,
    match_obs_ratio=0.5,
)

# Here we match observations to our fakes. Note that this does not update the "obs_valid"
# column of the Results table
res = fake_matcher.match(
    res,
    wcs=wcs,
)

# Apply a threshold for how many observations from the fake catalog we had to recover
# in order for the fake to be found. (note that we already filter near the obstimes for this
# KBMOD run, so fakes on distant nights shouldn't matter).
res = fake_matcher.filter_obs_ratio(res)

# Get the recovered fakes and missed fakes
recovered, missed = fake_matcher.get_recovered_objects(res, fake_matcher.match_obs_ratio_col())
print(f"Recovered {len(recovered)} fakes and missed {len(missed)} fakes")

# Now we can filter out all recovered fakes to continue processing potential results with ML.
# ml_magic_yay is a placeholder for a user-supplied ML step; it is not defined in KBMOD.
res = fake_matcher.filter_matches(res, fake_matcher.match_obs_ratio_col())
res = ml_magic_yay(res)