SpatioTemporal / pystare

The Python interface for the SpatioTemporal Adaptive Resolution Encoding (STARE), a unified indexing for geolocated data.
https://pystare.readthedocs.io/en/latest/
12 stars 2 forks source link

speedy intersects #97

Closed NiklasPhabian closed 1 year ago

NiklasPhabian commented 2 years ago

I have been toying around a bit with the intersects logic. It seems we have a massive opportunity for a speedup when trying to find values in a large collection of SIDs that intersect a smaller collection of SIDs.

One of the most frequent use-case for me is a scenario where I have e.g. a large collection of iFOVs, for which I wish to extract the subset that intersects an ROI (which is typically given by a much smaller set of SIDs).

  1. I can immediately toss out iFOVs with an SID that is larger than the largest SID of my ROI (small caveat)
  2. I can immediately toss out iFOVs with an SID that is smaller than the smallest SID of my ROI (no caveat)
  3. I can define an intersection level as the lower one of the highest resolutions of the iFOV and the ROI resolution (min of both maxes)
  4. I coerce the resolution of the collection with the higher resolution (the iFOVs) to the intersection resolution. I then clear to the resolution bit. I can now take the set of the resulting coerced/cleared SIDs. This set may be orders of magnitude smaller than the original collection of SIDs (depending on the resolution difference between iFOV and ROI).
  5. I now run pystare.intersects() on the coerced/cleared iFOV SID set and the ROI sids.

The working code is below. For some of my use-cases, this gives me a speedup of over 1000x


def speedy_intersects(sids_left, sids_right):
    # Dropping values outside of range
    top_bound = pystare.spatial_clear_to_resolution(sids_right.max())
    level = pystare.spatial_resolution(top_bound)
    top_bound += pystare.spatial_increment_from_level(level)
    bottom_bound = sids_right.min()
    candidate_sids = sids_left[(sids_left >= bottom_bound) * (sids_left <= top_bound)]

    # finding the intersection level
    left_min_level = pystare.spatial_resolution(candidate_sids).max()
    right_min_level = pystare.spatial_resolution(sids_right).max()
    intersecting_level = min(right_min_level, left_min_level)

    # Clearing to intersecting level, allowing us to group them / extract only the distinct values
    coerced_sids = pystare.spatial_coerce_resolution(candidate_sids, intersecting_res)
    cleared_sids = pystare.spatial_clear_to_resolution(coerced_sids)
    distinct_sids = numpy.unique(cleared_sids)

    # Now doing the intersection on the distinct values
    intersects = distinct[pystare.intersects(sids_right, distinct_sids)]
    intersecting = candidate_sids[cleared_sids.isin(intersects)]
    return intersecting
NiklasPhabian commented 2 years ago

I moved this over to starepandas

NiklasPhabian commented 1 year ago

added