I have been toying around a bit with the intersects logic. It seems we have a massive opportunity for a speedup when trying to find values in a large collection of SIDs that intersect a smaller collection of SIDs.
One of the most frequent use-case for me is a scenario where I have e.g. a large collection of iFOVs, for which I wish to extract the subset that intersects an ROI (which is typically given by a much smaller set of SIDs).
I can immediately toss out iFOVs with an SID that is larger than the largest SID of my ROI (small caveat)
I can immediately toss out iFOVs with an SID that is smaller than the smallest SID of my ROI (no caveat)
I can define an intersection level as the lower one of the highest resolutions of the iFOV and the ROI resolution (min of both maxes)
I coerce the resolution of the collection with the higher resolution (the iFOVs) to the intersection resolution. I then clear to the resolution bit. I can now take the set of the resulting coerced/cleared SIDs. This set may be orders of magnitude smaller than the original collection of SIDs (depending on the resolution difference between iFOV and ROI).
I now run pystare.intersects() on the coerced/cleared iFOV SID set and the ROI sids.
The working code is below. For some of my use-cases, this gives me a speedup of over 1000x
def speedy_intersects(sids_left, sids_right):
# Dropping values outside of range
top_bound = pystare.spatial_clear_to_resolution(sids_right.max())
level = pystare.spatial_resolution(top_bound)
top_bound += pystare.spatial_increment_from_level(level)
bottom_bound = sids_right.min()
candidate_sids = sids_left[(sids_left >= bottom_bound) * (sids_left <= top_bound)]
# finding the intersection level
left_min_level = pystare.spatial_resolution(candidate_sids).max()
right_min_level = pystare.spatial_resolution(sids_right).max()
intersecting_level = min(right_min_level, left_min_level)
# Clearing to intersecting level, allowing us to group them / extract only the distinct values
coerced_sids = pystare.spatial_coerce_resolution(candidate_sids, intersecting_res)
cleared_sids = pystare.spatial_clear_to_resolution(coerced_sids)
distinct_sids = numpy.unique(cleared_sids)
# Now doing the intersection on the distinct values
intersects = distinct[pystare.intersects(sids_right, distinct_sids)]
intersecting = candidate_sids[cleared_sids.isin(intersects)]
return intersecting
I have been toying around a bit with the intersects logic. It seems we have a massive opportunity for a speedup when trying to find values in a large collection of SIDs that intersect a smaller collection of SIDs.
One of the most frequent use-case for me is a scenario where I have e.g. a large collection of iFOVs, for which I wish to extract the subset that intersects an ROI (which is typically given by a much smaller set of SIDs).
The working code is below. For some of my use-cases, this gives me a speedup of over 1000x