UCBerkeleySETI / hyperseti

A SETI / technosignature search code to find intelligent life beyond Earth
https://hyperseti.readthedocs.io
11 stars 4 forks source link

New idea for search algorithm: friends-of-friends #39

Closed telegraphic closed 2 years ago

telegraphic commented 3 years ago

This is an alternative approach to using prominent_peaks to find hits in doppler space. Gist:

def find_peaks_fof(data, threshold=10, link_length=10):
    """ Find peaks in image using Friends of Friends algorithm

    Uses pyfof.friends_of_friends function

    Args:
        data (np.array): 2D Data array (suggest that you normalize first!)
        threshold (float): Data threshold (i.e. signal-to-noise if normalized)
        link_length (float): Linking length between cluster members

    """

    # Get indexes for 
    xi, yi = np.where(data > threshold)
    vals_xiyi = data[xi, yi]

    # PyFoF need (Nx2) array to search through
    # Also needs to be float64 (inefficient recast!)
    xiyi = np.array((xi, yi)).astype('float64').T

    # Run Friends of friends algorithm
    groups = pyfof.friends_of_friends(xiyi, link_length)

    # For each group, find max value
    centroids_x, centroids_y, centroids_val = [], [], []
    for g in groups:
        # g is a list of indexes of the xiyi array
        # find idx for maximal point in vals_xiyi
        # Note: this is index of and index! (xiyi are indexes of data)
        idx_xiyi      = np.argmax(vals_xiyi[g])
        idx_dmax  = g[idx_xiyi] 

        centroids_x.append(xi[idx_dmax])
        centroids_y.append(yi[idx_dmax])
        centroids_val.append(vals_xiyi[idx_dmax])
    return centroids_x, centroids_y, centroids_val
wfarah commented 3 years ago

Probably not a very useful comment to make, but I've seen FoF used by ASKAP's FREDA to group dispersed candidates. And it works, so could be a viable replacement to the prominent_peaks

telegraphic commented 2 years ago

Putting this approach on ice, as it's proved too slow -- my current priority is getting a working pipeline that's comparable in speed to turboseti. See a bit more in technotes/05_peak_finding_comparison.md.

FYI, there's a GPU implementation of DBSCAN clustering algorithm (which I think is the same as FoF) in cuml which I tried out, and it just wasn't fast enough. Currently using argrelmax from cusignal, which is faster.