UCBerkeleySETI / hyperseti

A SETI / technosignature search code to find intelligent life beyond Earth
https://hyperseti.readthedocs.io
11 stars, 4 forks

Hyperseti 1.1.0 #109

Closed: telegraphic closed this 1 year ago

telegraphic commented 1 year ago

WTD Summary - 25 July 2023

What The Diff was unable to process this PR.

what-the-diff[bot] commented 1 year ago

PR Summary

codecov-commenter commented 1 year ago

Codecov Report

Merging #109 (c20e05f) into master (bd0454e) will decrease coverage by 0.52%. The diff coverage is 95.54%.

Note: current head c20e05f differs from the pull request's most recent head 9ab2645. Consider uploading reports for commit 9ab2645 to get more accurate results.


@@            Coverage Diff             @@
##           master     #109      +/-   ##
==========================================
- Coverage   94.41%   93.89%   -0.52%     
==========================================
  Files          30       33       +3     
  Lines        1790     1983     +193     
==========================================
+ Hits         1690     1862     +172     
- Misses        100      121      +21     
Impacted file                         Absolute   <Relative>   (Impact)
hyperseti/test_data/__init__.py       86.95%     ø            ø
hyperseti/io/hit_db.py                90.74%     50.00%       -2.53% ↓
hyperseti/kernels/dedoppler.py        89.18%     88.57%       -10.82% ↓
hyperseti/kernels/kernel_manager.py   94.28%     93.93%       ø
hyperseti/kernels/peak_finder.py      95.60%     96.55%       -0.34% ↓
hyperseti/kernels/blank_hits.py       97.05%     96.87%       -2.95% ↓
hyperseti/blanking.py                 92.42%     100.00%      +0.23% ↑
hyperseti/data_array.py               95.10%     100.00%      ø
hyperseti/dedoppler.py                87.31%     100.00%      -6.89% ↓
hyperseti/dimension_scale.py          97.00%     100.00%      ø
... and 12 more


telegraphic commented 1 year ago

From smear_corr_kernel:

PR Summary

telegraphic commented 1 year ago

PR Summary 2

telegraphic commented 1 year ago

Stats from performance test on Parkes UWL data:

[Image: performance test stats]
telegraphic commented 1 year ago

Takeaway: most of the time is spent merging hits, inside Pandas' built-in query(). This is already a C-based routine in Pandas, so it is hard to speed up.

Could we try a friends-of-friends (FoF) algorithm instead of pd.query? Maybe DBSCAN?
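To make the FoF idea concrete, here is a minimal sketch of what replacing the query-based merge might look like. It assumes each hit carries a frequency index, a drift index and an SNR; the function name `fof_merge` and the linking lengths `link_f` and `link_d` are hypothetical, not part of hyperseti. Two hits are linked when they fall within both linking lengths, and groups are the connected components of those links (found with union-find), so chains of nearby hits merge into one group. DBSCAN with `min_samples=1` produces the same kind of connected-component grouping, since every point becomes a core point.

```python
import numpy as np


def fof_merge(freq_idx, drift_idx, snr, link_f, link_d):
    """Friends-of-friends hit merging (hypothetical sketch, not hyperseti API).

    Two hits are "friends" if |delta freq_idx| <= link_f and
    |delta drift_idx| <= link_d. Groups are connected components of the
    friendship graph. Returns the indices of the highest-SNR hit in each
    group, sorted ascending.
    """
    freq_idx = np.asarray(freq_idx, dtype=float)
    drift_idx = np.asarray(drift_idx, dtype=float)
    snr = np.asarray(snr, dtype=float)
    n = len(snr)
    parent = np.arange(n)  # union-find forest: each hit starts as its own root

    def find(i):
        # Follow parents to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Sort by frequency so each hit is only compared with later hits
    # until the frequency gap exceeds link_f
    order = np.argsort(freq_idx, kind="stable")
    for a in range(n):
        i = order[a]
        b = a + 1
        while b < n and freq_idx[order[b]] - freq_idx[i] <= link_f:
            j = order[b]
            if abs(drift_idx[j] - drift_idx[i]) <= link_d:
                union(i, j)
            b += 1

    # Keep the highest-SNR member of each group (lowest index wins ties)
    best = {}
    for i in range(n):
        r = int(find(i))
        if r not in best or snr[i] > snr[best[r]]:
            best[r] = i
    return np.array(sorted(best.values()), dtype=int)


keep = fof_merge([0, 1, 2, 10, 11, 50], [0, 0, 5, 0, 1, 0],
                 [5, 9, 7, 3, 4, 8], link_f=2, link_d=1)
print(keep)  # [1 2 4 5]
```

The inner loops are plain Python, so this only illustrates the grouping logic; a production version would need to be compiled or vectorized to beat pd.query.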

Loading data from disk currently takes 20% of the time. Pipelining with bifrost ('hyperfrost') will not save as much time as speeding up hitsearch.
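A rough upper bound on the pipelining gain, assuming bifrost overlaps loading perfectly with processing and adds no overhead of its own: the pipelined run can be no faster than the remaining 80% of work, so the best case is a 1.25x speedup.

```latex
T_{\text{pipelined}} \ge T_{\text{total}} - T_{\text{load}}
  = (1 - 0.2)\,T_{\text{total}} = 0.8\,T_{\text{total}}
\quad\Longrightarrow\quad
\text{speedup} \le \frac{1}{0.8} = 1.25
```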

telegraphic commented 1 year ago

The get_signal_extent code in hitsearch is pure Python, but it does not show up as a bottleneck.

A potential Python bottleneck is the grouping step in PeakFinderMan.hitsearch:

            # Final stage: make sure there is only one maximum within
            # the minimum spacing. Loop through the hits, assign them to
            # groups, then keep the maximum of each group.
            # (The loop assumes rows are ordered by ascending idx_f.)
            # TODO: Speed up this code
            # Columns: 0 = hit number, 1 = hit value, 2 = idx_f, 3 = idx_t
            df = np.column_stack((np.arange(len(hits)), hits, idx_f, idx_t))

            ## Sort into groups: start a new group whenever the step in
            ## idx_f from the previous hit reaches min_spacing
            groups = []
            cur = df[0]
            g = [cur, ]

            for row in df[1:]:
                if row[2] - cur[2] < min_spacing:
                    g.append(row)
                else:
                    groups.append(g)
                    g = [row, ]
                cur = row
            groups.append(g)

            ## Keep the row with the largest hit value in each group
            ## (max() returns the first such row on ties)
            df = np.array([max(g, key=lambda r: r[1]) for g in groups])
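The TODO asks for a faster version of this grouping. One option, sketched here under the same assumption as the loop (rows ordered by ascending idx_f), is to vectorize it with NumPy: group boundaries are where `np.diff(idx_f)` reaches `min_spacing`, a cumulative sum turns those boundaries into group ids, and a `np.lexsort` on (group, descending hit value, original order) puts each group's first maximum at the front of its group. The helper name `group_max_vectorized` is hypothetical and the approach has not been benchmarked against the loop.

```python
import numpy as np


def group_max_vectorized(hits, idx_f, idx_t, min_spacing):
    """Vectorized sketch of the hitsearch grouping step (hypothetical helper).

    Assumes idx_f is sorted ascending, as the loop version does. Returns an
    (n_groups, 4) array with columns [hit number, hit value, idx_f, idx_t],
    keeping the first maximum-value row of each group.
    """
    hits = np.asarray(hits, dtype=float)
    idx_f = np.asarray(idx_f)
    idx_t = np.asarray(idx_t)
    n = len(hits)
    df = np.column_stack((np.arange(n), hits, idx_f, idx_t))
    if n == 0:
        return df
    # A new group starts wherever the step in idx_f reaches min_spacing
    new_group = np.concatenate(([False], np.diff(idx_f) >= min_spacing))
    group_id = np.cumsum(new_group)
    # lexsort uses the LAST key as primary: sort by group, then by
    # descending hit value, then by original position (ties -> first row)
    order = np.lexsort((np.arange(n), -hits, group_id))
    # The first row of each group in this order is the group's maximum
    _, first = np.unique(group_id[order], return_index=True)
    return df[order[first]]


out = group_max_vectorized([1.0, 5.0, 3.0, 2.0, 2.0], [0, 1, 2, 10, 11],
                           [0, 0, 0, 0, 0], min_spacing=3)
print(out[:, 0])  # [1. 3.]
```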
telegraphic commented 1 year ago

Coming back from eight weeks of parental leave and I can't remember exactly where I was up to, so I'm doing a YOLO merge and will see what breaks.