Hyperseti 1.1.0 - Githubissues

telegraphic commented 1 year ago

WTD Summary - 25 July 2023

What The Diff was unable to process this PR.

what-the-diff[bot] commented 1 year ago

PR Summary

Added hit_summary method to HitBrowser: a new method that provides a summary of the hits in the browser.
Fixed bug in get_obs() method: resolved an issue where keys not in the schema were being loaded, causing a KeyError.
Updated DB schema with 'signal extent' column: the database schema now includes a new column called 'signal extent', and the hitsearch function has been updated to populate this field.
Improved find_et pipeline for large datasets on GPUs: the find_et pipeline now allows overlapping gulps when reading data from DataArray, so polynomial fitting and edge blanking can be applied to gulps smaller than the coarse channel size. Especially useful for Parkes UWL data (128 MHz coarse channels!)
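To make the overlapping-gulp idea concrete, here is a minimal sketch of how channel ranges with a fixed overlap could be generated. The helper name `overlapping_gulps` and its parameters are assumptions for illustration only; they are not taken from hyperseti, whose actual gulp handling may differ.

```python
def overlapping_gulps(n_chan, gulp_size, overlap):
    """Yield (start, stop) channel ranges that overlap by `overlap` channels.

    Hypothetical helper, not part of hyperseti. The last gulp is
    truncated at n_chan rather than padded.
    """
    if gulp_size <= 0 or not 0 <= overlap < gulp_size:
        raise ValueError("need gulp_size > 0 and 0 <= overlap < gulp_size")
    if n_chan <= 0:
        return
    step = gulp_size - overlap  # how far each gulp advances
    start = 0
    while True:
        stop = min(start + gulp_size, n_chan)
        yield start, stop
        if stop == n_chan:
            break
        start += step


if __name__ == "__main__":
    for start, stop in overlapping_gulps(n_chan=10, gulp_size=4, overlap=1):
        print(start, stop)
```

With an overlap at least as wide as the blanked edge region, channels lost at the edge of one gulp are still covered by the interior of its neighbour.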

codecov-commenter commented 1 year ago

Codecov Report

Merging #109 (c20e05f) into master (bd0454e) will decrease coverage by 0.52%. The diff coverage is 95.54%.

:exclamation: Current head c20e05f differs from pull request most recent head 9ab2645. Consider uploading reports for the commit 9ab2645 to get more accurate results

@@            Coverage Diff             @@
##           master     #109      +/-   ##
==========================================
- Coverage   94.41%   93.89%   -0.52%     
==========================================
  Files          30       33       +3     
  Lines        1790     1983     +193     
==========================================
+ Hits         1690     1862     +172     
- Misses        100      121      +21

Impacted Files	Coverage Δ
hyperseti/test_data/__init__.py	`86.95% <ø> (ø)`
hyperseti/io/hit_db.py	`90.74% <50.00%> (-2.53%)`	:arrow_down:
hyperseti/kernels/dedoppler.py	`89.18% <88.57%> (-10.82%)`	:arrow_down:
hyperseti/kernels/kernel_manager.py	`94.28% <93.93%> (ø)`
hyperseti/kernels/peak_finder.py	`95.60% <96.55%> (-0.34%)`	:arrow_down:
hyperseti/kernels/blank_hits.py	`97.05% <96.87%> (-2.95%)`	:arrow_down:
hyperseti/blanking.py	`92.42% <100.00%> (+0.23%)`	:arrow_up:
hyperseti/data_array.py	`95.10% <100.00%> (ø)`
hyperseti/dedoppler.py	`87.31% <100.00%> (-6.89%)`	:arrow_down:
hyperseti/dimension_scale.py	`97.00% <100.00%> (ø)`
... and 12 more

telegraphic commented 1 year ago

From smear_corr_kernel:

PR Summary

Moved dedoppler function to a new class: the dedoppler function is now part of the DedopplerMan class for improved organization.
Added a base class for kernel management: a new file, kernel_manager.py, has been introduced, which serves as a base class for kernel management in hyperseti/kernels/.
Switched to PeakFinderMan in peak_finder: the peak_finder now uses the PeakFinderMan class instead of calling peak_find directly. This benefits overall code design.
Modified blanking hits implementation: the blanking hits function now has its own loop over time samples within each beam rather than calling blank hit directly. This provides better functionality and flexibility.
Inheritance changes in PeakFinder: the PeakFinder class now inherits from KernelManager, leading to more structured and organized code.
Introduced SmearCorrMan class for smearing correction: a new SmearCorrMan class has been added for handling smearing correction kernel management.
Optimized GPU memory allocation and deallocation: GPU memory allocation is now managed within the workspace dictionary of each object, and a __del__ method has been added for clean memory release after use.
Removed redundant code: some __init__() methods and redundant code have been removed from the peak_finder and smear_corr kernels, as they are now managed by their parent classes. This results in cleaner code.
Added test for smear_corr kernel: a new test has been added for the smear_corr kernel to ensure it works correctly.
Fixed frequency axis bug in dedoppler: an issue with the frequency axis when using a custom plan (e.g., optimal) in dedoppler has been resolved.
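The workspace-dictionary pattern described above can be sketched roughly as follows. This is an illustration only: the class name `WorkspaceManager`, the `get_buffer` and `free` methods, and the use of NumPy arrays as a stand-in for GPU buffers are all assumptions, not the actual KernelManager interface in hyperseti.

```python
import numpy as np


class WorkspaceManager:
    """Hypothetical base class that owns named, reusable buffers.

    NumPy arrays stand in for GPU arrays here; this is not
    hyperseti's KernelManager.
    """

    def __init__(self, name):
        self.name = name
        self.workspace = {}  # buffer name -> allocated array

    def get_buffer(self, key, shape, dtype=np.float32):
        """Return a cached buffer, reallocating only if shape or dtype changed."""
        shape = tuple(shape)
        buf = self.workspace.get(key)
        if buf is None or buf.shape != shape or buf.dtype != np.dtype(dtype):
            buf = np.empty(shape, dtype=dtype)
            self.workspace[key] = buf
        return buf

    def free(self):
        """Drop all references so the memory can be released."""
        self.workspace.clear()

    def __del__(self):
        # Guard against partially constructed objects.
        if getattr(self, "workspace", None) is not None:
            self.free()


if __name__ == "__main__":
    man = WorkspaceManager("dedoppler")
    out = man.get_buffer("out", (4, 8))
    print(out is man.get_buffer("out", (4, 8)))  # same buffer is reused
    man.free()
```

Keeping buffers on the object means repeated calls with the same shapes avoid reallocation, and releasing them in one place makes cleanup explicit.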

telegraphic commented 1 year ago

PR Summary 2

Added a new kernel manager for handling hits: a new kernel manager has been implemented to improve hit handling in dedoppler and peak finder kernels.
Modified dedoppler and peak finder kernels to use the new kernel manager: dedoppler and peak finder kernels now utilize the newly created kernel manager for better efficiency.
Updated tests to verify kernel manager functionality: tests have been updated to ensure kernel managers work as expected in test_dedoppler, test_peak_kernel, and test_smear_corr.
Fixed bug in hitsearch when no peaks above threshold value: a bug where hitsearch was not returning hits when there were no peaks above the threshold value has been resolved.
Improved plotting code readability: changes have been made to the plotting code for better readability and understanding, but further improvements are still needed.

telegraphic commented 1 year ago

Stats from performance test on Parkes UWL data:

telegraphic commented 1 year ago

Takeaway: most of the time is spent merging hits, inside Pandas' built-in query(). This is already a C-based routine in Pandas, so it is hard to speed up.

Could try a FoF algorithm instead of pd.query? Maybe DBSCAN?
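For reference, a friends-of-friends grouping can be sketched with SciPy as below. This is only a rough illustration of the technique, not hyperseti's hit-merging code: the function names, the choice of channel index and drift-rate index as the two axes, and the per-axis linking lengths are all assumptions.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree


def fof_labels(chan_idx, drift_idx, link_chan, link_drift):
    """Assign a friends-of-friends group label to each hit (hypothetical helper).

    Two hits are 'friends' if they are within link_chan channels AND
    link_drift drift-rate bins of each other; groups are the connected
    components of the friendship graph.
    """
    n = len(chan_idx)
    if n == 0:
        return np.empty(0, dtype=int)
    # Rescale each axis by its linking length so one radius covers both.
    pts = np.column_stack((np.asarray(chan_idx, dtype=float) / link_chan,
                           np.asarray(drift_idx, dtype=float) / link_drift))
    # p=inf gives a box-shaped neighbourhood (Chebyshev distance).
    pairs = cKDTree(pts).query_pairs(r=1.0, p=np.inf, output_type="ndarray")
    graph = coo_matrix((np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])),
                       shape=(n, n))
    _, labels = connected_components(graph, directed=False)
    return labels


def strongest_per_group(snr, labels):
    """Return indices of the highest-SNR hit in each group, in ascending order."""
    snr = np.asarray(snr, dtype=float)
    order = np.lexsort((-snr, labels))  # by label, then descending SNR
    sorted_labels = labels[order]
    first = np.r_[True, sorted_labels[1:] != sorted_labels[:-1]]
    return np.sort(order[first])


if __name__ == "__main__":
    chan = [10, 12, 13, 500, 502]
    drift = [0, 0, 1, 5, 5]
    snr = [5.0, 9.0, 7.0, 3.0, 4.0]
    labels = fof_labels(chan, drift, link_chan=3, link_drift=2)
    print(strongest_per_group(snr, labels))
```

Whether this beats the query()-based merge would need profiling on real data; DBSCAN with a minimum cluster size of one yields a similar grouping.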

Loading data from disk is currently taking 20% of time. Pipelining with bifrost ('hyperfrost') is not going to cut down as much time as improving hitsearch speed.

telegraphic commented 1 year ago

The get_signal_extent code in hitsearch is Python-only, but is not showing up as a bottleneck.

A potential Python bottleneck is sorting into groups, in PeakFinderMan.hitsearch:

            # Final stage: we need to make sure there is only one maximum
            # within the minimum spacing. We loop through and assign hits
            # to groups, then find the maximum for each group.
            # TODO: Speed up this code
            df = np.column_stack((np.arange(len(hits)), hits, idx_f, idx_t))

            ## Sort into groups
            groups = []
            cur = df[0]
            g = [cur, ]

            for row in df[1:]:
                if row[2] - cur[2] < min_spacing:
                    g.append(row)
                else:
                    groups.append(g)
                    g = [row, ]
                cur = row
            groups.append(g)

            df = []
            for g in groups:
                if len(g) == 1:
                    df.append(g[0])
                else:
                    mv, mi = g[0][1], 0
                    for i, h in enumerate(g[1:]):
                        if mv < h[1]:
                            mv, mi = h[1], i + 1 
                    df.append(g[mi])
            df = np.array(df)
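One possible way to address the TODO is to replace both Python loops with NumPy operations. The sketch below is an assumption-laden illustration, not the hyperseti implementation: the function name `max_per_group` is hypothetical, it assumes the hits are already sorted by `idx_f` (as the loop above implicitly does), and it assumes `hits` contains no NaNs. It keeps the same conventions as the loop: a new group starts when the gap to the previous hit is at least `min_spacing`, and ties go to the first maximum.

```python
import numpy as np


def max_per_group(hits, idx_f, idx_t, min_spacing):
    """Keep the strongest hit in each run of closely spaced hits.

    Hypothetical vectorised equivalent of the grouping loop; returns rows
    of (original index, hit value, idx_f, idx_t).
    """
    hits = np.asarray(hits)
    idx_f = np.asarray(idx_f)
    idx_t = np.asarray(idx_t)
    if len(hits) == 0:
        return np.empty((0, 4))
    df = np.column_stack((np.arange(len(hits)), hits, idx_f, idx_t))

    # A new group starts wherever the gap to the previous hit is >= min_spacing.
    new_group = np.diff(idx_f) >= min_spacing
    starts = np.concatenate(([0], np.flatnonzero(new_group) + 1))
    group_id = np.cumsum(np.concatenate(([False], new_group)))

    # Maximum hit value within each group.
    group_max = np.maximum.reduceat(hits, starts)

    # Candidates equal to their group maximum; keep the first one per group.
    cand = np.flatnonzero(hits == group_max[group_id])
    _, first = np.unique(group_id[cand], return_index=True)
    return df[cand[first]]


if __name__ == "__main__":
    out = max_per_group(hits=[5.0, 9.0, 7.0, 3.0, 4.0, 4.0],
                        idx_f=[10, 12, 13, 500, 502, 503],
                        idx_t=[0, 0, 0, 1, 1, 1],
                        min_spacing=3)
    print(out)
```

The grouping is still chained, as in the loop: each hit is compared with its immediate predecessor, so a long run of closely spaced hits forms a single group.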

telegraphic commented 1 year ago

Coming back from eight weeks of parental leave, and can't remember exactly where I was up to, so doing a YOLO merge and will see what breaks

UCBerkeleySETI / hyperseti

Hyperseti 1.1.0 #109
