OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0

Add transient and impulse noise masks #1142

Closed beatfactor closed 1 month ago

beatfactor commented 1 year ago

Overview

This PR introduces two new functions, get_impulse_noise_mask and get_transient_noise_mask, which provide masks for impulsive noise (noise sources shorter than 1 ping) and transient noise (short, but still multi-ping, noise sources). Both are ported from the echopy library modules mask_impulse.py and mask_transient.py.

Methodologies

The methodology employed uses both previously published methods defined:

Key Features

  1. Impulse noise - implements the Ryan two-sided comparison method as "ryan", an iterable modification of it that allows it to detect adjacent noise spikes, and the Wang erosion/dilation/median filtering method as "wang"
  2. Transient noise - implements the Ryan identified Sv spikes method as "ryan" and a previously unpublished method proposed by Fielding et al., which compares the data, ping by ping, against a block in a reference layer. If the ping median exceeds the block median by a user-defined threshold, the ping is masked until the transient noise disappears or until the minimum range allowed by the user is reached.
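The ping-versus-block comparison described for the Fielding et al. method can be sketched as follows. This is a simplified illustration, not the echopy or PR implementation: the function name `ping_vs_block_mask` and its parameters (`r0`, `r1`, `n`, `thr`) are hypothetical, medians are taken directly in dB, and the step that extends the mask upward toward a minimum range is omitted.

```python
import numpy as np


def ping_vs_block_mask(sv, ranges, r0, r1, n, thr):
    """Flag pings whose reference-layer median exceeds the block median.

    sv     : 2-D array (ping, range_sample) of Sv in dB
    ranges : 1-D array of range values (m), one per range sample
    r0, r1 : upper and lower limits (m) of the reference layer (assumed names)
    n      : number of pings on each side that form the block
    thr    : threshold in dB
    Returns a 1-D boolean array, True where a ping is flagged.
    """
    layer = (ranges >= r0) & (ranges < r1)
    # Median of each ping inside the reference layer
    ping_median = np.nanmedian(sv[:, layer], axis=1)
    n_pings = sv.shape[0]
    flagged = np.zeros(n_pings, dtype=bool)
    for j in range(n_pings):
        first, last = max(0, j - n), min(n_pings, j + n + 1)
        # Median of the surrounding block of pings in the same layer
        block_median = np.nanmedian(sv[first:last, layer])
        flagged[j] = (ping_median[j] - block_median) > thr
    return flagged


# Synthetic example: uniform -80 dB background with one loud ping
sv = np.full((20, 50), -80.0)
sv[10, :] = -60.0
ranges = np.arange(50, dtype=float)
flagged = ping_vs_block_mask(sv, ranges, r0=20.0, r1=40.0, n=5, thr=6.0)
```

In this synthetic case only the loud ping is flagged, because its layer median sits 20 dB above the median of the surrounding block.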

Key Considerations

To ensure compatibility between the echopy and echopype libraries, several important factors were taken into account while adapting the echopy noise-masking code:

For each noise type, specific decisions were made to combine and adapt the masks appropriately within the echopype environment.

Function Signatures

import pathlib
from typing import List, Optional, Tuple, Union

import xarray as xr


def get_impulse_noise_mask(
    source_Sv: xr.Dataset,
    desired_channel: str,
    thr: Union[Tuple[float, float], int, float],
    m: Optional[Union[int, float]] = None,
    n: Optional[Union[int, Tuple[int, int]]] = None,
    erode: Optional[List[Tuple[int, int]]] = None,
    dilate: Optional[List[Tuple[int, int]]] = None,
    median: Optional[List[Tuple[int, int]]] = None,
    method: str = "ryan",
) -> xr.DataArray:
    ...  # signature only; body omitted


def get_transient_noise_mask(
    source_Sv: Union[xr.Dataset, str, pathlib.Path],
    desired_channel: str,
    mask_type: str = "ryan",
    **kwargs
) -> xr.DataArray:
    ...  # signature only; body omitted

Usage

The functions can be utilised to create masks for Sv data based on identified single-ping or short multi-ping noise sources.
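As a rough illustration of how such a mask might be used downstream, the sketch below applies a boolean mask to an Sv array with `xarray.DataArray.where`. The data are synthetic, the mask is built by hand rather than by the functions in this PR, and the polarity (True marks noise) is an assumption made for the example.

```python
import numpy as np
import xarray as xr

# Synthetic Sv: 4 pings x 5 range samples at a constant -80 dB
sv = xr.DataArray(
    np.full((4, 5), -80.0),
    dims=("ping_time", "range_sample"),
    name="Sv",
)

# Hand-built stand-in for a noise mask; True marks noise (assumed polarity)
mask_values = np.zeros(sv.shape, dtype=bool)
mask_values[2, :] = True
noise_mask = xr.DataArray(mask_values, dims=sv.dims)

# Keep samples where the mask is False; masked samples become NaN
sv_clean = sv.where(~noise_mask)
```

If a mask uses the opposite convention (True marks data to keep), the `~` inversion would be dropped.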

codecov-commenter commented 1 year ago

Codecov Report

Merging #1142 (ec60163) into dev (6b734eb) will decrease coverage by 9.89%. Report is 49 commits behind head on dev. The diff coverage is 77.77%.

@@            Coverage Diff             @@
##              dev    #1142      +/-   ##
==========================================
- Coverage   80.71%   70.82%   -9.89%     
==========================================
  Files          67       10      -57     
  Lines        6056     1042    -5014     
==========================================
- Hits         4888      738    -4150     
+ Misses       1168      304     -864     
Flag Coverage Δ
unittests 70.82% <77.77%> (-9.89%) :arrow_down:


Files Changed Coverage Δ
echopype/utils/mask_transformation.py 62.99% <62.99%> (ø)
echopype/mask/mask_transient_noise.py 90.00% <90.00%> (ø)
echopype/mask/mask_impulse_noise.py 99.12% <99.12%> (ø)
echopype/mask/__init__.py 100.00% <100.00%> (ø)

... and 66 files with indirect coverage changes


beatfactor commented 1 year ago

Codecov Report

@mihaiboldeanu Looks like we could do with a few more tests.

leewujung commented 1 year ago

Hey folks, thanks for the PR!

I did a high-level review of the function/module organization and have the following suggestions. @emiliom also contributed to the discussion.

For the last 2 points, our convention has been to keep the main API call under a subpackage in subpackage/api.py, and put supporting functions into separate modules (.py files) with filenames indicating which api function they are used in.

In addition, in clean we currently only have one function with a generic name (remove_noise), even though that function implements a specific algorithm that aims to estimate the time-varying background noise. With your addition of new noise removal functions (once masked), we need to change this function name and figure out the API pattern for noise-related functions that generate masks (like the ones added here) and functions that suppress some noise (the current remove_noise). But all that would be a separate PR.

ruxandra-valcu commented 1 year ago

One suggestion, in that case: let's move the test setup part to the base tests folder, since some of our non-noise-detection tests (shoal detection, seabed detection etc) will also use it. So tests/conftest.py rather than tests/clean/conftest.py. What do you think?

leewujung commented 1 year ago

> One suggestion, in that case: let's move the test setup part to the base tests folder, since some of our non-noise-detection tests (shoal detection, seabed detection etc) will also use it. So tests/conftest.py rather than tests/clean/conftest.py. What do you think?

Ah I see. Yes in that case, moving the setup to tests/conftest.py sounds great. Thanks!

ruxandra-valcu commented 10 months ago

Hello @leewujung

I've rewritten this PR (both the already-reviewed impulse noise algorithms and the transient noise and signal attenuation ones) to use xarray functionality rather than numpy.

Please let me know if there are any other improvements I should make!
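To illustrate what an xarray-based formulation can look like, here is a sketch of a two-sided ping comparison written with `DataArray.shift` instead of explicit numpy indexing. It is not the code from this PR: the function name `two_sided_spike_mask` and its defaults are assumptions, and any vertical smoothing of Sv before the comparison is left out.

```python
import numpy as np
import xarray as xr


def two_sided_spike_mask(sv, n=1, thr=10.0):
    """True where a sample exceeds both neighbours n pings away by > thr dB."""
    before = sv.shift(ping_time=n)   # value n pings earlier (NaN at the start)
    after = sv.shift(ping_time=-n)   # value n pings later (NaN at the end)
    # Comparisons against NaN are False, so edge pings are never flagged
    return ((sv - before) > thr) & ((sv - after) > thr)


# Synthetic example: uniform background with a single spike
data = np.full((7, 4), -80.0)
data[3, 1] = -50.0
sv = xr.DataArray(data, dims=("ping_time", "range_sample"))
mask = two_sided_spike_mask(sv, n=1, thr=10.0)
```

Because `shift` keeps dimension names and coordinates aligned, the same function works unchanged on arrays that carry extra dimensions such as channel.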

ctuguinay commented 1 month ago

Closing this since the Echopy functions were added in #1316 and #1326