OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
94 stars 73 forks source link

Potential usage of xarray `map_blocks` #1199

Open lsetiawan opened 10 months ago

lsetiawan commented 10 months ago

Overview

In PR #1198 @anantmittal introduced a way to traverse through blocks to perform a where operation, which is not inherently parallel. I found that there is an xarray API called map_blocks: https://docs.xarray.dev/en/stable/generated/xarray.map_blocks.html that could do similar thing. However, this API is currently experimental. I initially refactored the code in PR to use this, but now I'm thinking that, let's actually keep the initial implementation with some tweaks.

This issue is meant to document the changes that I made using the map_blocks:

# Routine to apply frequency differencing
def _frequency_diff_and_mask(
    Sv_block: xr.DataArray, chanA: str, chanB: str, diff: float
) -> xr.DataArray:
    # get the left-hand side of condition
    lhs = Sv_block.sel(channel=chanA) - Sv_block.sel(channel=chanB)
    # create mask using operator lookup table
    da = xr.where(str2ops[operator](lhs, diff), True, False)

    return da

# If Sv data is not dask array
if not isinstance(source_Sv["Sv"].variable._data, dask.array.Array):
    da = _frequency_diff_and_mask(source_Sv["Sv"], chanA, chanB, diff)
# If Sv data is dask array
else:
    # Get the final data array template
    template = source_Sv["Sv"].isel(channel=0).drop_vars("channel")
    # NOTE: As of xarray version `2023.10.1`, `map_blocks` is experimental,
    # and its signature may change.

    # Iterate over all the chunks
    da = source_Sv["Sv"].map_blocks(
        _frequency_diff_and_mask,
        kwargs={
            "chanA": chanA,
            "chanB": chanB,
            "diff": diff,
        },
        template=template,
    )

# assign a name to DataArray
da.name = "mask"

# assign provenance attributes
xr_dataarray_attrs = {
    "mask_type": "frequency differencing",
    "history": f"{datetime.datetime.utcnow()} +00:00. "
    "Mask created by mask.frequency_differencing. "
    f"Operation: Sv['{chanA}'] - Sv['{chanB}'] {operator} {diff}",
}
da = da.assign_attrs(xr_dataarray_attrs)
MohamedNasser8 commented 4 months ago

Hello @leewujung I will work on this; can you assign it to me?