OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
94 stars 73 forks source link

Improve potential bottlenecks in `compute_Sv` #1200

Closed lsetiawan closed 4 months ago

lsetiawan commented 10 months ago

Below are the 2 potential places where the calibration code (compute_Sv) code can have bottleneck:

Originally posted by @leewujung in https://github.com/OSOceanAcoustics/echopype/issues/1165#issuecomment-1776360813

anantmittal commented 10 months ago
pytest -vvrP echopype/tests/calibrate/test_cal_params.py::test_get_vend_cal_params_power
pytest -vvrP echopype/tests/calibrate/test_env_params.py::test_harmonize_env_param_time
anantmittal commented 9 months ago

@leewujung Could you explain a bit more about the potential bottleneck in get_vend_cal_params_power? Do we want to parallelize specific operations using dask? Or do we want to refactor the code using numpy/xarray itself to improve runtime?

anantmittal commented 9 months ago

@leewujung Similar question for improvement in harmonize_env_param_time. Are we looking into parallelizing specific (e.g., np.all) operations?

leewujung commented 9 months ago

Do we want to parallelize specific operations using dask? Or do we want to refactor the code using numpy/xarray itself to improve runtime?

I think it is both. We want to enable handling data with lazy-loaded arrays if it's not already, and ensure that the operations are distributed efficiently when dask supports them (and if not, see where dask is heading on the ops, since these ops are not very specific).

In get_vend_cal_params_power, I worry that expand_dims and to a lesser degree sortby is expensive.

In harmonize_env_param_time, the np.all in one of the conditions may be slow if time dimension is large, and .interp may also be slow.

anantmittal commented 9 months ago

For harmonize_env_param_time,

On running some experiments, np.all appears to be the fastest way when using just numpy. This link also shares the same thought.

leewujung commented 6 months ago

This harmonize_env_param_time component was addressed in #1235.

leewujung commented 4 months ago

We can close this now with #1235 and #1285 merged.