ctuguinay closed this 3 months ago
Attention: Patch coverage is 96.15385% with 1 line in your changes missing coverage. Please review.

Project coverage is 75.60%. Comparing base (9f56124) to head (1828b68). Report is 117 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| echopype/echodata/echodata.py | 92.30% | 1 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
Example `compute_Sv` Gist with OOI data much larger than what was tested in #1212: https://gist.github.com/ctuguinay/6195d318037f80eb60d9e9cb5dd7aa77

Also, some remove-background-noise tests were failing, but that was out of scope for this PR, so I left it as an issue in #1332.
TODO: The `use_swap` implementation in `open_raw` does something different with the Zarr store, but I didn't understand it, so I stuck with what worked in my notebook. I need to investigate what I am currently doing: what happens when I save an Xarray Dataset object automatically to the Zarr store from `temp_zarr_store`? And I need to investigate what the original `use_swap` in `open_raw` is doing when it explicitly saves to the Zarr root of the Zarr store from `temp_zarr_store`.
EDIT:

- `create_temp_zarr` creates a new Zarr folder in `/tmp/echopype/`, with the Zarr folder name following this pattern: `ep-swap--[RANDOM_ARRANGEMENT_OF_ALPHA_NUMERIC_CHARACTERS].zarr`.
- `Dataset.to_zarr(temp_zarr)` will result in saving all groups of the dataset normally in the temporary Zarr folder (see the sketch below).
- `open_raw` `use_swap` option: rectangularization of the parsed data can impose a HUGE memory expansion on it, and so before setting the groups/datasets of the EchoData object, one must put them in a temp Zarr store. That is why the `rectangularize_data` code cannot do something as simple as `to_zarr`: it is not in the form of an Xarray Dataset yet.

Tested new functionality with a 2 million ping array (1 month of 2017 OOI EK60 data): https://gist.github.com/ctuguinay/8dbf6b89d9a58adbdefc90a63c112cb6
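A minimal sketch of the write-then-lazily-reopen pattern described above, using plain xarray/zarr calls. The `ep-swap--` prefix follows the naming convention noted in the EDIT, but the temp-folder creation here is illustrative, not echopype's actual `create_temp_zarr`:

```python
import tempfile

import numpy as np
import xarray as xr

# Illustrative only: make a temp Zarr folder (echopype uses /tmp/echopype/
# with an ep-swap--*.zarr name; here we let tempfile pick a safe location).
temp_zarr = tempfile.mkdtemp(prefix="ep-swap--", suffix=".zarr")

ds = xr.Dataset(
    {"backscatter_r": (("ping_time", "range_sample"), np.zeros((1000, 500)))}
)
ds.to_zarr(temp_zarr, mode="w")     # eagerly writes every variable to disk
ds_lazy = xr.open_zarr(temp_zarr)   # lazy, dask-backed view over the store
```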
@leewujung This should be ready to review now
Oh, one more slight TODO: in the new test, check the `ds["Sv"]` values computed with `use_swap` and `chunk_dict` against Echoview values.
EDIT: Added this to both tests that check against Echoview and Matlab.
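A hedged sketch of what such a check could look like, assuming the `use_swap` and `chunk_dict` kwargs discussed in this PR and a hypothetical `echoview_sv` reference array loaded elsewhere in the test; the actual test code differs:

```python
import numpy as np

import echopype as ep

# Hypothetical test sketch: compute Sv through the swap path and compare
# against externally exported Echoview values (echodata and echoview_sv
# are assumed to be set up elsewhere in the test).
ds_Sv = ep.calibrate.compute_Sv(
    echodata, use_swap=True, chunk_dict={"ping_time": 1000}
)
assert np.allclose(ds_Sv["Sv"].values, echoview_sv, atol=1e-1)
```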
Investigate `zarr.sync.ThreadSynchronizer()`. Maybe also add some kwarg for storage options.

EDIT: Added this (see the sketch below).
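A minimal sketch of how these two options can be passed through `xarray.Dataset.to_zarr`, given an in-memory Dataset `ds`; the store URL and `storage_options` values are placeholders:

```python
import zarr

# ThreadSynchronizer guards concurrent chunk writes from multiple threads;
# storage_options is forwarded to fsspec for remote stores (values here are
# placeholders, not real credentials).
ds.to_zarr(
    "s3://my-bucket/swap.zarr",
    mode="w",
    synchronizer=zarr.ThreadSynchronizer(),
    storage_options={"anon": False},
)
```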
Wouldn't this be equivalent to the below?

Oh, are you thinking I move the `to_zarr` to the API wrapper functions (`compute_Sv` and `compute_TS`)? I'm not quite sure I understand the question.
And also, does the code below also imply the creation of this temporary Zarr store?

```python
cal_ds = _compute_cal(...)
cal_ds.to_zarr(final_zarr_store)
```
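For context, a sketch of the alternative being discussed, where the wrapper itself handles the swap; `_compute_cal`, `temp_zarr_store`, and the `use_swap` kwarg are placeholders taken from this thread, not the merged implementation:

```python
import xarray as xr

def compute_Sv(echodata, use_swap=False, **kwargs):
    # Hypothetical wrapper: compute in memory, optionally spill the result
    # to a temporary Zarr store and hand back a lazy view of it.
    cal_ds = _compute_cal("Sv", echodata, **kwargs)
    if use_swap:
        cal_ds.to_zarr(temp_zarr_store, mode="w")
        cal_ds = xr.open_zarr(temp_zarr_store)
    return cal_ds
```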
My question was about why, even without chunking, using `use_swap=True` would automatically make the number of dask graph layers equal to 2.
The order of events would go like this (in the case where `cal_object` is not chunked but `use_swap=True`):

- `cal_object` is computed and is in memory; it is not a dask array and thus has no dask graph layers.
- `cal_object` is sent to the Zarr store.
- `cal_object` is lazy loaded back into memory with no actual computation layers attached to it (other than the initial 2 Zarr store reading layers), since it had no real computation layers to begin with prior to being sent to the Zarr store.

TODOs (from conversation with @leewujung today):

- Turn on `use_swap` automatically when the `backscatter_r` and `backscatter_i` (if it exists) variable nbyte sum in the EchoData object is above 2 GiB.

@ctuguinay: include both `backscatter_r` and `backscatter_i` in the size limit? So the sum of both > 2 GB. For EK60 the `backscatter_i` does not exist, so some checks are needed.

@leewujung I agree with the above 👍
@leewujung Thanks for the review! This should be ready for review again. The main thing I did was implement the backscatter nbyte calculation based on sonar model, encode mode, waveform mode, and EchoData beam group.
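Roughly what that size check looks like, as a hedged sketch; the beam-group path, threshold, and variable handling mirror the discussion above but are not the exact merged code:

```python
# Hypothetical sketch of the automatic swap decision: sum the nbytes of
# backscatter_r and, when present (not for EK60), backscatter_i.
beam = echodata["Sonar/Beam_group1"]
total_nbytes = beam["backscatter_r"].nbytes
if "backscatter_i" in beam:
    total_nbytes += beam["backscatter_i"].nbytes
use_swap = total_nbytes > 2 * 1024**3  # 2 GiB threshold from the TODO above
```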
@leewujung Applied your suggestion to extract the encode and waveform modes from the calibration object.
Addresses #1329