Code used (can also be found in the Gist):

```python
from pathlib import Path

import echopype as ep
from dask.distributed import Client
from tqdm import tqdm

client = Client()  # assumes a Dask distributed cluster is already running

# Combine EchoData objects: open each converted .zarr store on the cluster
ed_future_list = []
for converted_file in tqdm(sorted(Path("convertallfiles").glob("*.zarr"))):
    ed_future = client.submit(
        ep.open_converted,
        converted_raw_path=converted_file,
        chunks={},
        retries=2,  # retry transient worker failures
    )
    ed_future_list.append(ed_future)

# Gather the opened EchoData objects and combine them into one
ed_list = client.gather(ed_future_list)
ed_combined = ep.combine_echodata(ed_list)
```
TODO: Close this issue after summarizing the conversation with @leewujung about the OOI `ep.combine_echodata` example.
This was resolved in a conversation with Wu-Jung:
The initial reason I thought this was not parallelized properly was that the Dask Dashboard kept loading sets of incredibly small tasks, once for each EchoData object.
It turns out this was the set of tasks that lazily appends one Zarr store to the existing combined Zarr store. These appends cannot run for all EchoData objects at the same time: each append has to know where in the growing target array its data belongs, so coordinating the placement of to-be-appended arrays while the target array is still being extended would be complicated and most likely error-prone. The sketch below illustrates the dependency.
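To make that dependency concrete, here is a minimal sketch using plain xarray and zarr; the dataset, variable, and dimension names are illustrative stand-ins, not echopype's actual schema:

```python
import numpy as np
import xarray as xr

# Stand-ins for per-file data; "backscatter" and "ping_time" are
# illustrative names only, not echopype's internal layout.
datasets = [
    xr.Dataset(
        {"backscatter": ("ping_time", np.random.rand(100))},
        coords={"ping_time": np.arange(i * 100, (i + 1) * 100)},
    )
    for i in range(3)
]

# The first write creates the store; each later write appends along
# ping_time. An append must see the current on-disk array length to know
# where its data goes, so append N depends on append N-1 finishing --
# the appends form a chain rather than a parallel batch.
datasets[0].to_zarr("combined_demo.zarr", mode="w")
for ds in datasets[1:]:
    ds.to_zarr("combined_demo.zarr", append_dim="ping_time")
```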
Just to add to the above: these small task sets appear each time `.to_zarr` is issued to append one EchoData object's store to the combined store.
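For reference, the task stream that the Dashboard shows can also be recorded for offline inspection with Dask's `performance_report`; this is a sketch assuming the `client` and `ed_list` from the snippet at the top:

```python
import echopype as ep
from dask.distributed import performance_report

# Assumes `client` and `ed_list` exist as in the combine snippet above.
# The report captures the same task stream the Dashboard displays, so the
# small per-EchoData append task sets can be examined after the run.
with performance_report(filename="combine_report.html"):
    ed_combined = ep.combine_echodata(ed_list)
```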
From the #1331 Gist that combined a month's worth of OOI data to test scaled `compute_Sv`, I noticed the following when running `ep.combine_echodata`: its runtime was much longer than the runtime of `ep.calibrate.compute_Sv` on the product of that same `ep.combine_echodata` call. All this suggests that the computation may not have been parallelized in this case. I didn't spend much time looking into it since it was outside the scope of #1331, so I could be completely wrong, but it is probably worth another look. A timing sketch to reproduce the comparison follows.
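A minimal timing sketch for the comparison, assuming `ed_list` from the snippet at the top; `.compute()` forces the lazy Sv computation so the two wall-clock numbers are comparable:

```python
import time

import echopype as ep

# Assumes `ed_list` exists as in the combine snippet above.
t0 = time.perf_counter()
ed_combined = ep.combine_echodata(ed_list)
print(f"combine_echodata: {time.perf_counter() - t0:.1f} s")

t0 = time.perf_counter()
ds_Sv = ep.calibrate.compute_Sv(ed_combined).compute()  # force execution
print(f"compute_Sv: {time.perf_counter() - t0:.1f} s")
```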