OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0

separate nc4/xarray dataset group creation from file saving functionality #228

Closed emiliom closed 3 years ago

emiliom commented 3 years ago

In the new class-redesign branch, the SetGroup* classes and functionality perform two related but distinct functions: create the xarray dataset for each nc4 group, and save it to file (nc or zarr). Separating those functions would provide more flexibility to power users, without impacting regular users who use Convert followed by the new to_netcdf and to_zarr methods to write to files.
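A minimal sketch of the split, not the actual echopype code: `SetGroupSketch`, its `set_*` methods, and `save_group` are hypothetical names, plain dicts stand in for xarray Datasets, and JSON stands in for the nc/zarr writers. The point is only that the `set_*` methods return the group and a separate function writes it.

```python
import json
import tempfile
from pathlib import Path


class SetGroupSketch:
    """Hypothetical stand-in for a SetGroup* class: builds groups, never writes files."""

    def __init__(self, parsed):
        self.parsed = parsed  # parsed raw-file contents (a plain dict in this sketch)

    def set_toplevel(self):
        # Return the group contents instead of writing them to disk.
        return {"attrs": {"sonar_model": self.parsed["sonar_model"]}}

    def set_beam(self):
        return {"backscatter_r": list(self.parsed["backscatter"])}


def save_group(group, path):
    """Separate saving step; JSON is an assumption standing in for nc/zarr output."""
    path = Path(path)
    path.write_text(json.dumps(group))
    return path


parsed = {"sonar_model": "EK80", "backscatter": [-70.5, -68.2, -71.0]}
setter = SetGroupSketch(parsed)

# A power user can inspect or modify the group before deciding to save it.
beam = setter.set_beam()
beam["attrs"] = {"note": "edited before saving"}

with tempfile.TemporaryDirectory() as tmp:
    out = save_group(beam, Path(tmp) / "beam.json")
    restored = json.loads(out.read_text())
```

Regular users would never see this split, since a convenience method could call the builder and the saver back to back.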

See https://github.com/OSOceanAcoustics/echopype/discussions/225 for more discussion on this.

leewujung commented 3 years ago

I was going to implement this yesterday but thought of a possible issue: when a Dataset is large, returning the Dataset itself from all SetGroup*.set_* methods and then saving them all to file in, say, a save_all method means we would keep the large Datasets in memory until they are saved. One obvious way to circumvent this is to not have a save_all method at all, and instead create each Dataset (using set_*) --> save it to file --> destroy the Dataset object.
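The create --> save --> destroy flow could be sketched roughly as below, using a generator so that only one group exists at a time. Everything here is an assumption for illustration: `iter_groups` and `save_each` are hypothetical names, the "Environment" group is just an example alongside Beam, dicts stand in for Datasets, and JSON stands in for nc/zarr.

```python
import json
import tempfile
from pathlib import Path


def iter_groups(parsed):
    """Yield (name, group) pairs lazily; each group is built only when requested."""
    yield "Environment", {"sound_speed": parsed["sound_speed"]}
    yield "Beam", {"backscatter_r": list(parsed["backscatter"])}


def save_each(parsed, out_dir):
    """Create -> save -> discard, with no save_all step holding every group."""
    written = []
    for name, group in iter_groups(parsed):
        path = Path(out_dir) / f"{name}.json"
        path.write_text(json.dumps(group))  # JSON is a stand-in for nc/zarr
        written.append(path.name)
        del group  # drop the reference before the next group is built
    return written


parsed = {"sound_speed": 1500.0, "backscatter": [-70.5, -68.2, -71.0]}

with tempfile.TemporaryDirectory() as tmp:
    written = save_each(parsed, tmp)
    files = sorted(p.name for p in Path(tmp).iterdir())
```

With this shape, peak memory is bounded by the largest single group rather than by the sum of all groups.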

The majority of data sits in the Beam group that holds the backscatter data. The file size is something the users can choose. The largest file I've seen so far is ~300 MB from EK80.

Thoughts?

emiliom commented 3 years ago

I've already forgotten the details of how this stuff works :disappointed: .... (though I did re-read the associated discussion, #225).

But, my main comment would be to not focus on this issue at this time. There's already plenty to do to get 0.5.0 out the door. This issue can wait.