OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
94 stars 73 forks source link

refactor(convert): Full refactor for the usage of disk during memory expansion [all tests ci] #1185

Closed lsetiawan closed 9 months ago

lsetiawan commented 11 months ago

Overview

This PR does a full refactor for the "swap" functionality by largely removing the whole usage of Parsed2Zarr objects in favor of a simpler direct usage of zarr library features to create arrays.

Issues Resolved

codecov-commenter commented 11 months ago

Codecov Report

Merging #1185 (bf895a1) into dev (f98e825) will increase coverage by 5.32%. The diff coverage is 88.46%.

@@            Coverage Diff             @@
##              dev    #1185      +/-   ##
==========================================
+ Coverage   77.80%   83.13%   +5.32%     
==========================================
  Files          66       63       -3     
  Lines        6002     5710     -292     
==========================================
+ Hits         4670     4747      +77     
+ Misses       1332      963     -369     
Flag Coverage Δ
unittests 83.13% <88.46%> (+5.32%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
echopype/convert/api.py 88.70% <100.00%> (+2.69%) :arrow_up:
echopype/convert/parse_ek60.py 70.00% <100.00%> (ø)
echopype/convert/parse_ek80.py 50.00% <100.00%> (ø)
echopype/convert/set_groups_base.py 82.35% <100.00%> (-0.15%) :arrow_down:
echopype/convert/set_groups_ek60.py 100.00% <100.00%> (+6.45%) :arrow_up:
echopype/convert/set_groups_ek80.py 98.33% <100.00%> (+12.11%) :arrow_up:
echopype/convert/utils/ek_swap.py 100.00% <100.00%> (ø)
echopype/core.py 85.18% <ø> (-1.03%) :arrow_down:
echopype/testing.py 95.45% <95.29%> (-4.55%) :arrow_down:
echopype/echodata/echodata.py 77.49% <62.50%> (-0.40%) :arrow_down:
... and 2 more

... and 1 file with indirect coverage changes

:mega: Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

lsetiawan commented 11 months ago

Note to self: Look into the potential use of context manager for open_raw? https://realpython.com/python-with-statement/

Too complex at this moment... tabling this.

lsetiawan commented 10 months ago

@leewujung I got all the unit test in that uses your mock data. It took me a bit to make sure that I'm taking consideration various things such as the shape and values in there. I was able to get the old integration test in there though I found some issues with one of the EK80 sample 😞. Also, I need to bring back the s3 side of testing... I might need one more day to get those in. But please do start reviewing when you get a chance to at least get familiar with how it is and see if anything need to change or confusing. Thanks.

lsetiawan commented 10 months ago

Also, I need to bring back the s3 side of testing... I might need one more day to get those in.

Thinking about this... I feel maybe for a "swap" it would be less complexity and just keep it on disk... since this is really just for temporary space when opening a file and it's not that large. We know that zarr write to s3 can be very slow, especially when it's such a small file, I am not sure if it's even worth it. Let me know what you think. Thanks.

leewujung commented 10 months ago

But please do start reviewing when you get a chance to at least get familiar with how it is and see if anything need to change or confusing. We know that zarr write to s3 can be very slow, especially when it's such a small file, I am not sure if it's even worth it.

Let's connect Monday afternoon to go through the approach you are taking and the s3 consideration you have -- I think that's going to be more efficient than going back and forth here. Thanks.

lsetiawan commented 9 months ago

@leewujung Thanks for this extensive review! I'm glad that the review went well and that you're satisfied with the changes 😄

Regarding your last comment:

My only significant comment is the one related to the reshaping of complex variables.

Would you be able to create a new issue for this so that it can be tracked? Thanks!

lsetiawan commented 9 months ago

Would you be able to create a new issue for this so that it can be tracked? Thanks!

Nvm, just quickly did this. See https://github.com/OSOceanAcoustics/echopype/issues/1213