OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
89 stars 70 forks source link

expose func arg and skip_na to compute_MVBS users #1269

Closed ctuguinay closed 2 months ago

ctuguinay commented 4 months ago

PR for #1268

In addition, modified test_compute_MVBS_values to include a test for adding NaNs to the ds_Sv to illustrate how that would affect the computation of compute_MVBS using both func="nanmean" and func="mean".

ctuguinay commented 3 months ago

Hey @leewujung, thanks for the review!

I agree with you on the implementation details. I also think that not all combinations should be allowed (ie the combinations func = nansum and skipna = False is a bit contradictory) and I was gonna add some restriction as to the combinations that the user could pass in, but this solution is much cleaner.

Would you also like me to make this same change of exposing skipna and using func under the hood to compute_NASC?

leewujung commented 3 months ago

Would you also like me to make this same change of exposing skipna and using func under the hood to compute_NASC?

Sounds good if you could add this for compute_NASC in this PR also. Thanks!

ctuguinay commented 3 months ago

Hey @leewujung, I made the changes, but only the changes for compute_MVBS.

I do have a few questions on the compute_NASC part of this PR:

In compute_raw_NASC, the operation change should only be in the setting of mean/nanmean in the sv_mean = _groupby_x_along_channels(...,func=func="nanmean" if skipna else "mean",...) right? Not sum/nansum? Since sv_mean and not sv_sum is being calculated here. Let me know if this implementation is incorrect, since your comment above said nansum /sum.

But, I do also see the sum operations being used to calculate h_mean_denom and h_mean_num and I'm not sure if in those computations I should do anything with nansum. The same applies with ds_ping_time and it using nanmean currently. Should these three computations' aggregate functions interact at all with the skipna being passed into compute_raw_NASC?

leewujung commented 3 months ago

In compute_raw_NASC, the operation change should only be in the setting of mean/nanmean in the sv_mean = _groupby_x_along_channels(...,func=func="nanmean" if skipna else "mean",...) right? Not sum/nansum? Since sv_mean and not sv_sum is being calculated here. Let me know if this implementation is incorrect, since your comment above said nansum /sum.

Oh right, you're right that it would still be mean/nanmean in the compute_NASC function the way it is implemented.

But, I do also see the sum operations being used to calculate h_mean_denom and h_mean_num and I'm not sure if in those computations I should do anything with nansum. The same applies with ds_ping_time and it using nanmean currently. Should these three computations' aggregate functions interact at all with the skipna being passed into compute_raw_NASC?

For h_mean_* and ds_ping_time, I think we can hard-code it to use nanmean/nansum and skipna=False, because:

ctuguinay commented 3 months ago

Sounds good, thanks!

ctuguinay commented 3 months ago

This one is taking me a bit since I am getting some weird NaN values in the compute_NASC tests when I shouldn't be getting them (in the case where I am using nanmean/nansum). I think if I take a very thorough look at the bins that are being created, that will allow me to make sense of this...I think. This may take me a while longer, so I'll come back to this after addressing a few of the Echopype user issues.

leewujung commented 3 months ago

I think if I take a very thorough look at the bins that are being created, that will allow me to make sense of this...I think.

Yeah, and making the test data with smaller dimensions may help with figuring out what is going on. For example computing NASC using just 2 pings of data should give you some flexibility to look into where NaNs should be created and where it should not.

ctuguinay commented 2 months ago

Looking back at that compute MVBS test that I modified, I think I'll revert the change I made for that and have two separate tests (1 for compute MVBS and 1 for compute NASC) with smaller data arrays to do as you suggested above. The _parse_nans may hide some unintentional NaNs in the ds_MVBS and it's just so messy working with such a large dataset.

leewujung commented 2 months ago

@ctuguinay : I direct-pushed some changes to fix mock_Sv_dataset_irregular outputs, see below:

codecov-commenter commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 64.54%. Comparing base (9f56124) to head (8e95b99). Report is 47 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1269 +/- ## =========================================== - Coverage 83.52% 64.54% -18.99% =========================================== Files 64 14 -50 Lines 5686 925 -4761 =========================================== - Hits 4749 597 -4152 + Misses 937 328 -609 ``` | [Flag](https://app.codecov.io/gh/OSOceanAcoustics/echopype/pull/1269/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OSOceanAcoustics) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/OSOceanAcoustics/echopype/pull/1269/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OSOceanAcoustics) | `64.54% <100.00%> (-18.99%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=OSOceanAcoustics#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.