ctuguinay closed this 2 months ago.
Hey @leewujung, thanks for the review!
I agree with you on the implementation details. I also think that not all combinations should be allowed (i.e., the combination `func = nansum` and `skipna = False` is a bit contradictory), and I was gonna add some restriction on the combinations that the user could pass in, but this solution is much cleaner.

Would you also like me to make this same change of exposing `skipna` and using `func` under the hood to `compute_NASC`?
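
A minimal sketch of the idea agreed on here: expose only `skipna` and derive the aggregation name internally, so a contradictory pair like `func="nansum"` with `skipna=False` simply cannot be requested. `resolve_func` is a hypothetical helper, and NumPy's reducers stand in for whatever echopype actually passes `func` to.

```python
import numpy as np


def resolve_func(skipna: bool, op: str = "mean") -> str:
    """Hypothetical helper: map a user-facing skipna flag to a reducer name.

    Users never pass `func` directly, so mismatched pairs such as
    func="nansum" with skipna=False cannot be expressed.
    """
    if op not in ("mean", "sum"):
        raise ValueError(f"unsupported op: {op!r}")
    return f"nan{op}" if skipna else op


# The resolved name maps onto the NumPy reducer of the same name.
values = np.array([1.0, np.nan, 3.0])
for skipna in (True, False):
    func = resolve_func(skipna)
    print(skipna, func, getattr(np, func)(values))
```

Deriving the name from one boolean keeps the public signature small; the cost is that callers cannot pick an arbitrary reducer, which is the point of the restriction.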
> Would you also like me to make this same change of exposing `skipna` and using `func` under the hood to `compute_NASC`?

Sounds good, if you could add this for `compute_NASC` in this PR as well. Thanks!
Hey @leewujung, I made the changes, but only for `compute_MVBS`.

I do have a few questions on the `compute_NASC` part of this PR:
In `compute_raw_NASC`, the operation change should only be in the setting of mean/nanmean in `sv_mean = _groupby_x_along_channels(..., func="nanmean" if skipna else "mean", ...)`, right? Not sum/nansum, since `sv_mean` and not `sv_sum` is being calculated here. Let me know if this implementation is incorrect, since your comment above said `nansum`/`sum`.
But I also see `sum` operations being used to calculate `h_mean_denom` and `h_mean_num`, and I'm not sure whether I should do anything with `nansum` in those computations. The same applies to `ds_ping_time`, which currently uses `nanmean`. Should these three computations' aggregate functions interact at all with the `skipna` being passed into `compute_raw_NASC`?
> In `compute_raw_NASC`, the operation change should only be in the setting of mean/nanmean in `sv_mean = _groupby_x_along_channels(..., func="nanmean" if skipna else "mean", ...)`, right? Not sum/nansum, since `sv_mean` and not `sv_sum` is being calculated here. Let me know if this implementation is incorrect, since your comment above said `nansum`/`sum`.

Oh right, you're right that it would still be mean/nanmean in the `compute_NASC` function, the way it is implemented.
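
To make the mean/nanmean distinction for `sv_mean` concrete, here is a NumPy sketch of a bin-wise average over linear `sv`. `bin_sv_mean` and its arguments are hypothetical; this is not `_groupby_x_along_channels`, whose full signature is not shown in this thread.

```python
import numpy as np


def bin_sv_mean(Sv_db, bin_index, n_bins, skipna=True):
    """Hypothetical bin-wise mean of linear sv (not echopype's API)."""
    reducer = np.nanmean if skipna else np.mean
    sv_lin = 10 ** (Sv_db / 10)  # average in the linear domain, not in dB
    out = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = sv_lin[bin_index == b]
        if in_bin.size:  # bins with no samples stay NaN
            out[b] = reducer(in_bin)
    return out


# One ping, four samples; sample 1 is missing.
Sv_db = np.array([-60.0, np.nan, -70.0, -70.0])
bin_index = np.array([0, 0, 1, 1])
print(bin_sv_mean(Sv_db, bin_index, 2, skipna=True))   # NaN skipped in bin 0
print(bin_sv_mean(Sv_db, bin_index, 2, skipna=False))  # NaN propagates to bin 0
```

Since only a mean is taken, switching between `mean` and `nanmean` is the whole change for `sv_mean`; no sum is involved at this step.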
> But I also see `sum` operations being used to calculate `h_mean_denom` and `h_mean_num`, and I'm not sure whether I should do anything with `nansum` in those computations. The same applies to `ds_ping_time`, which currently uses `nanmean`. Should these three computations' aggregate functions interact at all with the `skipna` being passed into `compute_raw_NASC`?

For `h_mean_*` and `ds_ping_time`, I think we can hard-code them to use `nanmean`/`nansum` and `skipna=False`, because:

- `h_mean_*` provides the "height", or the steps in the integral, so NaN comes from the Sv side but not from the steps
- `ds_ping_time` is used to come up with a new coordinate along the time or distance dimension, so even if the values end up being NaN, the coordinate should not be

Sounds good, thanks!
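
The two points about `h_mean_*` and `ds_ping_time` can be illustrated with one hypothetical NASC cell in NumPy. The numerator/denominator split only borrows the names from this thread, and scaling by `4 * pi * 1852**2 * sv_mean * h_mean` (the usual NASC definition, with the integral approximated as mean sv times cell height) is a simplifying assumption; neither is taken from the echopype source.

```python
import numpy as np

# Hypothetical NASC cell: 2 pings x 3 range samples.
sv = np.array([[1e-6, np.nan, 2e-6],
               [np.nan, 1e-6, 1e-6]])      # linear sv; NaN = no data
echo_range = np.array([[0.0, 0.5, 1.0],
                       [0.0, 0.5, 1.0]])    # metres; no NaN here
ping_time_s = np.array([0.0, 1.0])         # seconds, illustrative values

# sv_mean follows the user's skipna choice (the nanmean branch here).
sv_mean = np.nanmean(sv)

# The cell height comes from echo_range, not from Sv, so a NaN Sv sample
# never turns the integration step into NaN.
h_per_ping = np.nanmax(echo_range, axis=1) - np.nanmin(echo_range, axis=1)
h_mean_num = np.nansum(h_per_ping)
h_mean_denom = np.sum(np.isfinite(h_per_ping))
h_mean = h_mean_num / h_mean_denom

# The binned time coordinate should exist even if the cell's values are NaN.
ping_time_bin = np.nanmean(ping_time_s)

NASC = 4 * np.pi * 1852**2 * sv_mean * h_mean
```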
This one is taking me a while, since I am getting some weird NaN values in the `compute_NASC` tests where I shouldn't be getting them (in the case where I am using nanmean/nansum). I think if I take a very thorough look at the bins that are being created, that will allow me to make sense of this... I think. This may take me a while longer, so I'll come back to this after addressing a few of the Echopype user issues.
> I think if I take a very thorough look at the bins that are being created, that will allow me to make sense of this... I think.

Yeah, and making the test data with smaller dimensions may help with figuring out what is going on. For example, computing NASC using just 2 pings of data should give you some flexibility to look into where NaNs should and should not be created.
Looking back at the `compute_MVBS` test that I modified, I think I'll revert that change and have two separate tests (one for `compute_MVBS` and one for `compute_NASC`) with smaller data arrays, as you suggested above. `_parse_nans` may hide some unintentional NaNs in `ds_MVBS`, and it's just so messy working with such a large dataset.
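
Following the suggestion to shrink the test data, here is a sketch of a 2-ping example that shows where NaNs are expected after binning. The values, variable names, and reshape-based binning into equal-sized groups of samples are assumptions for illustration; real MVBS/NASC bins are defined by range (or depth) and time (or distance) intervals rather than fixed sample counts.

```python
import warnings

import numpy as np

# Hypothetical 2-ping, 4-sample Sv (dB) with a few NaNs. Range bins are
# pairs of samples: bin 0 = samples 0-1, bin 1 = samples 2-3.
Sv = np.array([[-60.0, np.nan, np.nan, np.nan],
               [-70.0, -70.0, np.nan, -65.0]])
sv = 10 ** (Sv / 10)          # average in the linear domain
binned = sv.reshape(2, 2, 2)  # (ping, range_bin, sample_in_bin)

with warnings.catch_warnings():
    # np.nanmean warns "Mean of empty slice" for all-NaN bins, which are
    # exactly the bins we want to locate, so silence it here.
    warnings.simplefilter("ignore", RuntimeWarning)
    nanmean_bins = np.nanmean(binned, axis=-1)
mean_bins = np.mean(binned, axis=-1)

# Expected: with nanmean, NaN survives only where a bin has no valid
# samples; with mean, any NaN sample poisons its whole bin.
all_nan_bins = np.all(np.isnan(binned), axis=-1)
any_nan_bins = np.any(np.isnan(binned), axis=-1)
print(np.argwhere(np.isnan(nanmean_bins)))  # (ping, bin) indices of NaN bins
```

With data this small, each unexpected NaN can be traced back to the specific samples that produced it.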
@ctuguinay: I direct-pushed some changes to fix the `mock_Sv_dataset_irregular` outputs, see below:

- In `compute_Sv`, the `Sv` and the `echo_range` DataArrays will have identical NaN elements; you can see this at the end of the `compute_Sv` code. The original `mock_Sv_dataset_irregular` fixture only set some `echo_range` elements to NaN, but not the corresponding `Sv` elements.
- Changed `test_compute_MVBS_NASC_skipna_nan_and_non_nan_values` to set the NaN elements.
- Changed `test_compute_NASC_values` by adding `skipna=True` in the `compute_NASC` call, along with a few fixtures called in it in `conftest.py` and `testing.py`.
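
The invariant in the first bullet (wherever `echo_range` is NaN, `Sv` is NaN too) can be enforced in a mock with a single mask. This is a NumPy sketch with made-up shapes and values; it is not the actual `mock_Sv_dataset_irregular` fixture.

```python
import numpy as np

# Hypothetical irregular mock: ping 1 is shorter, so its echo_range is
# NaN-padded, but Sv was generated without matching NaNs.
rng = np.random.default_rng(0)
echo_range = np.array([[0.0, 0.5, 1.0, 1.5],
                       [0.0, 0.5, np.nan, np.nan]])
Sv = rng.uniform(-90.0, -50.0, size=echo_range.shape)

# Copy echo_range's NaN pattern onto Sv. This only adds NaNs; it does not
# remove any NaN that Sv already had.
Sv = np.where(np.isnan(echo_range), np.nan, Sv)
```

With xarray DataArrays, `Sv.where(echo_range.notnull())` expresses the same mask.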
All modified and coverable lines are covered by tests ✅

Project coverage is 64.54%. Comparing base (9f56124) to head (8e95b99). Report is 47 commits behind head on main.
PR for #1268
In addition, modified `test_compute_MVBS_values` to include a test that adds NaNs to `ds_Sv`, to illustrate how they affect the computation of `compute_MVBS` using both `func="nanmean"` and `func="mean"`.
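
One way to structure a test like the one described (inject NaNs into a copy of the input, then compare `func="mean"` against `func="nanmean"`) is sketched below. `mvbs_like` is a hypothetical, heavily simplified stand-in for `compute_MVBS` (fixed-size range bins, no time binning, plain NumPy arrays), not the real function.

```python
import numpy as np


def mvbs_like(Sv, bin_size, func):
    """Hypothetical MVBS stand-in: average linear sv over range bins."""
    reducer = getattr(np, func)  # "mean" or "nanmean"
    sv = 10 ** (Sv / 10)
    n_ping, n_range = sv.shape
    binned = sv.reshape(n_ping, n_range // bin_size, bin_size)
    return 10 * np.log10(reducer(binned, axis=-1))


def test_mvbs_like_with_injected_nans():
    Sv = np.full((2, 4), -70.0)
    # Both reducers agree on NaN-free data.
    assert np.allclose(mvbs_like(Sv, 2, "mean"), mvbs_like(Sv, 2, "nanmean"))

    Sv_nan = Sv.copy()     # leave the original array untouched
    Sv_nan[0, 1] = np.nan  # one NaN inside ping 0, range bin 0
    mean_out = mvbs_like(Sv_nan, 2, "mean")
    nanmean_out = mvbs_like(Sv_nan, 2, "nanmean")

    assert np.isnan(mean_out[0, 0])              # NaN propagates with mean
    assert np.isclose(nanmean_out[0, 0], -70.0)  # nanmean skips it
    assert np.allclose(mean_out[1], nanmean_out[1])  # NaN-free ping agrees


test_mvbs_like_with_injected_nans()
```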