OSOceanAcoustics / echoregions

Interfacing water column sonar data with annotations and labels
https://echoregions.readthedocs.io/
Apache License 2.0
6 stars 5 forks source link

Disentangle Nested If Else in Select Region and Add Region Class Selections #160

Closed ctuguinay closed 6 months ago

ctuguinay commented 7 months ago

This addresses #131 and #140

codecov[bot] commented 7 months ago

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (0a76630) 87.43% compared to head (8a8b331) 87.80%. Report is 5 commits behind head on main.

Files Patch % Lines
echoregions/regions2d/regions2d.py 97.43% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #160 +/- ## ========================================== + Coverage 87.43% 87.80% +0.36% ========================================== Files 13 13 Lines 573 582 +9 ========================================== + Hits 501 511 +10 + Misses 72 71 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

ctuguinay commented 7 months ago

@leewujung Thanks for the comments! I just have a few questions before I start working on this again.

Right now there is no check for the case when region_id and region_class both exist, and the code would break if both are given as inputs. How about adding a check to raise an error on the case where both exist and the case where both are None? i.e., we require only one of them given as input. The docstring should say something about this too.

Why exactly would the code break for both region_id and region_class? The operations for checking and subsetting region_id and region_class are independent of one another https://github.com/OSOceanAcoustics/echoregions/blob/83781bcc575620562faf3391bb1d39bd309142af/echoregions/regions2d/regions2d.py#L116C1-L150C71.

I also think being able to combine both of these as selection parameters should still be the desired behavior. I can't see a reason why we, in particular, would need both region_id and region_class to be non-NaN, but I don't necessarily think that this type of selection should be restricted since we do allow the other selections to be combined, i.e. region_id, depth_range, and time_range.

leewujung commented 7 months ago

I also think being able to combine both of these as selection parameters should still be the desired behavior. I can't see a reason why we, in particular, would need both region_id and region_class to be non-NaN, but I don't necessarily think that this type of selection should be restricted since we do allow the other selections to be combined, i.e. region_id, depth_range, and time_range.

My rationale for restricting the selection to be only region_id and region_class is just to avoid confusion and user error in the form of unintended additional filtering of the dataframe. I think that if people want to select only some region_id that are within a particular region_class, they really should only pass the specific region_id they want (i.e. doing the selection outside of this function), instead of passing in the list of region_id and expect region_class to take care of the additional filtering to remove IDs that are not in a particular region_class.

Also, since filtering in the code section you highlighted are sequential, they are not independent -- region_class here is actually an additional filtering criteria after self.data has been filtered with the region_id input argument.

ctuguinay commented 6 months ago

@leewujung I added all your previous suggestions. This is ready to be reviewed again.