OSOceanAcoustics / echopop

Generating biological estimates for animal "pop"ulation from echosounder data
https://echopop.readthedocs.io/
Apache License 2.0
1 stars 4 forks source link

Transect subset selection bug #95

Open emiliom opened 1 year ago

emiliom commented 1 year ago

PR #89 overhauled the interim transect subset selection approach initially implemented in v0.1.0-alpha. The intent was to implement the approach found in Matlab EchoPro, as described in #33.

I have found a bug that occurs in some circumstances. The bug emerged when rerunning the bootstrapping_walkthrough.ipynb notebook with the current main. PR #89 introduced a very small change to that notebook, changing the removal_percentage argument value in boot_obj.run_bootstrapping from 50.0 to 60.0. The notebook runs successfully when using 50.0, but boot_obj.run_bootstrapping results in a transect selection error when using 60.0

Currently the notebook transect_selection_workflow.ipynb runs successfully while also applying transect subsetting. But the percentage used is also 50.0%. In addition, there is a transect subsetting test, test_transect_selection.py::test_transect_selection_output, that also runs successfully; rather than a %, it uses a preselected list of transect ids.

The error happens in computation/transect_results.py::set_adult_NASC, specifically in this statement: https://github.com/uw-echospace/EchoPro/blob/9dc4708b409f9b0a71897dd06a690d26eb1a2e8d/EchoPro/computation/transect_results.py#L1272-L1274 It's a KeyError (eg, "KeyError: '[6] not in index'") generated when nasc_fraction_adult_df is missing a stratum number found in self.nasc_df.stratum_num. As described in #33, when a subset of transects is requested, the reduced transect set may not contain all possible stratum_num. A scheme is needed to backfill values for the "missing" strata. The scheme introduced in v0.1.0-alpha and described in #33 under "Current solution for missing strata" followed some simple rules Brandon and I devised to fill or interpolate missing strata. PR #89 replaced that strategy with the more involved (and possibly opaque?) scheme used in the Matlab EchoPro code. However, for transect_results.py::set_adult_NASC, it looks like the missing-strata scheme was overlooked. The variable nasc_fraction_adult_df is created based on the strata found in self.bin_ds.len_age_dist_all, then its values are populated by looping over those strata in a for loop. Therefore, when strata are missing in self.bin_ds.len_age_dist_all, they are never backfilled in nasc_fraction_adult_df, which leads to the error. Note: self.bin_ds is an Xarray Dataset that contains stratum_num as a dimension and coordinate variable; self.bin_ds.len_age_dist_all contains that dimension.

This error has nothing to do with bootstrapping per se. But because bootstrapping generates several realizations of a transect subset, it more easily led to the error condition.

emiliom commented 1 year ago

I implemented a fix (PR #99) to handle missing strata that is based on the scheme previously used (ie, before Brandon's changes leading up to v0.3.1) and reuses a variable that was already available, self.stratum_choices. For missing strata, this variable is prepopulated with the neighboring strata (1 or 2, I think) that can be used to populate the missing strata. Here's what I implemented: https://github.com/uw-echospace/EchoPro/blob/9dc4708b409f9b0a71897dd06a690d26eb1a2e8d/EchoPro/computation/transect_results.py#L1261-L1267

I also added a test that runs through multiple realizations of transect subset selection: https://github.com/uw-echospace/EchoPro/blob/9dc4708b409f9b0a71897dd06a690d26eb1a2e8d/EchoPro/tests/computation/test_transect_selection.py#L13

Missing strata are identified and addressed in other parts of transect_results.py. For example, see this block that calls a few methods to that effect: https://github.com/uw-echospace/EchoPro/blob/9dc4708b409f9b0a71897dd06a690d26eb1a2e8d/EchoPro/computation/transect_results.py#L1378-L1381

It's unclear yet whether any of those can be applied in set_adult_NASC.

emiliom commented 1 year ago

The bug itself was addressed in release 0.3.1. Transect subset selection works again. But there is more work to be done to identify and implement the desired scheme for backfilling missing strata. This will be done in release 0.3.2.

emiliom commented 10 months ago

This was partially addressed in release 0.3.1, but more work is needed. I'll retag it as release 0.3.2.

emiliom commented 10 months ago

Investigate relationship to issues brought up in #139