Open emiliom opened 1 year ago
I implemented a fix (PR #99) to handle missing strata that is based on the scheme previously used (ie, before Brandon's changes leading up to v0.3.1) and reuses a variable that was already available, self.stratum_choices
. For missing strata, this variable is prepopulated with the neighboring strata (1 or 2, I think) that can be used to populate the missing strata. Here's what I implemented:
https://github.com/uw-echospace/EchoPro/blob/9dc4708b409f9b0a71897dd06a690d26eb1a2e8d/EchoPro/computation/transect_results.py#L1261-L1267
I also added a test that runs through multiple realizations of transect subset selection: https://github.com/uw-echospace/EchoPro/blob/9dc4708b409f9b0a71897dd06a690d26eb1a2e8d/EchoPro/tests/computation/test_transect_selection.py#L13
Missing strata are identified and addressed in other parts of transect_results.py
. For example, see this block that calls a few methods to that effect:
https://github.com/uw-echospace/EchoPro/blob/9dc4708b409f9b0a71897dd06a690d26eb1a2e8d/EchoPro/computation/transect_results.py#L1378-L1381
It's unclear yet whether any of those can be applied in set_adult_NASC
.
The bug itself was addressed in release 0.3.1. Transect subset selection works again. But there is more work to be done to identify and implement the desired scheme for backfilling missing strata. This will be done in release 0.3.2.
This was partially addressed in release 0.3.1, but more work is needed. I'll retag it as release 0.3.2.
Investigate relationship to issues brought up in #139
PR #89 overhauled the interim transect subset selection approach initially implemented in v0.1.0-alpha. The intent was to implement the approach found in Matlab EchoPro, as described in #33.
I have found a bug that occurs in some circumstances. The bug emerged when rerunning the bootstrapping_walkthrough.ipynb notebook with the current
main
. PR #89 introduced a very small change to that notebook, changing theremoval_percentage
argument value inboot_obj.run_bootstrapping
from 50.0 to 60.0. The notebook runs successfully when using 50.0, butboot_obj.run_bootstrapping
results in a transect selection error when using 60.0Currently the notebook transect_selection_workflow.ipynb runs successfully while also applying transect subsetting. But the percentage used is also 50.0%. In addition, there is a transect subsetting test, test_transect_selection.py::test_transect_selection_output, that also runs successfully; rather than a %, it uses a preselected list of transect ids.
The error happens in
computation/transect_results.py::set_adult_NASC
, specifically in this statement: https://github.com/uw-echospace/EchoPro/blob/9dc4708b409f9b0a71897dd06a690d26eb1a2e8d/EchoPro/computation/transect_results.py#L1272-L1274 It's aKeyError
(eg, "KeyError: '[6] not in index'") generated whennasc_fraction_adult_df
is missing a stratum number found inself.nasc_df.stratum_num
. As described in #33, when a subset of transects is requested, the reduced transect set may not contain all possiblestratum_num
. A scheme is needed to backfill values for the "missing" strata. The scheme introduced in v0.1.0-alpha and described in #33 under "Current solution for missing strata" followed some simple rules Brandon and I devised to fill or interpolate missing strata. PR #89 replaced that strategy with the more involved (and possibly opaque?) scheme used in the Matlab EchoPro code. However, fortransect_results.py::set_adult_NASC
, it looks like the missing-strata scheme was overlooked. The variablenasc_fraction_adult_df
is created based on the strata found inself.bin_ds.len_age_dist_all
, then its values are populated by looping over those strata in a for loop. Therefore, when strata are missing inself.bin_ds.len_age_dist_all
, they are never backfilled innasc_fraction_adult_df
, which leads to the error. Note:self.bin_ds
is an Xarray Dataset that containsstratum_num
as a dimension and coordinate variable;self.bin_ds.len_age_dist_all
contains that dimension.This error has nothing to do with bootstrapping per se. But because bootstrapping generates several realizations of a transect subset, it more easily led to the error condition.