OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
99 stars 76 forks source link

sonar_model handling, grouping #995

Open emiliom opened 1 year ago

emiliom commented 1 year ago
Simrad echosounders have a hierachical relationship between individual sonar model strings and the broader processing "class" (used loosely!). Specifically: "Class" Sonar models
EK60 EK60, ES70
EK80 EK80, ES80, EA640

Converting from the sonar model strings to the "class" is a recurring need across the code base that is implemented independently in many different places. Here are two examples:

https://github.com/OSOceanAcoustics/echopype/blob/f8082cac32bfbee2b21c62d266507491d221182d/echopype/convert/set_groups_base.py#L349

https://github.com/OSOceanAcoustics/echopype/blob/f8082cac32bfbee2b21c62d266507491d221182d/echopype/calibrate/range.py#L150-L153

This need to reimplement carries the potential of forgetting to do it; I've run into those oversights a couple of times.

The relationship is already expressed implicitly in core.SONAR_MODELS, eg:

https://github.com/OSOceanAcoustics/echopype/blob/f8082cac32bfbee2b21c62d266507491d221182d/echopype/core.py#L80-L91

We should think about options for handling this more systematically. For example, one option could involve encoding the sonar model "class" (I'm sure there's a better label for this!) in core.SONAR_MODELS, then using that consistently. And/or encoding it as a property of the echodata object, so it's always available. Doing something like that would eliminate a lot of hard-wired listings of individual sonar models in if-then statements.

spacetimeengineer commented 3 months ago

Upon reviewing this issue, it seems the focus is more on finding solutions for enhancing maintenance and upgrading during the development phase rather than on operational improvements (though these are still crucial). There is concern that we might overlook fixing some conditionals when we perform commits. To avoid scattering conditionals throughout the code, might it be beneficial to implement a new class or class function that provides the required answers to these conditionals from the start? Making them unneeded? This approach could streamline the solution process.

leewujung commented 3 months ago

Yes I think a more overarching solution like what you described above would be better. There's also a need to set up a better mechanism for deciding between the Beam_group for calibration target (depending on if user inputs to compute_Sv). It will be great if these can go into the same PR.

jmjech commented 2 months ago

In addition to "sonar_model", there seem to be two other mandatory parameters: "waveform" and "encode" that are required to correctly process the data. Maybe there are others as well? "sonar_model" is embedded in the netCDF4 file with the attributes, and I'm wondering if the other two could/should be embedded in the netCDF4 file as well?

leewujung commented 2 months ago

"waveform" (CW or FM) and "encode" (power or complex) are used to choose which Sonar/Beam_groupX data to process, because there can be different types of data existing in the same file. I think it is better to not make things too automatic (ie users need to choose the right argument), so that there's less confusion about what the code is doing.

jmjech commented 2 months ago

In this case, I argue that automation is better if there is a way to algorithmically determine what those parameters are for each ping. As it is now, the user needs to either know what they are or guess by process of elimination, such as to try different values until the program continues without error. This is inconvenient for a few files, arduous for many files, and may prevent someone from using the data. For users that know the data, it is a pain. For those outside of our community who want to use the data, this could be a showstopper.

If an incorrect waveform or encode value is selected, does the program raise an error and stop or will it continue and apply incorrect methods? The latter would be very bad.

leewujung commented 2 months ago

This is inconvenient for a few files, arduous for many files, and may prevent someone from using the data. For users that know the data, it is a pain. For those outside of our community who want to use the data, this could be a showstopper.

For many files, wouldn't this just take a for loop? For most surveys, files are of the same config. For experiments that config parameters are varied, it is important for users to make conscious choices. I actually think it is much less confusing if people select what they want.

For users who are not already familiar with fisheries data, good documentation would go a long way. This is even more important as more people move to use broadband data, since there's the band average Sv (already implemented) and the Sv spectrum (to be added).

We can change the default parameter depending on the sonar_model, but the argument would surely be what should be default.

If an incorrect waveform or encode value is selected, does the program raise an error and stop or will it continue and apply incorrect methods? The latter would be very bad.

It will error out -- the data simply would not fit the computational operation required, so no way for the code to run through with incorrect method.

jmjech commented 2 months ago

I have a better understanding of what you meant by using those parameters to select the Sonar/Beam group. I glossed over that before. The way to select data that the user wants is an interesting one. The user needs to know what they "want" and then there needs to be an efficient way to get those data.The BI500 did that by generating index files (ping and vlog) and Echoview does that when it creates a .evi file for each data file.

@spacetimeengineer and I are planning to start to go through the workflow together within the next couple of weeks. I told him I should have a decent idea of what is being done but less understanding as to why something is being done. This is a good example of that. I was thinking those parameters had a small role in the overall process, but they have a broader purpose. I'll keep that in mind as we work through the code.

spacetimeengineer commented 1 month ago

Hello, I’ve prepared some code on my fork and re-based it with the latest upstream , but I haven’t submitted the PR yet as my test environment isn’t fully ready. However, if you want to review the changes, you can check out my issue #995 branch. The commit is quite substantial since it touches multiple areas of the code, so I wanted to give you a heads-up on my approach.

For every instance where sonar_model was being checked against model lists for device-specific routines, I replaced it with a single constant command. This eliminates the need to modify the code in those places, as it now applies uniformly to all sonar models. However, device-specific functions still run on a per-device basis, as the SONAR_MODELS dictionary now includes new keys for each model. The values to these keys reference sonar-specific functions, where I’ve wrapped your existing code into functions. These functions live in the core.py script alongside the SONAR_MODELS. This follows the suggestion made by @emiliom.

For devices where no specific action is needed, I’ve implemented empty lambda functions that follow the same parameter schema.

I should mention that there are additional changes on my branch that originally aimed to address issue #995, but they’ve expanded beyond the initial scope. However, I believe this new feature is valuable enough to keep (though it’s still unfinished until I can correct it for other models beyond the EK60 and EK80, which are complete once they pass the checks).

The feature: I found a way to automatically parse the sonar_model from the raw data using open raw, by checking the data right at the outset. I’ve set the sonar_model argument to None by default, making it optional. For EK60 and EK80 raw data, if a sonar_model argument is provided, it’s ignored in favor of what’s available in the raw data. For other models, we still require the second argument, but only until I finish the parsing code for those models as well. I’ve mostly repurposed code that the devs already wrote.

As I mentioned, my testing environment is still incomplete, but once I have the necessary test data, I should be able to pass the checks.

leewujung commented 3 weeks ago

The feature: I found a way to automatically parse the sonar_model from the raw data using open raw, by checking the data right at the outset. I’ve set the sonar_model argument to None by default, making it optional.

This is related to #494 for EK60/EK80, so I'll assign you to that too, so that we can close that once #1399 is merged.