gavinmacaulay closed 2 years ago
@gavinmacaulay: I looked into this and I think the calculation itself is correct. The actual range vector (in meters) in the computed Sv dataset is stored in the variable range, which is calculated using echodata.compute_range() under the hood. When range is used in conjunction with the computed Sv data, only the first 8284 range samples are used, since the 70 kHz channel has no data after that (the dataarray is NaN-padded).
The number of samples in your Sv_70k_range is based on the NaN-padded dataarray, so it has 25228 samples, just like Sv_200k_range. The difference lies in the actual Sv variable, which is NaN-padded.
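A minimal sketch of the padding effect described above, using plain NumPy arrays as a stand-in for the xarray data. The sample counts come from this thread; the Sv values and per-sample spacings are made-up placeholders, not values from the file.

```python
import numpy as np

N_200K = 25228  # samples in the 200 kHz channel (from the thread)
N_70K = 8284    # valid samples in the 70 kHz channel (from the thread)

# Hypothetical Sv values; row 0 (70 kHz) is NaN-padded to the common length.
sv = np.full((2, N_200K), -70.0)
sv[0, N_70K:] = np.nan

# Hypothetical range vectors: both rows are fully populated, with no NaN.
# The spacings (in meters) are placeholders chosen only for illustration.
rng = np.vstack([
    np.arange(N_200K) * 0.03,  # 70 kHz
    np.arange(N_200K) * 0.01,  # 200 kHz
])

# Dropping NaN shortens Sv but leaves range untouched.
sv_70k = sv[0][~np.isnan(sv[0])]
rng_70k = rng[0][~np.isnan(rng[0])]

print(sv_70k.size, rng_70k.size)  # 8284 25228
```

Because the range row holds no NaN, dropping NaN values does nothing to it, which is why the two extracted vectors end up with different lengths.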
That said, I can see that we could avoid this confusion by setting all samples in range that correspond to the NaN samples in the 70 kHz Sv to NaN as well, so that even though the length of Sv_70k_range is still 25228, the values it contains end at the 8284th sample.
Thoughts?
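The masking idea proposed above could look roughly like this. It is only a sketch on toy NumPy arrays: mask_range_like_sv is a hypothetical helper name, not an echopype function, and the values are invented.

```python
import numpy as np

def mask_range_like_sv(rng, sv):
    """Hypothetical helper: copy of rng with NaN wherever sv is NaN."""
    return np.where(np.isnan(sv), np.nan, rng)

# Toy data: row 0 stands in for 70 kHz (NaN-padded), row 1 for 200 kHz.
sv = np.array([[-70.0, -71.0, np.nan, np.nan],
               [-60.0, -61.0, -62.0, -63.0]])
rng = np.array([[0.0, 0.3, 0.6, 0.9],
                [0.0, 0.1, 0.2, 0.3]])

masked = mask_range_like_sv(rng, sv)

# After masking, dropping NaN from range gives the same length as Sv.
valid_70k = masked[0][~np.isnan(masked[0])]
print(valid_70k.size)  # 2
```

On xarray objects the same effect should be achievable with DataArray.where and notnull(), assuming range and Sv share dimensions; that is not tested here.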
Another subtlety here is that each ping may not have the same number of non-NaN samples, even though for most datasets the max range (before the first NaN sample) is the same across pings. This is why the output of echodata.compute_range() (or ds_Sv.range) has the same dimensions as Sv.
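To illustrate the per-ping point, here is a small sketch that counts valid samples in each ping. The array is invented (3 pings by 6 range samples for a single channel), and it assumes NaN values only appear as trailing padding.

```python
import numpy as np

# Hypothetical Sv for one channel: rows are pings, columns are range samples.
sv = np.array([
    [-70.0, -71.0, -72.0, -73.0, np.nan, np.nan],
    [-70.0, -71.0, -72.0, -73.0, np.nan, np.nan],
    [-70.0, -71.0, -72.0, np.nan, np.nan, np.nan],
])

# Number of non-NaN samples in each ping; with trailing-only padding this
# is also the index of the first NaN sample.
valid_per_ping = np.sum(~np.isnan(sv), axis=1)
print(valid_per_ping.tolist())  # [4, 4, 3]
```

A single 1-D range vector per channel could not represent the shorter third ping, which is one reason to keep range on the same grid as Sv.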
The EK80 test file, D20170912-T234910.raw, has two channels (70 and 200 kHz) with different sample rates (differing by a factor of about 3). When using the echodata.compute_range() function, the returned range vector for the 70 kHz channel is incorrect: it has the same number of range values as the 200 kHz channel, when it should have only about a third as many. Because of how the range vector is calculated, the actual ranges in the 70 kHz vector are, per sample, a third of what they should be. As a result, the dropna() method used when extracting the Sv at 70 kHz doesn't work for the 70 kHz range vector.
Code, derived from test_calibrate.py, that shows this is: