aodn / imos-toolbox

Graphical tool for QC'ing and NetCDF'ing oceanographic datasets
GNU General Public License v3.0

RDI single ping tools #675

Closed BecCowley closed 3 years ago

BecCowley commented 4 years ago

@ocehugo, my starting point for processing these files is 'test/single_PingADCPtest.m'. I work through each cell by hacking into the toolbox import, pre-processing and autoQC tools. You will see there are some plotting routines to assist in determining the thresholds to use. Once I have determined the thresholds, I open the toolbox as normal and perform the automated QC with them.

I have edited the GUI to plot temp by default. I generally don't look at the plots for UCUR, VCUR, etc. as they take too long to view.

Once I've exported to NetCDF, I can import the full single-ping data and flags. I then perform the bin-averaging and continue with our analysis steps.

I have edited the workhorseParse.m code to allow import of beam coordinate files and transformation to ENU coordinates.

I have added the bin-averaging code, but it still needs to be wired into the toolbox, perhaps on export? More discussion may be required. The bin averaging uses a lean interpolation routine to speed up the process, and I've edited adcpBinMappingPP.m to use it as well.
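I don't have LeanInterp.m in front of me, but a "lean" linear interpolation typically gains its speed by assuming a sorted abscissa and skipping the input validation a general-purpose routine performs. A minimal Python sketch of the idea (the toolbox itself is MATLAB; the function name here is illustrative, not the toolbox API):

```python
import bisect

def lean_interp(x, y, xq):
    """Linear interpolation that assumes x is sorted ascending and
    skips the validation a general-purpose interp routine performs."""
    out = []
    for q in xq:
        # locate the bracketing interval with a binary search
        i = bisect.bisect_right(x, q) - 1
        i = max(0, min(i, len(x) - 2))
        t = (q - x[i]) / (x[i + 1] - x[i])
        out.append(y[i] + t * (y[i + 1] - y[i]))
    return out

print(lean_interp([0.0, 1.0, 2.0], [0.0, 10.0, 20.0], [0.5, 1.5]))
# -> [5.0, 15.0]
```

Skipping the checks is only safe because the ADCP bin/time axes are already monotonic, which is exactly why this kind of shortcut pays off here.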

I have added a new surface detection test that combines the echo intensity, side-lobe and depth tests. I don't use the individual side-lobe and echo intensity tests.
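For the side-lobe component of a combined surface detection, the usual geometry is that an upward-looking ADCP's bins above roughly D·cos(beam angle) of the instrument depth are side-lobe contaminated. A hedged Python sketch of that cutoff alone (flag values assume the IMOS 1 = good / 4 = bad convention; the combination with echo intensity and the depth test is omitted, and this is not the new test's actual code):

```python
import math

def sidelobe_good_height(instrument_depth_m, beam_angle_deg=20.0):
    """Maximum height above the transducer free of surface side-lobe
    contamination for an upward-looking ADCP: D * cos(beam_angle)."""
    return instrument_depth_m * math.cos(math.radians(beam_angle_deg))

def flag_bins(bin_heights_m, instrument_depth_m, beam_angle_deg=20.0,
              good=1, bad=4):
    """Flag each bin against the side-lobe cutoff (IMOS-style flags assumed)."""
    cutoff = sidelobe_good_height(instrument_depth_m, beam_angle_deg)
    return [good if h < cutoff else bad for h in bin_heights_m]

# 100 m deep instrument, 20 degree beams: cutoff is ~94 m above the head
print(flag_bins([10, 50, 90, 96], 100.0))
# -> [1, 1, 1, 4]
```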

I have added a new echo threshold test which is based on the fish threshold test that RDI use. I am not convinced of how useful it is, but it’s there.
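As I understand RDI's fish/false-target screening, it compares echo amplitudes across beams within each bin and rejects bins where the spread exceeds a threshold (a fish ensonified by one beam inflates that beam's echo). A Python sketch of that idea only (the threshold value and return format are illustrative, not RDI defaults or the new test's API):

```python
def fish_threshold_flags(echo_beams, threshold=50):
    """echo_beams: one list of per-bin echo amplitudes (counts) per beam.
    Marks a bin suspect when the spread between the strongest and weakest
    beam exceeds `threshold` counts. Threshold value is illustrative."""
    nbins = len(echo_beams[0])
    flags = []
    for k in range(nbins):
        amps = [beam[k] for beam in echo_beams]
        flags.append(max(amps) - min(amps) > threshold)
    return flags

# two beams, three bins; the middle bin has a 60-count spread
print(fish_threshold_flags([[100, 160, 100], [100, 100, 100]]))
# -> [False, True, False]
```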

Happy to discuss further to help integration into the toolbox.

ocehugo commented 4 years ago

@BecCowley, I started looking into it, but there is some scattering of functionality here and some scattered changes across commits (for example, I'm reviewing code at a commit that is modified afterwards by later fixes). It would be far more efficient for me if these changes were in isolated commits. Are you comfortable with rebasing?

Otherwise I may need to rebase first in order to follow up the review of the commits after 4a5749d.

BecCowley commented 4 years ago

@ocehugo, I can try rebasing, but I haven't done it before. Can you describe exactly how you want the commits done and I'll have a go.

ocehugo commented 4 years ago

Thanks heaps! The best person to rebase is you at the moment, since you know what you did! Rebasing can be tricky, so I recommend reading up on it before diving in. However, it makes review far simpler and keeps our history in better shape. The point is that I'm not after a super-granular chronological commit history, but an isolated, feature-wise change history, which helps with reviewing/merging/bisecting/blaming.

I can see a lot of small changes required and I will probably need to rebase a bit, add some tests, etc. However, it would help if you could do the following:

  1. The merge commits before https://github.com/aodn/imos-toolbox/pull/675/commits/65487d873699643b7d9f5b748dd0661ee569223e could be squashed into one. This is not strictly necessary (I can just ignore them) but it's a good first step into rebasing.

  2. "Feature commits". For example, judging by the commit message, I would expect https://github.com/aodn/imos-toolbox/pull/675/commits/ebce24a95257792d5fd8b47e179f05689588e882 to contain only imosEchoRangeQC.m changes and directly related ones. Instead, it contains changes to adcpBinmapping and the new interpolation function, which are unrelated. Moreover, adcpBinmapping is changed again in subsequent commits, which complicates review. Another example is the last commit: why should I review test/imosRDiBinAveraging, which is introduced at one point but removed down the road?

  3. More sandboxed commits. Although most of them are new functionality, some are too scattered or include too much. For example, https://github.com/aodn/imos-toolbox/pull/675/commits/ebce24a95257792d5fd8b47e179f05689588e882 is too broad: a. your adcpBinMapping changes could be a single commit that also includes the changes introduced to that file in the previous commit; b. a lot of files are touched, including things I don't necessarily need (the thresholds.txt changes, for example). If you don't rebase/edit that commit, I would have to go into it and rebase it myself to strip out the thresholds.txt changes. If such changes are split into different commits, I can simply cherry-pick or ignore the ones I need to. In fact, if there are changes I don't care about (e.g. the CSIRO ones), don't include them at all (you will need a new branch based on this one, though).

  4. Improved commit messages. If you change several lines in a file to change overall behaviour, describe why that group of changes was required, any particular caveats, etc. This is very helpful, since it gives me a clue/mindset before digging in. Also, the commit title (the first line of a commit) is usually short and limited, describing the gist, followed by a short message below. Yours look good, but some are incomplete because the commits are too broad. The message below the title can be more descriptive, e.g. an enumeration (1. fix wrong indexing, 2. load missing required variables, 3. fix performance at interpolation, etc.) or something more prose-like.

A good rule of thumb: if a file is changed in more than one commit within a PR, it's a good candidate for rebasing. This is particularly true if the sequential changes are small fixes, such as little bugs you found along the way from the first change to the last. Again, it's all about turning the commits from a "history of changes" into "feature changes".
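For reference, a "fixup!" commit plus an autosquash rebase is one low-risk way to fold follow-up fixes into their feature commit. A self-contained shell sketch in a throwaway repo (the commit messages are illustrative, not this PR's actual commits):

```shell
# Throwaway-repo demo: record a fix as a "fixup!" commit, then let an
# autosquash rebase fold it into its feature commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m "base"
echo a > f
git add f
git commit -q -m "imosEchoRangeQC: new test"
echo b > f
git commit -q -am "fixup! imosEchoRangeQC: new test"
# GIT_SEQUENCE_EDITOR=true accepts the generated todo list unchanged
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~2
git log --oneline   # two commits remain: base + the amended feature commit
```

On a shared PR branch you would finish with `git push --force-with-lease` rather than a plain force push.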


ocehugo commented 4 years ago

@BecCowley - I don't know if you've already started, but if you wish, I can do the rebase and we can review together from there.

Also, it would be great to have access to the file you used to test the functionality here and the expected results.

BecCowley commented 4 years ago

@ocehugo, I haven't started yet. It was quite difficult to do with the fork I have, so I was thinking about restarting with a clean fork and adding the changes in the orderly way you describe, rather than trying to rebase. There are only a few changes that I can isolate and add back in. What do you think? I also have some outcomes for further discussion about how we will send the hourly averaged RDI data to AODN. It means some additions to the toolbox. Do you want me to document them in this issue, or start afresh somewhere?

ocehugo commented 4 years ago

@ocehugo, I haven't started yet. It was quite difficult to do with the fork I have, so I was thinking about restarting with a clean fork and adding the changes in the orderly way you describe, rather than trying to rebase. There are only a few changes that I can isolate and add back in. What do you think?

No, this is more work than rebasing. Let me do it and we can go from there. It's good practice for future PRs though, but it does require some mindset/practice and good tooling (pycharm/vim-fugitive/etc.).

I also have some outcomes for further discussion about how we will send the hourly averaged RDI data to AODN. It means some additions to the toolbox. Do you want me to document them in this issue, or start afresh somewhere?

Guillaume told me about it, but it would be nice if you could open an issue with the requirements. AFAIK one requirement is to export ADCP time-bin aggregations/averages, and the other is the actual procedure. It would be nice to be generic enough to allow any variable to be exported (nD arrays, not only ADCPs, etc.). What I have in mind is just a generic procedure, with particular procedures matching burst/non-burst, instrument/manufacturer, etc.

Also, is the new procedure tightly dependent on this PR (e.g. changing a function of this PR)? If so, let me know: I can wait for more commits here before rebasing (otherwise we will have to resolve conflicts later), or we can rebase/review here first so you can start from a proper root. Otherwise, you may just fork from the current master branch and start a clean branch with the functionality.

BecCowley commented 4 years ago

@ocehugo there are no other commits from me to come. In this branch, here is a summary of what is done:

  1. Added a new imosSurfaceDetectionSetQC.m file. I recommend this replaces the echo intensity and side-lobe tests for those QC'ing RDI data.
  2. Added a new imosEchoRangeSetQC.m test to attempt to identify fish interference. It follows RDI procedures.
  3. Sped up the interpolation routine using LeanInterp.m (but you might have a better way of doing this).
  4. Attempted to improve imosEchoIntensitySetQC.m, but I don't think I completed the improvements as I started focussing on the surface detection tests. You may want to ignore these changes.
  5. Improved the imosErrorVelocitySetQC.m test to define the acceptable data as within the mean +/- the threshold.
  6. Minor changes to the coding of the imosCorrMagVelocitySetQC.m test, which may or may not be useful.
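Item 5's acceptance window (mean +/- threshold, rather than a window centred on zero) can be sketched as follows. The flag values assume the IMOS 1 = good / 4 = bad convention; this is not the toolbox code itself:

```python
def error_velocity_flags(evel, threshold, good=1, bad=4):
    """Accept error velocities within mean(evel) +/- threshold and flag
    the rest. Centring on the mean (not zero) follows the change
    described above; flag values are assumed IMOS conventions."""
    mean = sum(evel) / len(evel)
    return [good if abs(v - mean) <= threshold else bad for v in evel]

# mean is 0.13 m/s here, so only the 0.50 m/s sample falls outside +/- 0.2
print(error_velocity_flags([0.01, 0.02, 0.50, -0.01], threshold=0.2))
# -> [1, 1, 4, 1]
```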

These following parts will relate to the discussion we had with Guillaume and they are also included in this PR:

  1. Added imosRDIbinaveragingCSIRO.m code to perform averaging on data that was imported from the IMOS netcdf single ping files.
  2. Added code to change from beam coordinates to ENU coordinates.
  3. Added some code that enabled me to select QC thresholds for the ADCP single ping data (single_pingADCPtest.m). This calls some plotting routines. The work is done outside the toolbox GUI, then the GUI is run to apply the thresholds.
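For item 1, a minimal sketch of flag-aware time-bin averaging (Python for illustration; imosRDIbinaveragingCSIRO.m will differ, and the "good" flag set of {1, 2} is an assumption based on IMOS conventions):

```python
from collections import defaultdict

def hourly_average(times_h, values, flags, good=(1, 2)):
    """Average `values` into 1-hour bins keyed by floor(time in hours),
    keeping only samples whose QC flag is in `good` (IMOS 1 = good,
    2 = probably good; assumed here). Returns {hour: mean}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, v, f in zip(times_h, values, flags):
        if f in good:
            h = int(t)
            sums[h] += v
            counts[h] += 1
    return {h: sums[h] / counts[h] for h in sums}

# the 9.0 sample is flagged bad (4) and excluded from hour 0's mean
print(hourly_average([0.1, 0.5, 0.9, 1.2], [1.0, 2.0, 9.0, 4.0],
                     [1, 1, 4, 1]))
# -> {0: 1.5, 1: 4.0}
```

Excluding bad-flagged pings before averaging is the whole point of QC'ing at single-ping resolution first.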

I can start a new issue for the 3 parts above to discuss how they should be implemented, or I can describe it here.

ocehugo commented 4 years ago

@ocehugo there are no other commits from me to come.

Great, I will start rebasing this next week; gotta finish some Nortek stuff atm.

1. Added imosRDIbinaveragingCSIRO.m code to perform averaging on data that was imported from the IMOS netcdf single ping files.

Ping me when you've got the first draft of that; I think we can steer this easily to accept any input.

2. Added code to change from beam coordinates to ENU coordinates.

I would highly recommend starting with tests, since this bit may be tricky to evaluate, particularly with double arithmetic.
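For the beam-to-ENU work, the beam-to-instrument stage for a 4-beam convex Janus head is usually written as below (following my reading of RDI's coordinate transformation formulas; verify against the manual and workhorseParse.m before relying on it). It also shows the kind of sanity test suggested above:

```python
import math

def beam_to_instrument(b1, b2, b3, b4, beam_angle_deg=20.0, convex=True):
    """Beam to instrument (XYZ + error) velocities for a 4-beam Janus
    ADCP, following the RDI coordinate-transformation formulas as I
    understand them. The instrument-to-earth rotation (heading/pitch/
    roll) is a separate step, omitted here."""
    theta = math.radians(beam_angle_deg)
    a = 1.0 / (2.0 * math.sin(theta))
    b = 1.0 / (4.0 * math.cos(theta))
    d = a / math.sqrt(2.0)
    c = 1.0 if convex else -1.0
    x = c * a * (b1 - b2)
    y = c * a * (b4 - b3)
    z = b * (b1 + b2 + b3 + b4)
    err = d * (b1 + b2 - b3 - b4)
    return x, y, z, err

# Sanity test of the kind suggested above: equal beam velocities imply
# purely vertical flow and zero error velocity.
x, y, z, e = beam_to_instrument(0.1, 0.1, 0.1, 0.1)
print(abs(x) < 1e-12, abs(y) < 1e-12, abs(e) < 1e-12)
print(abs(z - 0.1 / math.cos(math.radians(20.0))) < 1e-12)
```

Tests built from such closed-form cases sidestep most of the double-arithmetic pitfalls, since the expected values are exact up to rounding.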

3. Added some code that enabled me to select QC thresholds for the ADCP single ping data (single_pingADCPtest.m). This calls some plotting routines. The work is done outside the toolbox GUI, then the GUI is run to apply the thresholds.

We may call these "single ping diagnostics", and it could be a menu item at first (and part of a new left-side "Diagnostics" button later). I believe you already have some plots from previous mentions, but I would prefer individual functions or, if you feel keen, a plotting class with specific plotting methods (e.g. like previewWindow.m). That way we can easily test them and compose them within a UI element if necessary (plus provide interactions too).

I can start a new issue for the 3 parts above to discuss how they should be implemented, or I can describe it here.

Yep, new feature -> new branch. You can push a PR anytime; just ping me when it's ready for review or when you want/need some input.

BecCowley commented 4 years ago

@ocehugo, perhaps we can discuss this on a call?

ocehugo commented 3 years ago

@BecCowley - FYI I forgot to close this branch - any change here will need to be rebased onto master, since the branch was already merged. Also, a lot has changed. We will need to talk about the bin averaging, since I think we need to improve it a bit more. I also haven't "attached" the binAveraging code to the toolbox at all yet. There are other bits that need changing, and some decisions are still up in the air (e.g. this should probably be a PP).

I have some opinions on how we should do the aggregation too - pretty much avoiding any boxcar smoothing - happy to discuss, but in due time, maybe next week!? I have some backlog items and need to prepare some examples for the bin averaging too.

BecCowley commented 3 years ago

OK, more to discuss. I don't think bin averaging should be a pre-processing tool. We can discuss.