aodn / imos-toolbox

Graphical tool for QC'ing and NetCDF'ing oceanographic datasets
GNU General Public License v3.0
46 stars 31 forks source link

675 rebase #683

Closed ocehugo closed 3 years ago

ocehugo commented 4 years ago

@BecCowley - this is the bare bones I want to develop on.

What I did:

a. Dropped some commits with templating and local options that are not required. b. Squashed some changes in certain files c. Split some big commits into several, so we have some functionality closure. d. rewording commits e. Reordered (for further rebase/fixups if necessary).

I put the order of the commits in roughly the order I want to address things, so it's easy to rebase further if required. However, what I need now is to reach a self-contained test and reproduce the results. After that, I can go and do further cleanups/changes/etc.

Testing:

  1. I can't properly execute your test with the files you sent me. The file 17959001.000 is too big and too slow to load. After reducing the binary file I could import it but then you call the UI for selecting the pre-processing, and I'm not sure which ones you use (since toolboxproperties changes a lot). Could you confirm if the toolboxproperties you sent on the previous PR is the one you use for the tests? Otherwise, just a list of the PP used is fine for me.

  2. As well, I just need some basic metadata from 17959001.000 file (site depth, nominal depths and such). Would you please send me the sample_data struct so I can copy the metadata? This will also help in a self-contained test and no need for dbs.

  3. The same applied for the thresholds, Which one you selected for this file? We will improve the threshold plots later, all i want is to have a robust auto-test so I can start the cleanup process. The same applied to some parameters you define in SurfaceDetectionSet (the PR doesn't have one, and we need static values for the test).

  4. I kept the Geomag changes, but are they required for 17959001.000? Any other test case will require these fields?

Thanks

BecCowley commented 4 years ago

@ocehugo

  1. The files are huge, that is part of the problem! I have to run all the processing on a unix box and that is very slow. I don't think it is possible to run on a standard desktop PC.Yes, the toolboxproperties file is what I am using. The PP tests I use are timeDrift, DepthPP and magnetic declination. I am happy to go through the process of how I use the single_pingADCPtest.m code if you want. I call the autoQC steps one at a time to determine what the best thresholds are, so I run that section of the code at least once for each threshold test. It also requires deleting the .pqc file each time so that the values I select are used, not the ones in the .pqc file.

Yes, the toolboxproperties file is what I am using. The PP tests I use are timeDrift, DepthPP and magnetic declination. I am happy to go through the process of how I use the single_pingADCPtest.m code if you want.

I call the autoQC steps one at a time to determine what the best thresholds are, so I run that section of the code at least once for each threshold test. It also requires deleting the .pqc file each time so that the values I select are used, not the ones in the .pqc file.

  1. I will send the sample_data struct. If you want the original csv files ( I don't use a database), I can send that too.
  2. Thresholds for this file. Echo Intensity and SideLobe thresholds are used as part of the 'SurfaceDetection' test I wrote. I don't use the individual EchoIntensity and SideLobe tests.
Echo Range threshold Echo Intensity threshold SideLobe threshold Cmag threshold Error Velocity threshold HV threshold VV threshold Tilt flag 2 Tilt flag 4
55 30 0.5 100 0.4 2.5 1.2 40 60
  1. I'm not sure what changes I made to the geomag, except maybe they are the user settings that are recorded when the routine is called?
ocehugo commented 4 years ago

@BecCowley

  1. The files are huge, that is part of the problem! I have to run all the processing on a unix box and that is very slow. I don't think it is possible to run on a standard desktop PC.

The file format clearly wastes a lot of space, but the actual problem is that the parser is too slow and memory hungry. Since I need something that will run in under seconds from raw file to the end of parsing (test), I reduced the binary to the first 100mb, which creates a sample_data with roughly 50mb of valid data, if my quick inspection was correct. See the file here

I'm just not sure if this file will suffice for all the PR/workflow checks - let me know - but the above file is what I got in mind to use.

Yes, the toolboxproperties file is what I am using. The PP tests I use are timeDrift, DepthPP and magnetic declination. I am happy to go through the process of how I use the single_pingADCPtest.m code if you want.

Great. It's a good idea to see you doing it so I can change the current test from interactive to explicit code/function calls. Ping me over email with the time you wanna do it, I'm quite free today/this week.

I call the autoQC steps one at a time to determine what the best thresholds are, so I run that section of the code at least once for each threshold test. It also requires deleting the .pqc file each time so that the values I select are used, not the ones in the .pqc file.

OK. By calling explicitly the routines we avoid the pqc problem. I think I will understand better once you demoed to me. Seems to me that we will need a primary QC routine that will execute secondary ones to aid in the decision, but this is for later anyway.

  1. I will send the sample_data struct. If you want the original csv files ( I don't use a database), I can send that too.
  2. Thresholds for this file. Echo Intensity and SideLobe thresholds are used as part of the 'SurfaceDetection' test I wrote. I don't use the individual EchoIntensity and SideLobe tests. Echo Range threshold Echo Intensity threshold SideLobe threshold Cmag threshold Error Velocity threshold HV threshold VV threshold Tilt flag 2 Tilt flag 4
    55 30 0.5 100 0.4 2.5 1.2 40 60

Thanks!

  1. I'm not sure what changes I made to the geomag, except maybe they are the user settings that are recorded when the routine is called?

OK, it's just dynamic IO then and we don't care about it. Will drop the commit when the test is refactored.

BecCowley commented 4 years ago

@ocehugo I loaded the reduced size file and it tested fine. Just under half of it is out-of-water data. If you are happy with that, then it's good to test on.