Open bkatiemills opened 8 years ago
Good idea! I'm pretty busy here so don't know if I'll get chance in the near future, so it might be best for you to do it if you are happy with that.
No problem! I want to finish the track check first (getting there finally), then this.
Hi @BillMills, I was thinking we should put together a list of things we want to achieve before May. I thought of:
I'll need to do the first one of course and the second should be in hand. I think the fourth one would be difficult to implement but could be possible with help from the CORA team. We should probably do everything else and then see if we have time to do it. How are the CSIRO tests going? Is there anything I can do to help?
Hi @s-good - sounds like a plan to me! Also on the topic of running the full stack, I've got a test run going on AWS as we speak, but there are a couple issues:
One of the ICDC tests requires a new data file which is in the docker image but the rest should run as they are. Do you have the error message?
If it is going to cost that much on AWS I agree we should look into other options. I can try to run it at my work - it may take some time to get it all run but I think it should be possible.
Also, I think if it is running very slow we should not attempt the CORA test as I think it would be very resource heavy to run. We could consider, as a one off, asking them to run the profiles through their system and return the results to us. If it turns out to be very good we could implement it at a later date.
A few updates:
EN_increasing_depth
check, however, sometimes hangs for a very long time (minutes to possibly indefinitely) - this is next on my list to investigate. In 2-3 hours on AWS c3large:
Number of profiles tested was 12787
NAME OF TEST FAILS TPR FPR TNR FNR
Argo_global_range_check 102 16.0% 0.0% 100.0% 84.0%
Argo_gradient_test 186 27.4% 0.1% 99.9% 72.6%
Argo_impossible_date_test 0 0.0% 0.0% 100.0% 100.0%
Argo_impossible_location_test 0 0.0% 0.0% 100.0% 100.0%
Argo_pressure_increasing_test 1696 11.6% 13.4% 86.6% 88.4%
Argo_regional_range_test 5 0.6% 0.0% 100.0% 99.4%
Argo_spike_test 19 3.0% 0.0% 100.0% 97.0%
CSIRO_depth 9473 83.7% 73.6% 26.4% 16.3%
CSIRO_gradient 2036 43.8% 14.5% 85.5% 56.2%
CSIRO_wire_break 161 25.2% 0.0% 100.0% 74.8%
CoTeDe_Argo_density_inversion 6028 34.9% 47.8% 52.2% 65.1%
CoTeDe_GTSPP_WOA_normbias 2953 85.0% 19.8% 80.2% 15.0%
CoTeDe_GTSPP_global_range 103 16.1% 0.0% 100.0% 83.9%
CoTeDe_GTSPP_gradient 123 17.5% 0.1% 99.9% 82.5%
CoTeDe_GTSPP_spike_check 24 3.3% 0.0% 100.0% 96.7%
CoTeDe_RoC 174 24.9% 0.1% 99.9% 75.1%
CoTeDe_WOA_normbias 1328 61.2% 7.7% 92.3% 38.8%
CoTeDe_anomaly_detection 2180 51.3% 15.2% 84.8% 48.7%
CoTeDe_digit_roll_over 294 28.6% 0.9% 99.1% 71.4%
CoTeDe_fuzzy_logic 1826 74.5% 11.1% 88.9% 25.5%
CoTeDe_gradient 131 18.8% 0.1% 99.9% 81.2%
CoTeDe_location_at_sea_test 6 0.5% 0.0% 100.0% 99.5%
CoTeDe_profile_envelop 185 29.0% 0.0% 100.0% 71.0%
CoTeDe_spike 14 2.2% 0.0% 100.0% 97.8%
CoTeDe_tukey53H_norm 68 6.6% 0.2% 99.8% 93.4%
EN_background_available_check 15 1.9% 0.0% 100.0% 98.1%
EN_background_check 931 43.8% 5.4% 94.6% 56.2%
EN_constant_value_check 1 0.2% 0.0% 100.0% 99.8%
EN_range_check 102 16.0% 0.0% 100.0% 84.0%
EN_spike_and_step_check 17 2.5% 0.0% 100.0% 97.5%
EN_spike_and_step_suspect 210 29.1% 0.2% 99.8% 70.9%
EN_stability_check 28 1.3% 0.2% 99.8% 98.7%
WOD_gradient_check 266 30.0% 0.6% 99.4% 70.0%
WOD_range_check 192 29.6% 0.0% 100.0% 70.4%
loose_location_at_sea 2 0.3% 0.0% 100.0% 99.7%
RESULT OF OR OF ALL: 11395 98.4% 88.6% 11.4% 1.6%
Taking all those that had TPR > FNR, minus CSIRO_depth
for its large FPR, and in about 30 minutes:
Number of profiles tested was 12787
NAME OF TEST FAILS TPR FPR TNR FNR
CoTeDe_GTSPP_WOA_normbias 2953 85.0% 19.8% 80.2% 15.0%
CoTeDe_WOA_normbias 1328 61.2% 7.7% 92.3% 38.8%
CoTeDe_anomaly_detection 2180 51.3% 15.2% 84.8% 48.7%
CoTeDe_fuzzy_logic 1826 74.5% 11.1% 88.9% 25.5%
RESULT OF OR OF ALL: 4363 90.5% 31.2% 68.8% 9.5%
Or instead, taking all those with FPR < 1% and in <1h:
Number of profiles tested was 12787
NAME OF TEST FAILS TPR FPR TNR FNR
Argo_global_range_check 102 16.0% 0.0% 100.0% 84.0%
Argo_gradient_test 186 27.4% 0.1% 99.9% 72.6%
Argo_regional_range_test 5 0.6% 0.0% 100.0% 99.4%
Argo_spike_test 19 3.0% 0.0% 100.0% 97.0%
CSIRO_wire_break 161 25.2% 0.0% 100.0% 74.8%
CoTeDe_GTSPP_global_range 103 16.1% 0.0% 100.0% 83.9%
CoTeDe_GTSPP_gradient 123 17.5% 0.1% 99.9% 82.5%
CoTeDe_GTSPP_spike_check 24 3.3% 0.0% 100.0% 96.7%
CoTeDe_RoC 174 24.9% 0.1% 99.9% 75.1%
CoTeDe_digit_roll_over 294 28.6% 0.9% 99.1% 71.4%
CoTeDe_gradient 131 18.8% 0.1% 99.9% 81.2%
CoTeDe_location_at_sea_test 6 0.5% 0.0% 100.0% 99.5%
CoTeDe_profile_envelop 185 29.0% 0.0% 100.0% 71.0%
CoTeDe_spike 14 2.2% 0.0% 100.0% 97.8%
CoTeDe_tukey53H_norm 68 6.6% 0.2% 99.8% 93.4%
EN_constant_value_check 1 0.2% 0.0% 100.0% 99.8%
EN_range_check 102 16.0% 0.0% 100.0% 84.0%
EN_spike_and_step_check 17 2.5% 0.0% 100.0% 97.5%
EN_spike_and_step_suspect 210 29.1% 0.2% 99.8% 70.9%
EN_stability_check 28 1.3% 0.2% 99.8% 98.7%
WOD_gradient_check 266 30.0% 0.6% 99.4% 70.0%
WOD_range_check 192 29.6% 0.0% 100.0% 70.4%
loose_location_at_sea 2 0.3% 0.0% 100.0% 99.7%
RESULT OF OR OF ALL: 474 36.9% 2.0% 98.0% 63.1%
As I think you can tell, this is me wandering through combinations fairly ad hoc - but this is looking quite strong compared to Hamburg, thanks to the CoTeDe work. More soon.
This is fantastic! Thanks for doing this.
I will also have a go at running it at work - I think if I break it up into small enough parallel chunks and save the output as I go rather than at the end it may work and it saves the money, even if it is only a small amount each time. I'll let you know how I get on.
Now that we're mostly just crossing t's on #130, one thing that strikes me as pressing for May is making a final decision about what set of tests to use. @s-good, let me know if you have something in mind; otherwise, I will turn my attention to this next.
We should set some performance goals, which will likely look something like "minimum false positives while demanding < n% false negatives;" of course we want n = 0, but that's more of an asymptotic goal. The experts will need to set n for IQuOD 1.0, but if forced to do it myself, I would say something like n<5% this year, and shoot for n<2% next (if we plan on updating things in future, which is what I understood from Hamburg). But, not sure what numbers will be compelling for the community - let me know and we'll make it work.
While hammering on that last 10% of bad profiles not getting flagged (absolutely baroque details here), I noticed that a whole lot of profiles in quota which report 0 for the depth of every level are not considered truly 'flagged' by AutoQC's referenceResults
, which only considers the temperature qc t_level_qc
. @s-good, is this what we want? Missing depth information seems (from my naive perspective) like a pretty serious problem that perhaps should affect how we judge whether a profile should have been flagged or not.
It sounds like we might need to be checking the depth QC flags as well as the temperature flags. Does it help to change line 81 in main.py from:
refAssessment = profile.t_level_qc(originator=True) >= 3
to:
refAssessment = profile.t_level_qc(originator=True) >= 3 | profile.z_level_qc(originator=True) >= 3
To answer your previous question, it would be good to try to find the best combination for a few cases e.g. lowest FPR if TPR > 90%, 95% and 98%. It might be interesting to also find highest TPR if FPR < 5% but I think that is a much lower priority.
It seems like we are pretty close to where we want to be with getting the QC checks done before running AutoQC on the different test datasets we have identified? If so I will ask about getting the test datasets extracted/converted into the WOD ASCII format.
refAssessment
definition as you describe; as long as it's understood that this definition will have a very large impact on all qc performance measures and optimization. Before I go ahead and start that, can we foresee any other updates needed to the definition of what should be considered 'bad'?class_weight
used in the SVM learning algorithm; I am currently weighting true flags very heavily since they are our priority, but it sounds like it might be interesting to plot TPR etc as a function of this weight.I'm a bit concerned by these profiles with depth of all 0 since they sound unusual and they are going to distort our statistics. @BecCowley, @BoyerWOD can you give any advice on profiles that @BillMills has found in QuOTA with depths of all 0? Are these real profiles in QuOTA or could this be an error in the translation to WOD format?
@BoyerWOD please could you also advise on which QC flags we need to read from the WOD ASCII files in order to get the QuOTA QC decisions? We are currently reading the originators QC flags attached to the temperature values. Do we also need to use the originator QC flags attached to the depths + anything else?
I have just looked at one of the profiles in question (88213533). It is a 1995 PALACE float (precursor to the Argo floats). The file in Quota has the depths (107 of them), so it appears to be a translation error to the WOD format. @BoyerWOD will need to investigate. Perhaps send him a list of the unique IDs will be most helpful.
Also, 754413 is a CTD (a low resolution version) and a duplicate of another profile. The depths also exist in the original file.
Have we considered creating a reading function for the original file format? Could be useful in the future for XBT data out of CSIRO.
Thanks @BecCowley. I think I've worked out what is going on now. I've listed the 88213533 profile using the Fortran code supplied on the WOD website and get this:
CC cruise Latitde Longitde YYYY MM DD Time Cast #levels 99 0 -34.620 92.310 1995 6 14 14.40 88213533 107
Number of variables in this cast: 2
z fo 1 fo 25 fo
0.0 01 15.482 (5) 01 3.000 (1) 01
0.0 11 15.482 (5) 01 9.000 (1) 01
0.0 11 15.482 (5) 01 15.000 (2) 01
0.0 11 15.482 (5) 01 21.000 (2) 01
0.0 11 15.482 (5) 01 27.000 (2) 01
0.0 11 15.482 (5) 01 33.000 (2) 01
0.0 11 15.433 (5) 01 39.000 (2) 01
0.0 11 15.383 (5) 01 45.000 (2) 01
0.0 11 15.383 (5) 01 51.000 (2) 01
0.0 11 15.383 (5) 01 57.000 (2) 01
0.0 11 15.383 (5) 01 63.000 (2) 01
0.0 11 15.383 (5) 01 69.000 (2) 01
0.0 11 15.383 (5) 01 75.000 (2) 01
0.0 11 15.383 (5) 01 81.000 (2) 01
0.0 11 15.309 (5) 01 87.000 (2) 01
0.0 11 15.211 (5) 01 93.000 (2) 01
0.0 11 14.331 (5) 01 99.000 (2) 01
0.0 11 13.512 (5) 01 105.000 (3) 01
0.0 11 13.179 (5) 01 111.000 (3) 01
0.0 11 13.203 (5) 01 117.000 (3) 01
0.0 11 13.155 (5) 01 123.000 (3) 01
0.0 11 12.989 (5) 01 129.000 (3) 01
0.0 11 12.847 (5) 01 135.000 (3) 01
0.0 11 12.706 (5) 01 141.000 (3) 01
0.0 11 12.612 (5) 01 147.000 (3) 01
0.0 11 12.517 (5) 01 153.000 (3) 01
0.0 11 12.447 (5) 01 159.000 (3) 01
0.0 11 12.353 (5) 01 165.000 (3) 01
0.0 11 12.213 (5) 01 172.000 (3) 01
0.0 11 12.119 (5) 01 178.000 (3) 01
0.0 11 12.026 (5) 01 184.000 (3) 01
0.0 11 11.956 (5) 01 190.000 (3) 01
0.0 11 11.933 (5) 01 196.000 (3) 01
0.0 11 11.910 (4) 01 202.000 (3) 01
0.0 11 11.840 (4) 01 208.000 (3) 01
0.0 11 11.770 (4) 01 214.000 (3) 01
0.0 11 11.701 (5) 01 220.000 (3) 01
0.0 11 11.654 (5) 01 226.000 (3) 01
0.0 11 11.608 (5) 01 232.000 (3) 01
0.0 11 11.562 (5) 01 238.000 (3) 01
0.0 11 11.562 (5) 01 244.000 (3) 01
0.0 11 11.492 (5) 01 250.000 (3) 01
0.0 11 11.469 (5) 01 256.000 (3) 01
0.0 11 11.446 (5) 01 262.000 (3) 01
0.0 11 11.377 (5) 01 268.000 (3) 01
0.0 11 11.377 (5) 01 274.000 (3) 01
0.0 11 11.331 (5) 01 280.000 (3) 01
0.0 11 11.285 (5) 01 286.000 (3) 01
0.0 11 11.285 (5) 01 292.000 (3) 01
0.0 11 11.262 (5) 01 298.000 (3) 01
0.0 11 11.192 (5) 01 304.000 (3) 01
0.0 11 11.192 (5) 01 310.000 (3) 01
0.0 11 11.192 (5) 01 316.000 (3) 01
0.0 11 11.192 (5) 01 322.000 (3) 01
0.0 11 11.123 (5) 01 328.000 (3) 01
0.0 11 11.100 (3) 01 334.000 (3) 01
0.0 11 11.100 (3) 01 340.000 (3) 01
0.0 11 11.100 (3) 01 346.000 (3) 01
0.0 11 11.032 (5) 01 352.000 (3) 01
0.0 11 11.009 (5) 01 358.000 (3) 01
0.0 11 11.009 (5) 01 364.000 (3) 01
0.0 11 10.940 (4) 01 370.000 (3) 01
0.0 11 10.917 (5) 01 376.000 (3) 01
0.0 11 10.917 (5) 01 382.000 (3) 01
0.0 11 10.894 (5) 01 388.000 (3) 01
0.0 11 10.871 (5) 01 394.000 (3) 01
0.0 11 10.825 (5) 01 400.000 (3) 01
0.0 11 10.757 (5) 01 406.000 (3) 01
0.0 11 10.734 (5) 01 418.000 (3) 01
0.0 11 10.643 (5) 01 430.000 (3) 01
0.0 11 10.574 (5) 01 442.000 (3) 01
0.0 11 10.551 (5) 01 454.000 (3) 01
0.0 11 10.460 (4) 01 466.000 (3) 01
0.0 11 10.369 (5) 01 478.000 (3) 01
0.0 11 10.301 (5) 01 490.000 (3) 01
0.0 11 10.278 (5) 01 502.000 (3) 01
0.0 11 10.188 (5) 01 514.000 (3) 01
0.0 11 10.097 (5) 01 526.000 (3) 01
0.0 11 10.006 (5) 01 538.000 (3) 01
0.0 11 9.961 (4) 01 550.000 (3) 01
0.0 11 9.916 (4) 01 562.000 (3) 01
0.0 11 9.826 (4) 01 574.000 (3) 01
0.0 11 9.781 (4) 01 586.000 (3) 01
0.0 11 9.668 (4) 01 598.000 (3) 01
0.0 11 9.555 (4) 01 610.000 (3) 01
0.0 11 9.533 (4) 01 622.000 (3) 01
0.0 11 9.466 (4) 01 634.000 (3) 01
0.0 11 9.376 (4) 01 647.000 (3) 01
0.0 11 9.286 (4) 01 659.000 (3) 01
0.0 11 9.219 (4) 01 671.000 (3) 01
0.0 11 9.107 (4) 01 683.000 (3) 01
0.0 11 8.996 (4) 01 695.000 (3) 01
0.0 11 8.862 (4) 01 707.000 (3) 01
0.0 11 8.751 (4) 01 719.000 (3) 01
0.0 11 8.573 (4) 01 731.000 (3) 01
0.0 11 8.484 (4) 01 743.000 (3) 01
0.0 11 8.373 (4) 01 755.000 (3) 01
0.0 11 8.218 (4) 01 767.000 (3) 01
0.0 11 8.086 (4) 01 779.000 (3) 01
0.0 11 7.932 (4) 01 791.000 (3) 01
0.0 11 7.712 (4) 01 809.000 (3) 01
0.0 11 7.427 (4) 01 827.000 (3) 01
0.0 11 7.077 (4) 01 845.000 (3) 01
0.0 11 6.728 (4) 01 863.000 (3) 01
0.0 11 6.490 (3) 01 881.000 (3) 01
0.0 11 6.295 (4) 01 899.000 (3) 01
0.0 11 6.035 (4) 01 917.000 (3) 01
VarFlag: 0 0
Secondary header # 1 0. (1) Secondary header # 29 4. (1) Secondary header # 96 3. (1) Secondary header # 99 2014248. (7)
Measured Variable # 1 Information Code # 5 4. (1)
The first column (depth) is all zeroes. Temperature (column with variable code 1) looks OK. There is also another variable present (code 25), which is pressure. So in this case there are pressure data but no depth data in the profile. The WOD QC flag for depth is 1 (fail) for all but the first level while the originator QC flag is 1 (pass) for all depths despite being all zeroes (I assume as it is using 1 for good data, 2 for probably good, 3 for probably bad, 4 for bad rather than the WOD QC codes). @BoyerWOD, is this as intended (and if so is it possible to either add them or use the originator QC flags to indicate to look for pressure rather than depth) or should the depth data be there?
@BecCowley, I was hoping to keep file format considerations separate from AutoQC to avoid complicating things + @BoyerWOD had already written file conversion scripts to convert all the test data to WOD ASCII before we started AutoQC so it seemed to make sense to use that. However, there was talk about supporting a netCDF specification at the last workshop.
Thanks for looking into this, folks! Here's a list of profiles from my ascii-copy of quota that look like they have zero depth for all levels, per @BecCowley's request: zero_depth_profiles.txt
Let me know how we'd like to proceed re: defining our 'fail' flag. It might be possible to expand wodpy to support netcdf, but it's going to be very tight given our May deadline, and given that much of the time till then will be taken up by constructing and demonstrating a final decision algorithm.
@s-good @BillMills. I initially thought that the problem was the pressure record: the data in the original file is pressure, not depth. In our files we have either depth or pressure, but not both. It seems that the WOD format has capacity for both, but the conversion to depth is not made in the change from Mquest to WOD format.
However, going through the list you have provided shows that many of them are depth records and they are a mix of data types (BO, MB, CT, PF, XB, etc). So my conclusion is that I will send the list to Tim and ask him to review the problem.
For profiles with pressure only data: When we look at the profiles with pressure at CSIRO, we do a conversion on the fly to depth, rather than store it, so all pass flags are applied to depth, even though it is pressure that is in the file. Maybe you can do a conversion too? If depth is missing, look for pressure and do the conversion. It is a simple calculation using the teos-10 toolbox (http://www.teos-10.org/software.htm). The function is gsw_z_from_p. The toolbox is written in various languages, although python isn't one. This might be quicker and more useful than getting Tim to implement and re-create the dataset, although, he should be made aware of it.
Have you already implemented the teos-10 toolbox? If not, it will be very useful later.
Re: conversions from netcdf to WOD. That is fine, let's go with what we have available.
Thanks, @BecCowley! There appears to be a pythonic gsw under development that includes the function you mention here - assuming this works as expected out of the box, this shouldn't (famous last words) be a showstopper. Could even do a flag in wodpy that replaces literal z with z(p) automatically - comments, @s-good?
Hi @BecCowley, I looked through a few of the profiles in the list (I couldn't find some of the profiles though so something may have gone wrong with what I was doing!) I found two types:
Did you find any multi level profiles with no pressure and all 0 depth?
I'm not sure how interested we are in type 1 profiles? These are just sea surface temperature measurements and most of our QC tests will not work on these. Do you think we should filter these out?
We are already importing gsw in AutoQC (I think it might be used by CoTeDe) so it would be no problem to do the conversion on the fly as suggested. However, I think it would be good to get advice from @BoyerWOD first. Actually we should probably consider doing the opposite conversion for the Argo QC tests as these are supposed to work on pressure. I said to ignore this before to reduce complexity but since it looks simple to do with gsw maybe we should add it?
@s-good. I agree, the two profile types are as you list. I have found 5 types in the original MQuest format using the list of profile numbers (total of 10748 profiles):
I can understand the reasons types 1 to 4 have appeared. I do not know why type 5 has appeared. These should have depths recorded in the WOD version. @BoyerWOD will need to look at these specifically.
For type 1 and 3 (for temperature) the only tests likely to work are global range tests with these surface only data points.
I think we should utilize the gsw toolbox where necessary to switch between pressure and depth depending on the test requirements.
Close as stale, or are there any outstanding concerns here before we wrap up v1?
Maybe keep this open for now to remind ourselves that these exist?
Since Hamburg, the number of qctests implemented has more than doubled (thanks @s-good and @castelao!) - it might be interesting to run the complete stack on quota again, to see how much of an impact we've made on our 55% true positive rate since then. I'm happy to do this, but it also might be good for someone other than me to go through the procedure of deploying and running on AWS (or wherever), so more than one person is familiar with how to do it, and we can double check the instructions are complete and clear.