joewheaton commented 6 years ago

Hi @MattReimer and @philipbaileynar. Sorry not to have better news on my FIS testing. In comparing GCD 6 and GCD 7 I discovered a bunch of minor issues. More fundamentally, GCD 7 IS NOT producing anything close to the outputs of GCD 6, and I have previously verified GCD 6 against Matlab. Moreover, at around 25 minutes into the video below you will start to see GCD 7 falling apart on some of this... I work around the bugs to still complete the test, but it's not producing acceptable outputs.

The Tests

Here I used our N_FIS Feshie Tutorial Data and created a GCD project in both GCD 6.1.14 and GCD 7.0.8. I used a 2-input FIS and a 3-input FIS. The 2-input FIS used point density and slope (degrees), whereas the 3-input FIS also included 3D point quality. This is the benchmark dataset used in the Wheaton et al. (2010) ESPL paper and my thesis.

The Results

These two text files highlight the vast differences between the results from GCD 6 and GCD 7: 3InputFISTest.txt, 2InputFISTest.txt

If you want the GCD projects, here they are:

Or as a layer package:

The Gory Details

This horrible 40-minute video walks through what I did: https://youtu.be/3VrUBEV5kTs

In short, the results are:

joewheaton commented 6 years ago

Suggested Standard for Unit Testing FIS

Let's treat the GCD 6 FIS outputs as 'truth', as we'd previously verified those against the Matlab Fuzzy Logic Toolbox (see also #174). I would strongly suggest we also test against scikit-fuzzy; as that is open-source and Python, it should be a quick and easy test to shove the text files through it.

I think the method we used above is good for testing. Briefly:

1. Have a point shapefile with 15-30 points manually selected to represent the diversity and range of values in the input file (e.g., these for Feshie). See video.
2. Produce all FIS associated surfaces required for the *.fis model in both GCD 6.1.14 and GCD 7.0.8.
3. Use the ESRI Extract Multi Values to Points geoprocessing tool to extract all inputs and outputs for both GCD 6 and GCD 7 into the same spreadsheet (see here if confused).
4. Compare results. (NOTE: I did notice a small difference in the number of significant digits saved for the point density calculation between GCD 6 and GCD 7, so we should also keep an eye on the inputs.) If the inputs derived by each version of GCD differ slightly, it may make sense to compare the FIS outputs using exactly the same inputs in both.
5. Set a tolerance for acceptable differences in outputs (i.e., rounding error).
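To make the tolerance idea concrete, here is a minimal sketch of a per-point check, assuming the extracted values have been loaded into Python dicts keyed by point id (e.g. GCD 6 as the reference, GCD 7 as the candidate). The function name, the 1% relative tolerance and the absolute floor of 1e-4 (assuming outputs in metres) are hypothetical choices for illustration, not an agreed standard.

```python
import math


def compare_outputs(reference, candidate, rel_tol=0.01, abs_tol=1e-4):
    """Return the sorted ids of points whose candidate FIS output is outside tolerance.

    reference, candidate: dicts mapping point id -> FIS output value.
    rel_tol: allowed relative difference (0.01 means 1%).
    abs_tol: absolute floor so values near zero are not judged on relative error alone.
    A point missing from `candidate` is reported as a failure.
    """
    failures = []
    for pid, ref in reference.items():
        cand = candidate.get(pid)
        if cand is None or not math.isclose(cand, ref, rel_tol=rel_tol, abs_tol=abs_tol):
            failures.append(pid)
    return sorted(failures)


if __name__ == "__main__":
    # Illustrative values only, not real Feshie results.
    gcd6 = {1: 0.120, 2: 0.085, 3: 0.300}
    gcd7 = {1: 0.1205, 2: 0.095}
    print(compare_outputs(gcd6, gcd7))  # → [2, 3]
```

Using `math.isclose` with both a relative and an absolute tolerance avoids flagging tiny absolute differences on near-zero error values as large percentage failures.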
Suggested Datasets, FIS Models & Why
- Feshie 2006 (from tutorial n-running-an-fis-dem-error-model) using this FIS, TS_ZError_PD_SLPdeg.fis: uses slope (degrees) & point density. Based on Wheaton et al. (2010). Use this because it is what we use in tutorials and teach from.
- Feshie 2006 with GPS_ZError_PD_SLPdeg_PQ.fis: uses slope (degrees), GPS 3D point quality & point density. Based on Wheaton et al. (2010). Use this because it is what we use in tutorials and it was published.
- CHaMP UGR_CRBW05583-013882 2013 visit (from tutorial o-champ-fis-error-modelling) with three different FIS models, all from the Bangen et al. (2016) WRR paper:
  - CHaMP_TS_ZError_PD_SLPdeg_3DQ_IntErr.fis (i.e., 4 inputs)
  - CHaMP_TS_ZError_PD_SLPdeg_IntErr.fis (i.e., 3 inputs)
  - CHaMP_TS_ZError_PD_SLPdeg_SR_3DQ_IntErr.fis

This will give us five solid tests, built from published, peer-reviewed FIS models and known datasets we use in tutorials, to compare against. We can also ask @bangen if she has any suggestions.

MattReimer commented 6 years ago

OK, this exploration is going to grow a lot bigger than a GitHub issue, so I've created a whole repo to support it:

https://github.com/NorthArrowResearch/2018GCDFisVerify

Also, for full transparency, I've source-controlled the projects I built to run these numbers. This seems like overkill, but trust me: from previous FIS bug fixes, we'll be glad to have these.

Here they are: https://github.com/NorthArrowResearch/2018GCDFisVerify/tree/master/Data

Clone the repo to get at them.

You'll also see two shapefiles:

These files are just squares indicating which 9 cells I chose from each raster, including little squares to indicate which cell is id=1.

[Screenshot: Feshie cell-selection squares]

These ids correspond to the Google Sheet and are pretty self-explanatory:

| 1 | 2 | 3 |
| 4 | 5 | 6 |
| 7 | 8 | 9 |
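The id layout is easy to reproduce in code when matching spreadsheet rows back to raster cells. A minimal sketch, assuming ids run left to right and then top to bottom within each 3x3 block, and assuming a north-up raster whose block origin (top-left corner) and cell size are known; `cell_position` and `cell_center` are hypothetical helpers, not part of GCD.

```python
def cell_position(cell_id, size=3):
    """Map a 1-based cell id in a size x size block to a 0-based (row, col) offset.

    Ids run left to right, then top to bottom.
    """
    if not 1 <= cell_id <= size * size:
        raise ValueError(f"cell id must be between 1 and {size * size}")
    return divmod(cell_id - 1, size)


def cell_center(cell_id, block_x, block_y, cell_size, size=3):
    """Map coordinates of a cell centre, assuming a north-up raster where
    (block_x, block_y) is the top-left corner of the block."""
    row, col = cell_position(cell_id, size)
    x = block_x + (col + 0.5) * cell_size
    y = block_y - (row + 0.5) * cell_size  # y decreases going down the rows
    return x, y


if __name__ == "__main__":
    print(cell_position(1), cell_position(5), cell_position(9))  # → (0, 0) (1, 1) (2, 2)
```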

Here's the Google spreadsheet with all the values:

https://docs.google.com/spreadsheets/d/1v6abeYaKZXQAyN25VEuN3NvffOTFth0cl1cJ24zNfLE/edit#gid=1355638424

I'm going to see what it would take to run all this through scikit-fuzzy, but digitizing the .fis files accurately is tedious and I haven't found an importer yet, so I may need to write one.
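For the importer problem, here is a minimal sketch of a reader for the plain-text .fis layout written by the Matlab Fuzzy Logic Toolbox ([System], [InputN], [OutputN] and [Rules] sections). It is a starting point under stated assumptions, not a finished importer: it only splits the file into raw key/value strings and rule records, and the rule-line layout and the 1 = AND / 2 = OR connection code are assumed conventions that should be checked against the toolbox documentation. `parse_fis` and `parse_mf` are hypothetical names.

```python
import re

# Assumed rule-line layout, e.g. "1 2, 3 (1) : 1": antecedent MF indices,
# consequent MF indices, rule weight in parentheses, then the connection code.
RULE_RE = re.compile(r"^([-\d\s]+),([-\d\s]+)\(([\d.]+)\)\s*:\s*(\d)\s*$")
# Assumed membership-function layout, e.g. "'Low':'trimf',[0 0 2]".
MF_RE = re.compile(r"^'([^']*)'\s*:\s*'([^']*)'\s*,\s*\[([^\]]*)\]$")


def parse_fis(text):
    """Split a text .fis file into {section: {key: raw value}} plus a rule list."""
    sections, rules, current = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comment lines
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = {}
            continue
        if current is None:
            raise ValueError(f"content before first section: {line!r}")
        if current == "Rules":
            m = RULE_RE.match(line)
            if m is None:
                raise ValueError(f"unrecognised rule line: {line!r}")
            rules.append({
                "antecedents": [int(v) for v in m.group(1).split()],
                "consequents": [int(v) for v in m.group(2).split()],
                "weight": float(m.group(3)),
                # Assumed convention: 1 = AND, anything else treated as OR.
                "connection": "and" if m.group(4) == "1" else "or",
            })
            continue
        key, _, value = line.partition("=")
        sections[current][key.strip()] = value.strip()
    return sections, rules


def parse_mf(value):
    """Turn "'Low':'trimf',[0 0 2]" into ("Low", "trimf", [0.0, 0.0, 2.0])."""
    m = MF_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognised MF definition: {value!r}")
    name, mf_type, params = m.groups()
    return name, mf_type, [float(p) for p in params.split()]
```

Keeping values as raw strings at this stage makes the parser easy to diff against the original file; typed conversion (ranges, MF parameters) can then be done per key with helpers like `parse_mf`.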

@joewheaton, what I need from you is to fill in the Matlab column. We need this badly because using GCD6 as truth is problematic, due to the bugs I found in the GCD6 console code.

It could be that the bugs I found were never expressed because they occurred in workflows that were never tried, and it's also very possible that I translated the code badly and created more issues, but figuring out which is which is bound to cause insanity. We need an unarguable source of truth, and Matlab is that.

MattReimer commented 6 years ago

Small update: I've gotten scikit-fuzzy numbers into the spreadsheet and they agree with GCD6 to within about 1%. Now I just have to figure out why the GCD7 numbers are so different.

MattReimer commented 6 years ago

Another small update: thanks to Sara, we now have parity between GCD6, scikit-fuzzy and Matlab. There are differences, but they are all less than 1%, so I'm going to chalk this up to floating-point rounding errors and move on.
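Besides floating-point rounding, another plausible contributor to sub-1% differences between otherwise identical FIS implementations is that centroid defuzzification is usually computed on a discretized output universe, and implementations may sample it at different resolutions. The sketch below assumes min implication, max aggregation and hypothetical triangular "Low"/"High" output MFs, and evaluates the same rule strengths at 101 and 1001 sample points. It is an illustration only, not a claim about how GCD, scikit-fuzzy or Matlab actually sample.

```python
def trimf(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at b (a == b or b == c gives a shoulder)."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def mamdani_centroid(strengths, output_mfs, lo, hi, n_points):
    """Centroid of the aggregated Mamdani output sampled at n_points on [lo, hi].

    strengths: firing strength for each output MF (already combined per MF).
    output_mfs: (a, b, c) triangles on the output universe.
    Implication is min (clipping) and aggregation is max.
    """
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for i in range(n_points):
        x = lo + i * step
        mu = max(min(s, trimf(x, *mf)) for s, mf in zip(strengths, output_mfs))
        num += x * mu
        den += mu
    if den == 0.0:
        raise ValueError("no rule fired; centroid undefined")
    return num / den


if __name__ == "__main__":
    # Hypothetical "Low" and "High" error MFs on a 0-1 m output universe.
    mfs = [(0.0, 0.0, 0.5), (0.5, 1.0, 1.0)]
    strengths = [0.3, 0.7]
    coarse = mamdani_centroid(strengths, mfs, 0.0, 1.0, 101)
    fine = mamdani_centroid(strengths, mfs, 0.0, 1.0, 1001)
    print(coarse, fine, abs(coarse - fine) / fine * 100, "% difference")
```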

If you check here, I've provided visualizations for each cell:

https://docs.google.com/spreadsheets/d/1v6abeYaKZXQAyN25VEuN3NvffOTFth0cl1cJ24zNfLE/edit#gid=1355638424

Now on to figuring out why GCD7 is actually broken.

MattReimer commented 6 years ago

I fixed a stupid error where the AND and OR logic was reversed. A 1-line fix after 3 days of number crunching. Gotta love programming in the #wildworldofscience!
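Why a swapped AND/OR matters so much: in a Mamdani FIS each rule's firing strength is typically the minimum (AND) or maximum (OR) of its antecedent membership degrees, so reversing the two can turn a weakly firing rule into a strongly firing one. A minimal sketch, assuming those common min/max defaults (a .fis file can specify other operators) and hypothetical membership degrees; it is not GCD's actual code.

```python
def firing_strength(degrees, connection, weight=1.0):
    """Combine antecedent membership degrees into a single rule firing strength.

    degrees: membership degree of each input in its antecedent MF
        (antecedents marked "don't care" should already be left out).
    connection: "and" or "or"; assumes AND = min and OR = max.
    weight: rule weight, applied multiplicatively.
    """
    if not degrees:
        raise ValueError("a rule needs at least one antecedent")
    if connection == "and":
        strength = min(degrees)
    elif connection == "or":
        strength = max(degrees)
    else:
        raise ValueError(f"unknown connection: {connection!r}")
    return weight * strength


if __name__ == "__main__":
    # Hypothetical degrees: point density is "Low" to 0.8, slope is "High" to 0.2.
    degrees = [0.8, 0.2]
    print(firing_strength(degrees, "and"))  # → 0.2
    print(firing_strength(degrees, "or"))   # → 0.8 (what a swapped AND/OR would give)
```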

@joewheaton, we are now at the point where we have to decide whether to give it the thumbs-up.

If you check the Comparison sheet in the Google Sheet, you'll see what we're dealing with.

GCD7 seems to have diverged from Matlab (the greatest difference for GCD6 was 2.3%, but for GCD7 it's now 3.6%), but we're still fairly close to scikit-fuzzy, so maybe that's worth something?

I'm good if you are, but if you're still concerned we have work to do, and I expect increasing effort and diminishing returns in trying to close the gap.

philipbaileynar commented 6 years ago

I created an FIS error surface in GCD 6 and 7 using the same associated surfaces and FIS rule file. I was pretty meticulous and reused the same slope and point density rasters rather than having each version of GCD generate them.

See the DoD raster of the differences (the ESRI and QGIS images show the same thing; each has its strengths). The max difference is 12 cm, but this only occurs in a handful of cells. The mean difference is less than 1 cm.
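The summary statistics quoted above (maximum and mean difference) can be reproduced from any two aligned rasters. Here is a minimal pure-Python sketch, assuming both FIS error surfaces are already loaded as equal-shaped lists of rows with a shared NoData value; reading the actual rasters (e.g. with GDAL) is left out, and `dod_stats` is a hypothetical helper. Reporting the mean absolute difference alongside the signed mean is a deliberate choice: a small signed mean can hide larger positive and negative differences that cancel.

```python
def dod_stats(new, old, nodata=None):
    """Summarise a DEM-of-difference (new - old) between two equal-shaped grids.

    new, old: lists of rows of cell values (e.g. GCD 7 and GCD 6 FIS error
        surfaces, in metres). Cells equal to `nodata` in either grid are skipped.
    Returns the max absolute, mean and mean absolute difference plus the cell count.
    """
    if len(new) != len(old) or any(len(a) != len(b) for a, b in zip(new, old)):
        raise ValueError("grids must have the same shape")
    diffs = [
        a - b
        for row_new, row_old in zip(new, old)
        for a, b in zip(row_new, row_old)
        if nodata is None or (a != nodata and b != nodata)
    ]
    if not diffs:
        raise ValueError("no valid cells to compare")
    n = len(diffs)
    return {
        "max_abs": max(abs(d) for d in diffs),
        "mean": sum(diffs) / n,  # signed: positive and negative errors can cancel
        "mean_abs": sum(abs(d) for d in diffs) / n,
        "count": n,
    }


if __name__ == "__main__":
    # Toy 2x2 grids in metres with one NoData cell; not the Feshie rasters.
    gcd7 = [[0.10, 0.20], [0.30, -9999.0]]
    gcd6 = [[0.10, 0.08], [0.31, -9999.0]]
    print(dod_stats(gcd7, gcd6, nodata=-9999.0))
```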

[Screenshots (2018-04-12): DoD of the GCD 6 vs GCD 7 FIS error surfaces, ESRI and QGIS views]

Riverscapes / gcd

FIS Testing GCD 6 vs 7 #190
