Riverscapes / gcd

Geomorphic Change Detection For Windows
http://gcd.riverscapes.xyz
GNU General Public License v3.0

Conceptual Questions with Error Modeling in GCD #383

Closed joewheaton closed 3 years ago

joewheaton commented 4 years ago

From Zach Hilgendorf at ASU:

My work focuses on monitoring coastal foredune restoration projects in central and northern California through repeat TLS and UAS surveys. Given the nature of my work, I have spent a lot of time using the GCD toolset and teaching it to my colleagues. I was hoping I could ask you a few questions regarding the handling of error and uncertainty in GCD.

So far, we have used spatially uniform error models to account for uncertainty within our campaigns. This error is primarily a compound of TLS/UAS-SfM alignment error, and RTK-GPS error/OPUS base station rectification error. I want to move towards a FIS model but I'm hung up on a few key points. I've read Wheaton et al. (2010a; 2010b), Hensleigh (2014), Schaffrath et al. (2015), and Bangen et al. (2016), and have watched a number of your tutorials. I understand how to make a FIS and what goes into it. But I'm still stuck on considering what comes out of it. Essentially, my first question is how do you accurately establish an output range/extent for the FIS model? I believe you mentioned 0.04m was a typical TLS error range in some of your tutorials, but I feel like there is a survey-specific "sweet spot" that I should determine, but can't wrap my head around how to come to that number.

In your 2010 "Accounting for Uncertainty..." paper and Hensleigh's thesis, it's mentioned that Bayesian methods can incorporate the spatially uniform and variable error models via a spatial contiguity index. I've been thinking, quite a bit lately, about how to include the error inherent to methods/equipment (such as what goes into the uniform model), while employing a FIS model. I don't recall seeing much discussion of the conditional probability in the other papers, but was curious to know if you had implemented/considered this further. I had thought of merging the outputs of the models, but was uncertain (pun somewhat intended) on best practices to do that, as I hadn't seen it elsewhere. Maybe I'm just in the weeds or stuck in a single train of thought, but I feel like I need to include that equipment-based error, or use it to inform the FIS output in some way to get a truly representative error model. So, my second question is basically: How do you suggest coupling both the equipment error (uniform model) with a FIS model to generate a single error model (in meters) output? I think there is merit to using both, but I just feel like something is off in my thinking.

joewheaton commented 4 years ago

Spatially Uniform ...

So far, we have used spatially uniform error models to account for uncertainty within our campaigns. This error is primarily a compound of TLS/UAS-SfM alignment error, and RTK-GPS error/OPUS base station rectification error.

Nothing wrong with spatially uniform if it's fit for purpose (i.e. if the error model is conservative, defensible and good enough to see the signal you need to). The wrong reason to move towards a spatially variable error model is because "it is better" or "fancier". A good reason is that it will give you a more realistic, spatially variable model that is fit for your purposes and lets you recover changes you think are real while discriminating out the noise.

The FIS game... fundamentals

I want to move towards a FIS model but I'm hung up on a few key points. I've read Wheaton et al. (2010a; 2010b), Hensleigh (2014), Schaffrath et al. (2015), and Bangen et al. (2016), and have watched a number of your tutorials. I understand how to make a FIS and what goes into it. But I'm still stuck on considering what comes out of it.

That's the funny thing about an FIS. Unlike an empirically fit model or an analytical model, the output can be whatever you want, as long as it is a numeric quantity on a continuum that can be broken into some overlapping categories. The exact relationship between those outputs and the inputs is NOT explicitly mapped in the first instance with a formula, but crudely and conceptually articulated with a rule table or inference system. So the output could be (as we do in GCD) an elevation uncertainty or "error" in the DEM. With different inputs, the same approach could represent many other things.
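To make the rule-table idea concrete, here is a minimal sketch of a Mamdani-style FIS in plain NumPy. It is not the GCD implementation. The two inputs (slope and roughness), every membership function breakpoint, and the function name `fis_elevation_uncertainty` are hypothetical values chosen purely for illustration; the min/max/centroid operators are one common choice, not the only one.

```python
import numpy as np

# Membership function helpers built on np.interp, which clamps to the
# end values outside the given breakpoints.
def falling(x, a, b):
    """1 below a, linearly down to 0 at b."""
    return np.interp(x, [a, b], [1.0, 0.0])

def rising(x, a, b):
    """0 below a, linearly up to 1 at b."""
    return np.interp(x, [a, b], [0.0, 1.0])

def triangle(x, a, b, c):
    """0 at a, 1 at b, 0 at c."""
    return np.interp(x, [a, b, c], [0.0, 1.0, 0.0])

def fis_elevation_uncertainty(slope_deg, roughness_m):
    """Return a defuzzified elevation uncertainty (m) for one cell.

    All breakpoints below are made-up illustration values, not
    calibrated numbers.
    """
    z = np.linspace(0.0, 0.5, 501)  # output universe, metres
    out = {
        "low": falling(z, 0.02, 0.10),
        "medium": triangle(z, 0.05, 0.15, 0.30),
        "high": rising(z, 0.20, 0.40),
    }
    slope = {"low": falling(slope_deg, 5, 20), "high": rising(slope_deg, 5, 20)}
    rough = {"low": falling(roughness_m, 0.02, 0.10),
             "high": rising(roughness_m, 0.02, 0.10)}

    # The rule table: (slope class, roughness class) -> output class.
    rules = [
        ("low", "low", "low"),
        ("low", "high", "medium"),
        ("high", "low", "medium"),
        ("high", "high", "high"),
    ]

    aggregated = np.zeros_like(z)
    for s_cls, r_cls, o_cls in rules:
        strength = min(float(slope[s_cls]), float(rough[r_cls]))  # fuzzy AND
        clipped = np.minimum(strength, out[o_cls])                # implication
        aggregated = np.maximum(aggregated, clipped)              # aggregation

    # Centroid defuzzification on the sampled output universe.
    return float(np.sum(z * aggregated) / np.sum(aggregated))

flat_smooth = fis_elevation_uncertainty(slope_deg=2.0, roughness_m=0.01)
steep_smooth = fis_elevation_uncertainty(slope_deg=30.0, roughness_m=0.01)
steep_rough = fis_elevation_uncertainty(slope_deg=30.0, roughness_m=0.20)
print(flat_smooth, steep_smooth, steep_rough)
```

Note that nothing in the rules is a formula relating slope to error. The only numbers that set the magnitude of the result are the output membership functions, which is why calibrating them matters.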

Essentially, my first question is how do you accurately establish an output range/extent for the FIS model?

The key is that, whatever quantity you are modelling, you have a good independent method to measure, quantify, model or estimate it. Whatever your output is, start by just thinking about it as low, medium and high (if that is not discriminating enough, add some other categories, e.g. really low and really extreme, to deal with edge cases or outlier behavior). THEN, once the inference system works conceptually like you want, calibrate the specification of the output membership functions to the empirical distribution.

We never polished many of the tools we used for this into professional-grade tools like GCD, so most of them are just in the research-grade TAT (http://tat.riverscaps.xyz). One of the things we used a bunch for this was the coincident points tool. It is good when you have large point clouds (e.g. TLS, SFM, MBES) and you can basically treat all the points in some small window (e.g. 1-5 cm) as the same point. The tool just looks at the differences between these as an independent measure of uncertainty. As long as you don't use this same measure as an input to your FIS (a bit circular), it is a great independent measure to calibrate your low, medium and high classes.

I find common sense is a much better thing to use for your extreme and outlier classes. For example, what's the worst it could be? In a stream, that might be the biggest roughness element, the tallest tree or the tallest bank. In your work on coastal foredunes, it might be the tallest dune grasses or the largest amplitude dune. On the other end, what's the best it could be? This is likely limited either by the grain size or by the precision of the instrument or method you are using to acquire data. All these values should be calibrated on a survey-method-by-survey-method and environment-by-environment basis (it is not necessary to do this independently for every site or every survey at a site).
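The coincident points idea can be sketched roughly as follows. This is not the TAT tool's code. It assumes the "difference" of interest is the elevation range among points that share a small x-y window, uses simple grid binning rather than a true neighbour search (so near-coincident points that straddle a window edge are missed), and the function names and percentile choices are hypothetical.

```python
import numpy as np

def coincident_point_spread(xyz, window=0.02, min_points=2):
    """Elevation range (max z - min z) per x-y window of size `window`.

    xyz is an (n, 3) array of x, y, z in metres. Only windows holding at
    least `min_points` points are returned.
    """
    xyz = np.asarray(xyz, dtype=float)
    # Integer window index for each point.
    ij = np.floor(xyz[:, :2] / window).astype(np.int64)
    # Group points by window; `inverse` maps each point to its window.
    _, inverse, counts = np.unique(
        ij, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()

    zmax = np.full(counts.size, -np.inf)
    zmin = np.full(counts.size, np.inf)
    np.maximum.at(zmax, inverse, xyz[:, 2])
    np.minimum.at(zmin, inverse, xyz[:, 2])

    keep = counts >= min_points
    return (zmax - zmin)[keep]

def class_breaks(spread, percentiles=(50, 84, 95)):
    """Candidate breakpoints for low / medium / high output classes.

    The percentile choice is an assumption; pick what suits your data.
    """
    return np.percentile(spread, percentiles)

# Synthetic demo: a flat surface with 1 cm of vertical noise.
rng = np.random.default_rng(0)
n = 20000
cloud = np.column_stack([
    rng.uniform(0.0, 2.0, n),    # x
    rng.uniform(0.0, 2.0, n),    # y
    rng.normal(10.0, 0.01, n),   # z
])
spread = coincident_point_spread(cloud, window=0.02)
print(class_breaks(spread))
```

The resulting breakpoints describe the empirical distribution for one survey method in one kind of environment; the extreme classes still come from common sense rather than from this distribution.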

Calibrating the FIS

I believe you mentioned 0.04m was a typical TLS error range in some of your tutorials, but I feel like there is a survey-specific "sweet spot" that I should determine, but can't wrap my head around how to come to that number.

As implied above: a little art, experience, empirical evidence and common sense. James' little FIS Development Assistant is a set of nice R scripts that help you feel like you're doing this robustly and systematically. The thing is, they are only as good as the situations represented in the training datasets. They deal with 95% of cases well if those cases are represented empirically in the training data. The problem is that lots of interesting outliers and combinations of inputs won't be captured. Anyhow, 0.04 m is a horrible rule of thumb for TLS in gravel bed rivers, and it IS NOT driven by the instrument as much as by the typical roughness height and the typical cell resolution people model at. You should do something different for coastal or aeolian settings. There, since you're dealing with sand, your grain-size roughness is on the order of the precision of a typical TLS. So the limiting factors may have more to do with the form roughness height of smaller bedforms (e.g. ripples) that are superimposed on top of larger bedforms (e.g. dunes) and topography.

Bayesian Updating

In your 2010 "Accounting for Uncertainty..." paper and Hensleigh's thesis, it's mentioned that Bayesian methods can incorporate the spatially uniform and variable error models via a spatial contiguity index. I've been thinking, quite a bit lately, about how to include the error inherent to methods/equipment (such as what goes into the uniform model), while employing a FIS model. I don't recall seeing much discussion of the conditional probability in the other papers, but was curious to know if you had implemented/considered this further.

I have stopped using Bayesian updating in most environments because most surveys (especially TLS and SFM) have lots of systematic bias and errors, and the method as described in the 2010 paper is pretty crap at discriminating actually coherent change from systematic noise. We got away with it there for GPS, but few people spend as much time as we did on that any more. All these instant-gratification remote sensing techniques everyone is so excited about these days do an awful job of providing signals from which the systematic stuff is easy to discriminate. So the easy answer is: don't bother with Bayesian unless you're confident you are really looking at a situation where systematic errors are already accounted for.

The more complicated answer is that Bayesian can work really well if it is applied in a spatially variable way itself (i.e. applied to areas of your survey you trust). GCD does not have this as a feature right now, and it is buried in feature requests on our issue board, but not something I have done.

I had thought of merging the outputs of the models, but was uncertain (pun somewhat intended) on best practices to do that, as I hadn't seen it elsewhere.

We've actually done this before for fish habitat models and we do it for our GCD (i.e. we have other models and FIS models that act as inputs into one combined FIS). It can work well and make it easier to interpret the behavior of these models.

Combining Error Models

Maybe I'm just in the weeds or stuck in a single train of thought, but I feel like I need to include that equipment-based error, or use it to inform the FIS output in some way to get a truly representative error model.

Well, you are a PhD student... so what better time to ponder some of these questions (for a little while). The trick is to recognize which tangents will buy you a real breakthrough or deeper understanding, and which just burn up time (some of which is healthy). Remember that most of these things (i.e. 2-week tangents) end up, frustratingly, as 1-3 sentences in your methods, or maybe a paragraph or sub-section in the eventual manuscript. How much time you spend on them depends on whether you are more interested in the error and uncertainty model and geomatics, or in getting something reasonable to interpret the geomorphology and processes shaping these coastal dunes.

So, my second question is basically: How do you suggest coupling both the equipment error (uniform model) with a FIS model to generate a single error model (in meters) output? I think there is merit to using both, but I just feel like something is off in my thinking.

For equipment error, I would just use that to calibrate the overlap between your "really low" or "as good as it can get" class in elevation uncertainty and your "low" or "good" class, which represents what you think you can resolve where you have good coverage and a straightforward surface to represent. I wouldn't bother using it as two models. The case where you can combine two models is where you have a variety of things that can easily be modeled in a spatially variable way. For example, Hensleigh has used the ever-so-common MBES output of TPU (total propagated uncertainty), which is a "full error budget" model and paints such a bleak picture of your data that no one ever uses it. However, the spatial pattern is really good and the relative magnitudes make sense. So it gets dumbed down to similar categories as the output by specifying fuzzy membership functions, and after that you don't care what the actual values of error are. Then, as stated above, the combined model can take that as an input alongside the other things you use (e.g. slope, roughness), and your FIS just independently calibrates the output to a reasonable range of values.
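One way to read the advice about equipment error is sketched below: the equipment error is not a separate model, it just sets where the "really low" output class starts handing over to "low". This is a hedged illustration only. The five class names, the trapezoid layout, the function names and the example numbers are all assumptions, not values from GCD or from any calibrated survey.

```python
import numpy as np

def output_classes(equipment_error, low, medium, high, worst_case):
    """Trapezoid vertices (a, b, c, d), in metres, for each output class.

    equipment_error: best the instrument/method can do (hypothetical input)
    low, medium, high: e.g. breakpoints from an independent empirical measure
    worst_case: common-sense upper bound (e.g. tallest vegetation)
    """
    if not (0 < equipment_error < low < medium < high < worst_case):
        raise ValueError(
            "expected 0 < equipment_error < low < medium < high < worst_case"
        )
    return {
        # Fully "really low" up to the equipment error, gone by `low`.
        "really_low": (0.0, 0.0, equipment_error, low),
        "low": (equipment_error, low, low, medium),
        "medium": (low, medium, medium, high),
        "high": (medium, high, high, worst_case),
        # Stays fully "extreme" at and beyond the worst case.
        "extreme": (high, worst_case, worst_case, worst_case),
    }

def membership(z, vertices):
    """Degree of membership of z in a trapezoid (a, b, c, d)."""
    a, b, c, d = vertices
    z = np.asarray(z, dtype=float)
    # A zero-width edge (b == a or d == c) is treated as an open shoulder.
    rise = np.ones_like(z) if b == a else np.clip((z - a) / (b - a), 0.0, 1.0)
    fall = np.ones_like(z) if d == c else np.clip((d - z) / (d - c), 0.0, 1.0)
    return np.minimum(rise, fall)

# Made-up example values in metres.
classes = output_classes(
    equipment_error=0.01, low=0.03, medium=0.08, high=0.20, worst_case=1.00
)
z = 0.02  # halfway between the equipment error and the "low" peak
print(membership(z, classes["really_low"]), membership(z, classes["low"]))
```

With this layout, changing the equipment error only moves the handover between the two best classes; the rest of the output range is still set by the empirical and common-sense values.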

I hope that helps. Feel free to respond here. If this becomes too complicated in one thread, I will split into separate threads by the headings above...