PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org

Estimating uncertainties from data as a part of the workflow #1033

Open istfer opened 8 years ago

istfer commented 8 years ago

How do we want to handle the uncertainties in the data?

Things to discuss/ideas:

Examples we currently have:

Other ideas/examples?

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 365 days with no activity.

istfer commented 4 years ago

Unstaling: I think this is still relevant and something I'm very much interested in.

istfer commented 4 years ago

@mdietze would you have any reading suggestions on this? What should we do when observations don't report uncertainties? Does the answer differ across data types? What kinds of approaches are out there?

mdietze commented 4 years ago

I don't know if there's going to be a completely generic approach to dealing with observation error, but I think it would be useful to think through some common sources of observation error so that we have tools to handle each. Sampling error is going to be a particularly common issue, which might be handled based on distributional assumptions (e.g. Poisson) or resampling/bootstrapping, though the latter often requires consideration of specific sampling designs. Calibration errors are also going to be common, but those are hard to account for if not reported. At times you might also need to deal with spatial and temporal interpolation uncertainties, which can be done if you have the raw data but is hard if you're working with someone else's map product. Even harder is how to aggregate spatial and temporal uncertainties if you don't have the posterior samples. Similar logic applies to derived data products, where there can be uncertainties about model parameters, structure, inputs, etc.
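To make the two options for sampling error concrete, here is a minimal sketch in R (hypothetical plot-count data, not PEcAn code) comparing a nonparametric bootstrap against a Poisson distributional assumption:

```r
# Minimal sketch: estimate the sampling error of a plot-level mean.
# `stem_counts` is made-up data standing in for 30 field plots.
set.seed(42)
stem_counts <- rpois(30, lambda = 12)

# Option 1: nonparametric bootstrap (ignores sampling design for simplicity)
n_boot <- 5000
boot_means <- replicate(n_boot, mean(sample(stem_counts, replace = TRUE)))
mean(stem_counts)                        # point estimate
sd(boot_means)                           # bootstrap SE of the mean
quantile(boot_means, c(0.025, 0.975))    # 95% percentile interval

# Option 2: distributional assumption. If counts are Poisson,
# Var(mean) = lambda / n, so the SE comes straight from the estimate.
sqrt(mean(stem_counts) / length(stem_counts))
```

A stratified or clustered design would need the resampling to respect strata/clusters, which is the caveat about specific sampling designs above.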

We also need an overall plan for how we want to formally track observation uncertainties within the system, so that once one module estimates it (or reads it from the original data source) it is passed forward into benchmarking, visualization, PDA, SDA, etc.

ashiklom commented 3 years ago

Bump for 2020 ROSES OSS proposal discussion.

> We also need an overall plan for how we want to formally track observation uncertainties within the system, so that once one module estimates it (or reads it from the original data source) it is passed forward into benchmarking, visualization, PDA, SDA, etc.

I feel like this is why it would be useful to have a standard format (or set of standard formats) for a lot of these inputs. Then, we can just store the uncertainties (and uncertainty metadata) inside those files. We can borrow heavily from our EFI forecast standards discussions. From a code standpoint, you could have some uncertainties transferred directly from the source by the download.X function, and for other products, do this in steps (e.g., download.Y followed by add_uncertainty.Y).
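As a sketch of what that second step could look like if the standard format were NetCDF (all names here — `add_uncertainty_sketch`, `NEE`, `NEE_sd` — are illustrative, not existing PEcAn functions or variables):

```r
# Sketch only: append an uncertainty variable plus metadata to a data file
# that a hypothetical download.Y() has already written. Assumes the file
# has a "time" dimension and an "NEE" variable.
library(ncdf4)

add_uncertainty_sketch <- function(path, sd_values) {
  nc <- nc_open(path, write = TRUE)
  tdim <- nc$dim[["time"]]
  sd_var <- ncvar_def("NEE_sd", units = "kg C m-2 s-1", dim = tdim,
                      longname = "standard deviation of NEE observation error")
  nc <- ncvar_add(nc, sd_var)            # define the new variable
  ncvar_put(nc, sd_var, sd_values)       # write the uncertainty values
  # Machine-readable links: record what the uncertainty is and which variable
  # it belongs to (ancillary_variables is the CF convention for this).
  ncatt_put(nc, "NEE_sd", "uncertainty_type", "observation_error")
  ncatt_put(nc, "NEE", "ancillary_variables", "NEE_sd")
  nc_close(nc)
}
```

Storing the link as a CF `ancillary_variables` attribute would give downstream modules (benchmarking, PDA, SDA) a way to discover which uncertainty goes with which data variable without any PEcAn-specific conventions.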

A useful side effect of storing these as files with machine-readable metadata is that they become much easier to share directly with other people, and it becomes easier to develop tools for them in other languages (if the need arises).

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 365 days with no activity.