HydrologicEngineeringCenter / HEC-FDA

Hydrologic Engineering Center - Flood Damage Analysis
MIT License

Transform Summary Relationships Produce Unintuitive Result #1156

Closed rnugent3 closed 1 week ago

rnugent3 commented 1 month ago

We have a case where mean EAD increases when a regulated-unregulated transform flow function is applied. This phenomenon only occurs when EAD is computed with uncertainty. The preview compute produces the expected result: deterministic EAD is decreased by flow regulation. The study of interest is Sum City, Impact Area 7, DB Base Scenario.

The behavior appears to be related to the internally computed stage-damage functions. The consequences compute with uncertainty produces significant damage at the most frequent stage. That points to questionable modeling, but it is a result seen in the field and therefore one we need to be able to handle.

To test my suspicion, I created two additional sets of manual stage-damage functions. The first set looks normal and includes stages with zero damage. image I computed EAD with uncertainty, without regulation: image Then with regulation: EAD goes down, the expected behavior. image

The next set is a truncated version with no zero-damage stages. image I computed EAD with uncertainty, without regulation: image Then with regulation: EAD goes up, the unexpected behavior. image

Somewhere the math goes awry. I haven't figured out where the full compute with uncertainty goes wrong, but I know that the stage-damage function is the culprit.

A recommended path forward might be to force a zero-damage stage onto the stage-damage function with uncertainty and see whether this behavior persists. The trick is: at what stage do we force zero damage? I think we could go very close to the most frequent stage, something like 0.1 feet below it, because we need to put a boundary around the bad modeling.
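
The proposed fix can be sketched as follows. This is an illustrative Python sketch, not HEC-FDA code; `force_zero_damage_stage` and the 0.1 ft offset are hypothetical names and values.

```python
# Hypothetical sketch: force a zero-damage point onto a stage-damage function
# whose lowest stage already carries damage. Names and the offset are
# illustrative, not HEC-FDA's API.

def force_zero_damage_stage(stages, damages, offset_ft=0.1):
    """Prepend a zero-damage stage just below the first damaging stage."""
    if damages[0] == 0.0:
        return list(stages), list(damages)  # already bounded by a zero-damage stage
    anchor = stages[0] - offset_ft
    return [anchor] + list(stages), [0.0] + list(damages)

stages, damages = force_zero_damage_stage([10.0, 12.0, 14.0], [5.0, 20.0, 60.0])
print(stages)   # [9.9, 10.0, 12.0, 14.0]
print(damages)  # [0.0, 5.0, 20.0, 60.0]
```

The zero-damage anchor bounds the truncated function so that sampled stages below it contribute no damage, which is the behavior the untruncated set exhibited.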

rnugent3 commented 1 month ago

Original model is here: \\share.hec.usace.army.mil\USACE\HEC-FDA\Version_2.0\TechTransfer\ApplicationsGuide\Sum City Model

rnugent3 commented 1 month ago

Manual stage-damage functions are attached: sumcitymanualstagedamage.xlsx

rnugent3 commented 2 weeks ago

Ok. I think we have a general transformation problem.

Three different studies: Sum City, Muncie, and West Sac. Applying flow or stage transform functions where regulated = unregulated or exterior = interior over the entire domain changes the results.

Here is an example of West Sac.

West Sac results:

- Original: EAD = 1.825M image
- Reg = Unreg: EAD = 1.889M image
- Ext = Int: EAD = 1.816M image

rnugent3 commented 2 weeks ago

Here is Muncie. Similar in that the flow transform causes the larger deviation; different in that the flow transform decreases EAD instead of increasing it.

- Original: EAD = 0.986M image
- Reg = Unreg: EAD = 0.878M image
- Int = Ext: EAD = 0.972M image

rnugent3 commented 2 weeks ago

I can track in the IDE that a flow transformation with X = Y does not have any impact on a flow-frequency function, provided that there is sufficient overlap between the two functions.
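
The composition argument can be sketched outside the IDE. A minimal Python illustration (where `interp` and `compose` are stand-ins for the PairedData composition, not HEC-FDA code) shows that an X = Y transform is a no-op when its domain covers the inner function's ordinates, and clamps them when overlap is insufficient:

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation, clamping outside [xs[0], xs[-1]]."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            slope = (ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + (x - xs[i - 1]) * slope

def compose(transform_xs, transform_ys, inner_ys):
    """Map each ordinate of the inner function through the transform."""
    return [interp(y, transform_xs, transform_ys) for y in inner_ys]

flows = [100.0, 500.0, 2000.0, 8000.0]  # flow-frequency ordinates

# Identity transform (regulated = unregulated) covering the full range: no-op.
assert compose([0.0, 10000.0], [0.0, 10000.0], flows) == flows

# Identity transform with insufficient overlap: ordinates above 1000 clamp.
print(compose([0.0, 1000.0], [0.0, 1000.0], flows))  # [100.0, 500.0, 1000.0, 1000.0]
```

The clamping in the insufficient-overlap case is why overlap matters for mean EAD at all; with full overlap, an X = Y transform mathematically cannot change the composed function.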

rnugent3 commented 2 weeks ago

This situation continues to defy logic. I have added some diagnostic code to prove that a flow transformation with X = Y does not have any impact on a flow-frequency function, provided that there is sufficient overlap between the two functions.

In the code below, I added a check that the transformed flow-frequency function matches the sampled flow-frequency function after composing it with a regulated-unregulated function where X = Y.

The functions are always equal across multiple study data sets; we never enter the code block, provided there is sufficient overlap. And yet, mean EAD differs for each study data set when the reg-unreg function with X = Y is used.

We do enter the code block when there is not sufficient overlap, so I know the diagnostic logic works as I expect, and I have been able to rule out overlap between the flow-frequency and reg-unreg functions as a contributor to the difference under study. Overlap definitely matters for mean EAD, but something is still wrong: a large difference in EAD remains after overlap is resolved.

```csharp
else
{
    // Sample the regulated-unregulated transform (should consume a random number)
    // and compose it with the flow-frequency function.
    PairedData inflow_outflow_sample = _UnregulatedRegulated.SamplePairedData(threadlocalRandomProvider.NextRandom(), computeIsDeterministic);
    PairedData transformff = inflow_outflow_sample.compose(frequencyDischarge);

    // Diagnostic check: with X = Y, composition should leave the function unchanged.
    bool functionsAreEqual = transformff.Equals(frequencyDischarge);
    if (!functionsAreEqual)
    {
        // Breakpoint target: we only land here when overlap is insufficient.
    }

    // Sample stage-discharge (also consumes a random number) and compose.
    PairedData discharge_stage_sample = _DischargeStage.SamplePairedData(threadlocalRandomProvider.NextRandom(), computeIsDeterministic);
    PairedData frequency_stage = discharge_stage_sample.compose(transformff);
    ComputeFromStageFrequency(threadlocalRandomProvider, frequency_stage, i, computeWithDamage, computeIsDeterministic);
}
```
rnugent3 commented 2 weeks ago

I changed that code again. I pulled transformff out of the compose with the stage-discharge function and replaced it with frequencyDischarge. So, in the one place we should be using the transformed flow, I removed it from the compute, and the transform still impacts mean EAD. It seems like we might have included the transform in the logic somewhere else?

image

rnugent3 commented 2 weeks ago

When I pull transformff out of the compose, I can see in the preview compute that applying a transform function has no effect on deterministic EAD.

And still, with the flow transformation completely unplugged from the compute, selecting a transform function in the construction of an impact area scenario produces a different mean EAD.

I can also show that the difference in mean EAD is insensitive to the values in the assigned transformation function. With transformation pulled out of the compute, assigning any transformation function to the impact area scenario gives a mean EAD of 214k, independent of the flow values in the function. If I don't assign a transformation function, mean EAD is 190k.

rnugent3 commented 2 weeks ago

Could it be the sequence of random numbers that matters here?

rnugent3 commented 2 weeks ago

Yes. Random number sequence, YIKES!

rnugent3 commented 2 weeks ago

Currently, we use one sequence of random numbers for all summary relationship sampling in an EAD compute with uncertainty.

The EAD distribution appears to be very sensitive to the sequence of "random" numbers. We knew there was sensitivity, which is why the compute needs to be seeded.

What we missed is that the seeding of the compute breaks when summary relationships are added. Adding the sampling of a regulated-unregulated transform flow function shifts the sequence of random numbers, so a different set of random numbers is injected into the sampling of the stage-discharge function, which changes the EAD distribution.

What I think this means is that every summary relationship needs its own seed and its own sequence of random numbers, so that the sequence of random numbers a summary relationship consumes in sampling is the same for every compute.
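
A minimal sketch, in Python rather than the C# compute, of why the shared stream breaks seeding; the relationship names and seeds are illustrative:

```python
import random

def shared_stream_samples(include_transform):
    """One shared RNG for all summary relationships: adding a sampling step
    shifts every subsequent draw in the stream."""
    rng = random.Random(1234)  # the compute's single seed
    draws = {"flow_frequency": rng.random()}
    if include_transform:
        draws["reg_unreg"] = rng.random()  # extra draw shifts what follows
    draws["stage_discharge"] = rng.random()
    return draws

without_t = shared_stream_samples(include_transform=False)
with_t = shared_stream_samples(include_transform=True)
# The stage-discharge sample changes even though its own inputs did not.
assert without_t["stage_discharge"] != with_t["stage_discharge"]

def per_relationship_samples(include_transform):
    """One seeded RNG per summary relationship: each relationship sees the
    same sequence regardless of what else is sampled."""
    rngs = {name: random.Random(seed) for name, seed in
            [("flow_frequency", 1), ("reg_unreg", 2), ("stage_discharge", 3)]}
    draws = {"flow_frequency": rngs["flow_frequency"].random()}
    if include_transform:
        draws["reg_unreg"] = rngs["reg_unreg"].random()
    draws["stage_discharge"] = rngs["stage_discharge"].random()
    return draws

# With per-relationship RNGs, adding the transform no longer perturbs
# the stage-discharge sample.
assert (per_relationship_samples(False)["stage_discharge"]
        == per_relationship_samples(True)["stage_discharge"])
```

This also explains why the effect was insensitive to the transform's flow values: it is the extra draw itself, not the sampled function, that perturbs everything downstream.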

rnugent3 commented 2 weeks ago

And for that matter, we'll also need separately seeded random number generators for each stage-damage function, where there is one function per damage category - asset category combination, and there are separate stage-damage functions for the non-fail condition when fail and non-fail are combined for total risk.

rnugent3 commented 2 weeks ago

I am not yet tracking how the random number issue relates to the differences observed when damage is truncated versus not, as shown at the top of this ticket. I'll need to look at that afterward.

First, if I apply a transform function where there is no transformation, so that X = Y, there should be no difference in the result, and I think we achieve that through enhanced random number generation.

rnugent3 commented 2 weeks ago

I am thinking about each UPD having its own set of seeded RNGs, indexed by iteration, and passing the iteration index into SamplePairedData to use the right RNG during the compute.

All of the UPDs can be given the same set of master seeds created for the provided convergence criteria. We can construct the set of RNGs on demand, without worrying about reading from and writing to file.

SamplePairedData can be overloaded to accept and use indices; the other SamplePairedData overload can remain as is.

It is possible that there are similar ramifications in the consequences engine. If we add an occupancy type to an occupancy type set, could that break the sequence of seeded random numbers, resulting in different damage distributions? Eek
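
The idea can be sketched as follows. This is an illustrative Python analogue, not the planned C# implementation; `UncertainPairedData`, `master_seed`, and `sample` are stand-in names:

```python
import random

class UncertainPairedData:
    """Sketch: a sampleable relationship that owns its RNGs, one per
    iteration, so sampling is reproducible even when iterations run in
    parallel and complete out of order."""

    def __init__(self, master_seed, max_iterations):
        seeder = random.Random(master_seed)
        # One independently seeded generator per iteration, built on demand
        # from the master seed; nothing is read from or written to file.
        self._rngs = [random.Random(seeder.getrandbits(64))
                      for _ in range(max_iterations)]

    def sample(self, iteration):
        # The same iteration index always yields the same probability,
        # regardless of call order across threads or other relationships.
        return self._rngs[iteration].random()

upd_a = UncertainPairedData(master_seed=42, max_iterations=100)
a = [upd_a.sample(i) for i in (0, 1, 2)]
upd_b = UncertainPairedData(master_seed=42, max_iterations=100)
b = [upd_b.sample(i) for i in (2, 0, 1)]  # different call order
assert a[0] == b[1] and a[2] == b[0]  # iteration i is stable either way
```

Because each relationship derives its generators only from its own master seed, adding or removing other sampled objects cannot shift the numbers it sees.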

rnugent3 commented 2 weeks ago

Generally speaking, I think that every single THING that can be sampled, e.g. UPD, Value Uncertainty, Value Ratio Uncertainty, Continuous Distribution, has to own its own RNGs so that its sampling can be replicated independently of adding or removing other things.

rnugent3 commented 2 weeks ago

As each thing might have its own RNG, we can seed it uniquely and tell it to populate an array of random numbers; then, during the compute, we reference the correct random number for iteration i, so that we control for the random order of iterations in a parallel compute.

rnugent3 commented 1 week ago

Closed with PR #1168