Closed by rnugent3 1 year ago
My current hypothesis is that f_inverse can give us different answers at similar flat spots. For example, could we be searching for the AEP from different directions, and so getting different answers at flat spots? I think that we should focus on the performance of f_inverse at flat spots.
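To make the hypothesis concrete, here is a minimal sketch of how search direction could change the answer at a flat spot. It is not the actual f_inverse implementation: the two function names, the curve values, and the use of non-exceedance probability on the x axis are all assumptions made for illustration.

```python
# Hypothetical stage-frequency curve (illustrative values, not Muncie data).
# x = non-exceedance probability, y = stage. The curve is flat from p=0.98 to p=0.99.
probs = [0.90, 0.98, 0.99, 0.999]
stages = [940.0, 944.0, 944.0, 950.0]


def f_inverse_from_left(xs, ys, target):
    """Scan segments in increasing x; return x at the first segment bracketing target."""
    for i in range(len(xs) - 1):
        y0, y1 = ys[i], ys[i + 1]
        if y0 <= target <= y1:
            if y1 == y0:  # flat segment: take its left end
                return xs[i]
            return xs[i] + (target - y0) * (xs[i + 1] - xs[i]) / (y1 - y0)
    raise ValueError("target stage is outside the curve")


def f_inverse_from_right(xs, ys, target):
    """Scan segments in decreasing x; return x at the first segment bracketing target."""
    for i in range(len(xs) - 2, -1, -1):
        y0, y1 = ys[i], ys[i + 1]
        if y0 <= target <= y1:
            if y1 == y0:  # flat segment: take its right end
                return xs[i + 1]
            return xs[i] + (target - y0) * (xs[i + 1] - xs[i]) / (y1 - y0)
    raise ValueError("target stage is outside the curve")


target = 944.0  # stage that sits exactly on the flat spot
left = f_inverse_from_left(probs, stages, target)
right = f_inverse_from_right(probs, stages, target)

# AEP = 1 - non-exceedance probability
print(f"from left:  AEP ~ {1.0 - left:.4f}")
print(f"from right: AEP ~ {1.0 - right:.4f}")
```

In this toy case, the same curve and the same target stage give an AEP near 0.02 when scanning from one side and near 0.01 from the other. If the real search direction can vary, it might produce the kind of inconsistency described here.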
I collected some data from a compute with the Muncie study using a reasonable graphical stage-frequency function. For reference, the target stage is 944.8572. There are some big discontinuities at 0.01, 0.02, and 0.04:
Upon inspection, there appear to be problems with the way that we use f_inverse on paireddata. I collected all of the sampled stage-frequency functions and the AEP of the target stage for each sampled function. Then, I sorted the functions on AEP. I have included screenshots of what I see in the data below.
Here, both AEPs should be about 0.01. The bottom function has an AEP indicated at 0.0075, which is correct, but inconsistent.
Here, both AEPs should be about 0.02. The bottom function has an AEP indicated at 0.0179, which is correct, but inconsistent.
The two functions where the jump exists are exactly the same, so the AEPs should be exactly the same, but they are not.
I think that these "flat spots" can be explained by kinks in the stage-frequency function. In the image below, each of the AEP events sees a kink where the stage-frequency function shoots up. It seems that the flat spots, where stage does not change much for a big change in probability, follow a kink in the stage-frequency function, where stage changes a lot for some change in probability.
At the moment, I think the two most important questions are:
My proposed answers:
Even after testing that f_inverse behaves as we expect, we can still expect discontinuities. If there is a range of frequencies for which the stage-frequency function is always flat, and that range includes the target stage a fraction of the time, then that entire fraction will have the same AEP, and the smaller AEPs that have the same stage will have near-zero relative frequency.
The interpolation scheme of the graphical frequency function should be reviewed. It does seem curious that the steepest change in stage leads up to the stage at each user-entered coordinate.
I think this is what is happening:
So those discontinuities in the distribution of AEP are areas where the stage-frequency curve is flat. The AEP for that flat part of the probability domain always goes to the more frequent AEP, and the other AEPs where the stages are flat never materialize.
The discontinuous distribution is mathematically correct. It's weird but correct.
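A small simulation can illustrate why the gap is expected even when the inverse lookup is perfectly consistent. This is only a sketch under stated assumptions: the curve values, the `aep_of_stage` and `sample_aeps` names, and the idea of sampling functions by shifting the whole curve with a normal error are all hypothetical stand-ins, not how the compute samples graphical functions.

```python
import random

# Hypothetical curve (illustrative values only): non-exceedance probability vs stage,
# flat between p = 0.98 and p = 0.99, i.e. between AEP 0.02 and AEP 0.01.
PROBS = [0.90, 0.98, 0.99, 0.999]
STAGES = [940.0, 944.0, 944.0, 950.0]


def aep_of_stage(probs, stages, target):
    """AEP of target by linear interpolation, always scanning from the frequent end."""
    if target <= stages[0]:
        return 1.0 - probs[0]  # clamp below the curve
    for i in range(len(probs) - 1):
        y0, y1 = stages[i], stages[i + 1]
        if y0 < target <= y1:  # strict lower bound, so flat segments never match
            p = probs[i] + (target - y0) * (probs[i + 1] - probs[i]) / (y1 - y0)
            return 1.0 - p
    return 1.0 - probs[-1]  # clamp above the curve


def sample_aeps(n, target=944.0, sd=1.0, seed=0):
    """Assumed sampling scheme: shift every stage by one normal deviate per iteration."""
    rng = random.Random(seed)
    aeps = []
    for _ in range(n):
        shift = rng.gauss(0.0, sd)
        shifted = [s + shift for s in STAGES]
        aeps.append(aep_of_stage(PROBS, shifted, target))
    return aeps


aeps = sample_aeps(10_000)
# Small margins keep floating-point noise at the edges out of the count.
in_gap = [a for a in aeps if 0.0101 < a < 0.0199]
print(len(in_gap))  # no sampled AEP lands strictly inside the flat range
```

In this toy setup, a sampled curve whose flat stage is below the target puts the AEP at 0.01 or smaller, and one whose flat stage is above the target puts it at 0.02 or larger. The interior of the flat range is never hit, no matter how many iterations are run.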
Even with lots and lots of iterations, the distribution of AEPs remains discontinuous. We need to investigate whether this is a bug or an artifact of the data.