Open voj opened 6 months ago
Related to #292
Investigation into the -1 hack in MisfitStats (i.e. why do we have that hack and where do those numbers come from?). The hypothesis is that we want to filter out target MFD bins that we have set to 1e-20 when looking at misfits.
Using sample values from a run reproducing http://simple-toshi-ui.s3-website-ap-southeast-2.amazonaws.com/InversionSolution/SW52ZXJzaW9uU29sdXRpb246MTEzMDMy/InversionSolutionDetailTab
The Misfit array and D vector are organised like this:
UncertSlipRate: [0..2325), 2325 rows
PaleoRate: [2325..2390), 65 rows
LaplaceSmooth: [2390..2550), 160 rows
RateMinimize: [2550..10265), 7715 rows
UncertMFDEquality: [10265..10299), 34 rows
(read that as "cells" rather than "rows")
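To make it easier to tell which constraint a given misfit index belongs to, the ranges above can be put into a small lookup. This is only a sketch: MisfitLayout and constraintFor are hypothetical names and not part of opensha, and the ranges are hard-coded from this one sample run.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of opensha: maps a misfit index to its
// constraint using the half-open ranges listed for the sample run.
class MisfitLayout {
    // Constraint name -> {startInclusive, endExclusive}.
    static final Map<String, int[]> RANGES = new LinkedHashMap<>();
    static {
        RANGES.put("UncertSlipRate", new int[] {0, 2325});
        RANGES.put("PaleoRate", new int[] {2325, 2390});
        RANGES.put("LaplaceSmooth", new int[] {2390, 2550});
        RANGES.put("RateMinimize", new int[] {2550, 10265});
        RANGES.put("UncertMFDEquality", new int[] {10265, 10299});
    }

    // Returns the name of the constraint whose range contains the index.
    static String constraintFor(int index) {
        for (Map.Entry<String, int[]> entry : RANGES.entrySet()) {
            int[] range = entry.getValue();
            if (index >= range[0] && index < range[1]) {
                return entry.getKey();
            }
        }
        throw new IndexOutOfBoundsException("No constraint covers index " + index);
    }

    public static void main(String[] args) {
        System.out.println(constraintFor(2324));  // last UncertSlipRate cell
        System.out.println(constraintFor(10265)); // first UncertMFDEquality cell
    }
}
```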
In MFDInversionConstraint.encode(), the UncertMFDEquality portion of D is calculated as 1 where bin probability is 1e-20.
Interestingly, it would be 0 if bin probability was 0. That feels a bit inconsistent.
In SerialSimulatedAnnealing.calculateMisfit(), initial misfits are calculated as A * initialSolution - D. initialSolution is 0 in our case, so the initial misfits are -1 where D is 1.
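A tiny worked example of that arithmetic, as a sketch only: it uses a dense 2x2 matrix with made-up values rather than the matrix types opensha actually uses, and InitialMisfitSketch is a hypothetical name. With an all-zero solution, the product term vanishes and each misfit is just -D.

```java
import java.util.Arrays;

// Hypothetical illustration of misfit = A * x - D; not the opensha implementation.
class InitialMisfitSketch {
    // Computes A * x - D row by row for a dense matrix.
    static double[] misfit(double[][] a, double[] x, double[] d) {
        double[] out = new double[d.length];
        for (int row = 0; row < d.length; row++) {
            double sum = 0.0;
            for (int col = 0; col < x.length; col++) {
                sum += a[row][col] * x[col];
            }
            out[row] = sum - d[row];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{0.5, 2.0}, {1.5, 0.25}}; // made-up coefficients
        double[] initialSolution = {0.0, 0.0};    // all zero, as in our run
        double[] d = {1.0, 0.0};                  // D is 1 for a 1e-20 bin
        System.out.println(Arrays.toString(misfit(a, initialSolution, d)));
        // prints [-1.0, 0.0]
    }
}
```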
This means misfit values for MFDInversionConstraint are indeed -1 where we set bins to 1e-20.
So we do seem to have done this to filter out superfluous MFD bins. However, the hack will filter out any misfit values close to -1, irrespective of which constraint they relate to.
Further investigation shows that misfits are modified each round, not preserving the special -1 values.
Each row of this table was logged on one call to ReweightEvenFitSimulatedAnnealing.calcUncertWtStats(), using the same configuration as in the previous comment. These are some of the misfit values for the MFDInversionConstraint that start out as -1 in the first round.
-1 -1 -1 -1 -1 -1
-2.8045747350497 -2.8045747350497 -2.8045747350497 -2.8045747350497 -2.8045747350497 -2.8045747350497
-4.5458775621967 -4.5458775621967 -4.5458775621967 -4.5458775621967 -4.5458775621967 -4.5458775621967
-5.13167277383692 -5.13167277383692 -5.13167277383692 -5.13167277383692 -5.13167277383692 -5.13167277383692
-5.30153745217704 -5.30153745217704 -5.30153745217704 -5.30153745217704 -5.30153745217704 -5.30153745217704
-5.33228202310738 -5.33228202310738 -5.33228202310738 -5.33228202310738 -5.33228202310738 -5.33228202310738
-5.34650368935973 -5.34650368935973 -5.34650368935973 -5.34650368935973 -5.34650368935973 -5.34650368935973
-5.33992555204324 -5.33992555204324 -5.33992555204324 -5.33992555204324 -5.33992555204324 -5.33992555204324
-5.34080471274944 -5.34080471274944 -5.34080471274944 -5.34080471274944 -5.34080471274944 -5.34080471274944
-5.32953299345063 -5.32953299345063 -5.32953299345063 -5.32953299345063 -5.32953299345063 -5.32953299345063
-5.31678855601529 -5.31678855601529 -5.31678855601529 -5.31678855601529 -5.31678855601529 -5.31678855601529
-5.31226099650842 -5.31226099650842 -5.31226099650842 -5.31226099650842 -5.31226099650842 -5.31226099650842
Coming back to this, I logged the data in the wrong location. When logging from the MisfitStats constructor, where our special handling for -1 is located, these values are preserved. In my test run, the -1 values for Uncertain MFD Equality were unchanged for the first few iterations, and then they started to fluctuate slightly, but well within the tolerance of our code:
-0.9999999999999998
-0.9999999999999997
-0.9999999999999994
-0.9999999999999997
It's notable that in my run there was also one UncertSlipRate constraint value that was -1 and triggered our special handling when it probably should not have. This could be mitigated by ensuring that our code only runs for the Uncertain MFD Equality constraint.
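A rough sketch of what that mitigation could look like, next to the constraint-agnostic check. Everything here is an assumption for illustration: the class and method names are hypothetical, the tolerance of 1e-10 is a placeholder rather than the value MisfitStats uses, and the UncertMFDEquality range is hard-coded from the sample run instead of being taken from the constraint ranges at runtime.

```java
// Hypothetical sketch, not the actual MisfitStats code.
class ScopedMinusOneFilter {
    static final double TOLERANCE = 1e-10; // placeholder tolerance (assumption)
    static final int MFD_START = 10265;    // UncertMFDEquality start, inclusive
    static final int MFD_END = 10299;      // UncertMFDEquality end, exclusive

    // True when a misfit is within tolerance of the special value -1.
    static boolean isPlaceholder(double misfit) {
        return Math.abs(misfit + 1.0) <= TOLERANCE;
    }

    // Constraint-agnostic check: skips any misfit close to -1.
    static boolean skipAnywhere(int index, double misfit) {
        return isPlaceholder(misfit);
    }

    // Scoped check: skips only placeholders inside the UncertMFDEquality range.
    static boolean skipScoped(int index, double misfit) {
        return index >= MFD_START && index < MFD_END && isPlaceholder(misfit);
    }

    public static void main(String[] args) {
        // A slip-rate cell that happens to be -1 is skipped by the first check only.
        System.out.println(skipAnywhere(100, -1.0)); // true
        System.out.println(skipScoped(100, -1.0));   // false
        // A drifting MFD placeholder is still skipped by the scoped check.
        System.out.println(skipScoped(10270, -0.9999999999999994)); // true
    }
}
```

In real code the range would presumably come from the constraint ranges of the run rather than from constants.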
Port as much code from opensha as we need to be able to remove that branch