GNS-Science / nzshm-opensha

Renamed successor to the old nshm-nz-opensha; all Python history has been moved to the new nzshm-runzi repo
MIT License

Port everything required to get rid of fix/rup-normalization-2024 #294

Open voj opened 6 months ago

voj commented 6 months ago

Port as much code from opensha as we need to be able to remove that branch

voj commented 6 months ago

Related to #292

voj commented 6 months ago

Investigation into the -1 hack in MisfitStats (i.e. why do we have that hack and where do those numbers come from?). The hypothesis is that we want to filter out target MFD bins we have set to 1e-20 when looking at misfits.

The sample values below come from a run reproducing http://simple-toshi-ui.s3-website-ap-southeast-2.amazonaws.com/InversionSolution/SW52ZXJzaW9uU29sdXRpb246MTEzMDMy/InversionSolutionDetailTab

The misfit array and the D vector are organised like this (index ranges are half-open):

UncertSlipRate: [0..2325), 2325 cells
PaleoRate: [2325..2390), 65 cells
LaplaceSmooth: [2390..2550), 160 cells
RateMinimize: [2550..10265), 7715 cells
UncertMFDEquality: [10265..10299), 34 cells
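To make the layout concrete, here is a minimal sketch of mapping an index in the misfit array back to its constraint, using only the ranges listed above. The class and method names (`ConstraintRangeSketch`, `constraintFor`) are made up for illustration and are not part of opensha or nzshm-opensha.

```java
import java.util.TreeMap;

// Hypothetical helper, not part of opensha: maps a misfit/D index to the
// constraint that owns it, based on the ranges from the run described above.
final class ConstraintRangeSketch {

    // Start index of each constraint's block; the end is the next start.
    static final TreeMap<Integer, String> STARTS = new TreeMap<>();
    static final int TOTAL_CELLS = 10299;

    static {
        STARTS.put(0, "UncertSlipRate");
        STARTS.put(2325, "PaleoRate");
        STARTS.put(2390, "LaplaceSmooth");
        STARTS.put(2550, "RateMinimize");
        STARTS.put(10265, "UncertMFDEquality");
    }

    /** Returns the name of the constraint whose half-open range contains index. */
    static String constraintFor(int index) {
        if (index < 0 || index >= TOTAL_CELLS) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // floorEntry finds the greatest start index <= index.
        return STARTS.floorEntry(index).getValue();
    }

    public static void main(String[] args) {
        System.out.println(constraintFor(2324));
        System.out.println(constraintFor(10265));
    }
}
```

A lookup like this would let a log line state which constraint a suspicious misfit value belongs to.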

In MFDInversionConstraint.encode(), the UncertMFDEquality portion of D is calculated as 1 where the bin probability is 1e-20. Interestingly, it would be 0 if the bin probability were 0. That feels a bit inconsistent.

In SerialSimulatedAnnealing.calculateMisfit(), initial misfits are calculated as

A * initialSolution - D

initialSolution is 0 in our case, so the initial misfits are -1 where D is 1.

This means misfit values for MFDInversionConstraint are indeed -1 where we set bins to 1e-20.

So we do seem to have done this to filter out superfluous MFD bins. However, the hack will filter out any misfit values close to -1, irrespective of which constraint they relate to.
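The arithmetic can be illustrated with a small standalone sketch. It does not use the opensha classes; the dense matrix, the `misfit` helper, and the sample numbers are all assumptions chosen only to show that a zero initial solution turns every D value of 1 into a misfit of -1, whichever row it sits in.

```java
import java.util.Arrays;

// Standalone illustration of misfit = A * x - D; not the opensha implementation.
final class InitialMisfitSketch {

    /** Computes A * x - d for a dense matrix a (one row per entry of d). */
    static double[] misfit(double[][] a, double[] x, double[] d) {
        double[] out = new double[d.length];
        for (int row = 0; row < d.length; row++) {
            double sum = 0.0;
            for (int col = 0; col < x.length; col++) {
                sum += a[row][col] * x[col];
            }
            out[row] = sum - d[row];
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up values: the last two rows have D == 1, the first does not.
        double[][] a = { {0.5, 0.0}, {0.0, 2.0}, {1.0, 1.0} };
        double[] d = { 0.3, 1.0, 1.0 };
        double[] initialSolution = new double[2]; // all zeros

        // With x == 0 the product A * x vanishes, so misfit == -D.
        System.out.println(Arrays.toString(misfit(a, initialSolution, d)));
    }
}
```

Any row with D equal to 1 produces -1 here, which is why a filter keyed only on the value cannot tell an MFD placeholder bin from another constraint's row.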

voj commented 6 months ago

Further investigation shows that misfits are modified each round, not preserving the special -1 values.

The table below gets a new row every time ReweightEvenFitSimulatedAnnealing.calcUncertWtStats() is called, using the same configuration as in the previous comment.

These are some of the misfit values for the MFDInversionConstraint that start out as -1 in the first round.

-1  -1  -1  -1  -1  -1
-2.8045747350497    -2.8045747350497    -2.8045747350497    -2.8045747350497    -2.8045747350497    -2.8045747350497
-4.5458775621967    -4.5458775621967    -4.5458775621967    -4.5458775621967    -4.5458775621967    -4.5458775621967
-5.13167277383692   -5.13167277383692   -5.13167277383692   -5.13167277383692   -5.13167277383692   -5.13167277383692
-5.30153745217704   -5.30153745217704   -5.30153745217704   -5.30153745217704   -5.30153745217704   -5.30153745217704
-5.33228202310738   -5.33228202310738   -5.33228202310738   -5.33228202310738   -5.33228202310738   -5.33228202310738
-5.34650368935973   -5.34650368935973   -5.34650368935973   -5.34650368935973   -5.34650368935973   -5.34650368935973
-5.33992555204324   -5.33992555204324   -5.33992555204324   -5.33992555204324   -5.33992555204324   -5.33992555204324
-5.34080471274944   -5.34080471274944   -5.34080471274944   -5.34080471274944   -5.34080471274944   -5.34080471274944
-5.32953299345063   -5.32953299345063   -5.32953299345063   -5.32953299345063   -5.32953299345063   -5.32953299345063
-5.31678855601529   -5.31678855601529   -5.31678855601529   -5.31678855601529   -5.31678855601529   -5.31678855601529
-5.31226099650842   -5.31226099650842   -5.31226099650842   -5.31226099650842   -5.31226099650842   -5.31226099650842

voj commented 2 months ago

Coming back to this: I had logged the data in the wrong location. When logging from the MisfitStats constructor, where our special handling for -1 is located, these values are preserved. In my test run, the -1 values for Uncertain MFD Equality were unchanged for the first few iterations, and then they started to fluctuate slightly, but well within the tolerance of our code:

-0.9999999999999998
-0.9999999999999997
-0.9999999999999994
-0.9999999999999997

It's notable that in my run there was also one UncertSlipRate constraint value that was -1 and triggered our special handling when it probably should not have. This could be mitigated by ensuring that our code only runs for the Uncertain MFD Equality constraint.
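One way the mitigation could look is sketched below: drop values close to -1 only when their index falls inside the Uncertain MFD Equality block. This is a sketch under assumptions, not the existing MisfitStats code. The class name, the `dropPlaceholders` method, and the `TOLERANCE` value are hypothetical, and the caller is assumed to know the block's index range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a range-restricted filter; not the MisfitStats code.
final class RangeRestrictedFilterSketch {

    static final double PLACEHOLDER = -1.0;
    // Assumed tolerance for illustration; the real code's tolerance may differ.
    static final double TOLERANCE = 1e-10;

    /**
     * Returns the misfits with near -1 values removed, but only for indices
     * in the half-open range [start, end). Values elsewhere are always kept.
     */
    static double[] dropPlaceholders(double[] misfits, int start, int end) {
        List<Double> kept = new ArrayList<>();
        for (int i = 0; i < misfits.length; i++) {
            boolean inRange = i >= start && i < end;
            boolean isPlaceholder = Math.abs(misfits[i] - PLACEHOLDER) < TOLERANCE;
            if (!(inRange && isPlaceholder)) {
                kept.add(misfits[i]);
            }
        }
        return kept.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        // Index 0 stands in for a slip-rate misfit that happens to be -1;
        // indices 2..3 stand in for the MFD equality block.
        double[] misfits = { -1.0, 0.25, -0.9999999999999998, 0.1 };
        double[] filtered = dropPlaceholders(misfits, 2, 4);
        System.out.println(filtered.length);
    }
}
```

For the run described earlier in this issue, the range would be [10265, 10299). In general it would have to come from wherever the constraint row ranges are tracked.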