GeoscienceAustralia / eqrm

Automatically exported from code.google.com/p/eqrm

A large event_set scenario reveals different results between trunk and branch memory_usage #4

Closed - GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Switch to the internal GA repository
2. Run case_studies/test_national/large_coverage.py on trunk. Save the output
3. Run case_studies/test_national/large_coverage.py on branches/memory_usage. Save the output
4. Diff the two outputs
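Step 4 can be automated with a small script. A minimal sketch using Python's filecmp; the helper name and directory layout are illustrative, not part of EQRM:

```python
import filecmp
import os

def diff_outputs(trunk_dir, branch_dir):
    """Recursively compare two saved output directories and return the
    relative paths of files whose contents differ."""
    mismatches = []

    def walk(dc, prefix=""):
        for name in dc.diff_files:
            mismatches.append(os.path.join(prefix, name))
        for sub, sub_dc in dc.subdirs.items():
            walk(sub_dc, os.path.join(prefix, sub))

    walk(filecmp.dircmp(trunk_dir, branch_dir))
    return sorted(mismatches)
```

An empty return means the trunk and branch runs produced identical output files.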

What is the expected output? What do you see instead?
No difference expected. Differences shown.

Please use labels and text to provide additional information.
The changes made in branch memory_usage pass all unit tests and scenario tests 
(even ones designed to stress the changes), but do not pass this scenario. 
The difference is that large_coverage.py contains a lot of sites and events.

Original issue reported on code.google.com by b...@girorosso.com on 20 Jan 2012 at 3:06

GoogleCodeExporter commented 9 years ago
Values below are for the first site processed.

log_mean_extend_GM 

Branch
[[[[-4.92514818 -4.11326193 -4.46624651]
   [-6.10884811 -5.34239597 -6.44408879]
   [-5.78429488 -5.02613761 -6.7193829 ]
   ..., 
   [-5.12605744 -4.35414982 -4.31091228]
   [-4.96135567 -4.16365848 -4.8862455 ]
   [-4.85890558 -4.06983996 -4.88175621]]]]

Trunk (same indices)
[[[-4.92514818 -4.11326193 -4.46624651]]]
[[[-6.10884811 -5.34239597 -6.44408879]]]
[[[-5.78429488 -5.02613761 -6.7193829 ]]]
...
[[[-5.12605744 -4.35414982 -4.31091228]]]
[[[-4.96135567 -4.16365848 -4.8862455 ]]]
[[[-4.85890558 -4.06983996 -4.88175621]]]

log_sigma_extend_GM 

Branch
[[[[ 0.58842965  0.61920408  0.6631483 ]
   [ 0.60294337  0.63033431  0.64594591]
   [ 0.585235    0.61188234  0.64132675]
   ..., 
   [ 0.57130642  0.60330374  0.6691298 ]
   [ 0.60517178  0.63470747  0.65732366]
   [ 0.60726219  0.63664044  0.65659794]]]]

Trunk (same indices)
[[[ 0.58842965  0.61920408  0.6631483 ]]]
[[[ 0.60294337  0.63033431  0.64594591]]]
[[[ 0.585235    0.61188234  0.64132675]]]
...
[[[ 0.57130642  0.60330374  0.6691298 ]]]
[[[ 0.60517178  0.63470747  0.65732366]]]
[[[ 0.60726219  0.63664044  0.65659794]]]

bedrock_SA

Calculated from a random sample of the GM set. The values diverge because the 
sampled event-set sizes are different.
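This kind of divergence is the generic behaviour of a shared pseudo-random stream: if a refactor changes how many draws are made per call, or the order in which they are consumed, identical seeds still assign different values to each (site, event) cell. A minimal NumPy illustration, unrelated to EQRM internals:

```python
import numpy as np

# Same seed and the same total number of draws, but consumed in a
# different order (event-major vs site-major).
n_events, n_sites = 3, 4

rng = np.random.RandomState(42)
event_major = rng.standard_normal((n_events, n_sites))

rng = np.random.RandomState(42)
site_major = rng.standard_normal((n_sites, n_events))

# The pool of drawn values is identical...
assert np.allclose(np.sort(event_major, axis=None),
                   np.sort(site_major, axis=None))

# ...but each cell receives a different draw, so a diff of the
# outputs shows changes even though neither run is "wrong".
assert not np.allclose(event_major, site_major.T)
```

This is why a diff of raw sampled outputs cannot distinguish a logic error from a benign change in sampling order or count.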

Branch
[[[[[[ 0.01589734  0.02546734  0.0041234 ]
     [ 0.0022119   0.00707812  0.00099853]
     [ 0.0035925   0.0070149   0.00121061]
     ..., 
     [ 0.00365061  0.01079642  0.00694562]
     [ 0.01496788  0.009817    0.01339064]
     [ 0.01265883  0.0220863   0.0068616 ]]]]]]

Trunk (same indices)
[[[[[ 0.00943607  0.00809061  0.01256763]]]]]
[[[[[ 0.00437729  0.0137312   0.00169505]]]]]
[[[[[ 0.00697001  0.00556027  0.00178896]]]]]
...
[[[[[ 0.00723916  0.00807445  0.00731593]]]]]
[[[[[ 0.00440746  0.00999618  0.00841008]]]]]
[[[[[ 0.00861919  0.01177055  0.00994249]]]]]

Removing the randomness:

ground_motion_distribution.py - setting gm_rvs to a function that returns an 
array of ones:

from numpy import ones

def ones_rvs(size=1):
    # Deterministic stand-in for the random sampler: always return ones
    return ones(size)

gm_rvs = ones_rvs

Results

bedrock_SA

Branch
[[[[[[ 0.01307938  0.03037735  0.02230157]
     [ 0.00406271  0.00898623  0.00303318]
     [ 0.00552175  0.01210356  0.00229263]
     ..., 
     [ 0.01051712  0.02349786  0.02620559]
     [ 0.01282724  0.02933567  0.01456809]
     [ 0.01424081  0.03228348  0.01462302]]]]]]

Trunk (same indices)
[[[[[ 0.01307938  0.03037735  0.02230157]]]]]
[[[[[ 0.00406271  0.00898623  0.00303318]]]]]
[[[[[ 0.00552175  0.01210356  0.00229263]]]]]
...
[[[[[ 0.01051712  0.02349786  0.02620559]]]]]
[[[[[ 0.01282724  0.02933567  0.01456809]]]]]
[[[[[ 0.01424081  0.03228348  0.01462302]]]]]

Overall results
The diff reveals no change.

Original comment by b...@girorosso.com on 20 Jan 2012 at 3:20

GoogleCodeExporter commented 9 years ago
On why the new implementation_tests scenarios pass while this one does not:

TS_haz40
TS_haz41 - atten_variability_method = None (no sampling)

TS_risk64 - atten_variability_method = 4 (+1 sigma, no random sampling)

TS_fat01 - atten_variability_method = 2 (random sampling, but every value is 
set to zero so it has no effect)
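The method codes listed above can be summarised as a sketch. The function and parameter names here are illustrative, not EQRM's API; only the behaviours (None = no sampling, 2 = random sampling, 4 = +1 sigma) come from the scenarios above:

```python
import numpy as np

def apply_variability(log_mean, log_sigma, method,
                      rvs=np.random.standard_normal):
    """Hypothetical dispatch over the atten_variability_method codes."""
    if method is None:   # no sampling: use the log mean directly
        return log_mean
    if method == 2:      # random sampling: mean + sigma * random draw
        return log_mean + log_sigma * rvs(log_mean.shape)
    if method == 4:      # +1 sigma, deterministic
        return log_mean + log_sigma
    raise ValueError("unhandled atten_variability_method: %r" % (method,))
```

With this framing, TS_fat01 passes trivially: when every drawn value is zero, method 2 collapses to the same result as method None, so it never exercises the sampling path that differs between trunk and branch.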

Original comment by b...@girorosso.com on 20 Jan 2012 at 3:50

GoogleCodeExporter commented 9 years ago
As an aside, check_scenarios.py, as of revision 882, uses a replacement for 
norm.rvs that supplies predictable values. This is another reason the issue 
is not picked up by the implementation tests.

Note - test_rvs.py only supplies an array of size 100, so any scenario with 
atten_variability_method = 2 must have an event_set size of 100 or less.
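A predictable replacement without the size-100 cap could tile a fixed base sequence to whatever size is requested. This is a sketch, not the actual test_rvs.py implementation, and it assumes rvs is called with an integer size:

```python
import numpy as np

# Any fixed, predictable base sequence of draws.
BASE = np.linspace(-1.0, 1.0, 100)

def predictable_rvs(size=1):
    """Return `size` predictable values by tiling BASE, so scenarios
    are not limited to event sets of 100 or fewer."""
    reps = -(-size // len(BASE))  # ceiling division
    return np.tile(BASE, reps)[:size]
```

Because the returned prefix is the same for any requested size, results stay comparable across runs with different event_set sizes.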

Original comment by b...@girorosso.com on 20 Jan 2012 at 4:31

GoogleCodeExporter commented 9 years ago
case_studies/test_national/large_coverage.py uses atten_variability_method = 2 
(random sampling).

As shown in comment 1, if the randomness is taken out of the sampling the 
results are identical. The differences arise because the event_set sizes when 
sampling are different. With any other atten_variability_method the results 
match. This issue is therefore due to random sampling, not a logic error.

Adding implementation test large01.py, which uses the same data as 
large_coverage.py but does not use random sampling. This exercises the code 
with a large dataset so that future logic changes cannot cause the results 
from this scenario to deviate unnoticed.

Original comment by b...@girorosso.com on 22 Jan 2012 at 11:25

GoogleCodeExporter commented 9 years ago
Revision r907 (on branch memory_usage) adds the implementation test.

Original comment by b...@girorosso.com on 22 Jan 2012 at 11:32