NREL / tyche

https://nrel.github.io/tyche-docs/
MIT License

Sample count affecting results: resolve with a user warning when sample counts < 100 are entered #177

Status: Open. tjlca opened this issue 10 months ago.

tjlca commented 10 months ago

Using different sample counts when developing tranche results changes the optimization results.

We tested with Sample_count = 2 and Sample_count = 100, and the results were different.
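A minimal sketch of this kind of comparison follows. It is not tyche's API: `simulated_metric` is a hypothetical stand-in for one stochastic draw (uniform on an arbitrarily chosen range), and `summarize` is a hypothetical helper that reports the same four statistics (maximum, minimum, mean, median) used in the sweep posted later in this thread.

```python
import random
import statistics


def simulated_metric(rng: random.Random) -> float:
    # Hypothetical stand-in for one stochastic draw of a technology metric.
    # The uniform range [0.038, 0.052] is an arbitrary assumption, NOT tyche's model.
    return rng.uniform(0.038, 0.052)


def summarize(sample_count: int, seed: int = 0) -> dict:
    """Draw `sample_count` values and return summary statistics.

    Reusing the same seed for every sample count means each larger sample
    starts with the same draws as the smaller ones, so differences between
    rows come from the added draws rather than from a fresh random stream.
    """
    rng = random.Random(seed)
    draws = [simulated_metric(rng) for _ in range(sample_count)]
    return {
        "sample_count": sample_count,
        "maximum": max(draws),
        "minimum": min(draws),
        "mean": statistics.mean(draws),
        "median": statistics.median(draws),
    }


if __name__ == "__main__":
    for n in range(2, 1000, 100):  # 2, 102, 202, ..., 902
        s = summarize(n)
        print(
            f"Sample count {n}: max={s['maximum']:.5f} min={s['minimum']:.5f} "
            f"mean={s['mean']:.5f} median={s['median']:.5f}"
        )
```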

rjhanes commented 10 months ago

A sample count of 2 isn't a relevant use case. If you're seeing substantially different results (completely different research areas being funded) for sample counts of 100, 500, 1000, then we can look into this. But keep in mind this is stochastic optimization: the exact numerical results will differ every time it's run.
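Because results differ from run to run, comparisons between settings are easier to read when the random generator is seeded explicitly. The sketch below is illustrative only: `run_once` is a hypothetical function and its normal stand-in distribution is an assumption; it does not show how tyche seeds its own sampling.

```python
import random
import statistics
from typing import Optional


def run_once(sample_count: int, seed: Optional[int] = None) -> float:
    """Return the mean of `sample_count` draws from a hypothetical stand-in model.

    The normal distribution (mean 0.047, sd 0.002) is an arbitrary assumption.
    With seed=None, Python seeds the generator from an OS randomness source
    (or the system time), so repeated calls give different results.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(0.047, 0.002) for _ in range(sample_count)]
    return statistics.mean(draws)


if __name__ == "__main__":
    a = run_once(500, seed=42)
    b = run_once(500, seed=42)
    print(a == b)  # True: the same seed reproduces the same draws
    print(run_once(500), run_once(500))  # unseeded: values almost surely differ
```

Seeding does not make a small sample more accurate; it only makes a given run repeatable, so a fixed seed is best reserved for debugging and side-by-side comparisons.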

rjhanes commented 10 months ago

I'm actually curious now whether we can figure out what the minimum viable sample size should be. As the sample size increases, results should stabilize, so we may be able to pinpoint a sample size that gives stable results without taking too long to run.

I'm going to assign this issue to me and do a little exploration. Will post results when I have them (target end of December).
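One common rule of thumb, assuming the quantity of interest is a mean, is to run a small pilot, estimate its standard deviation `s`, and solve the normal-approximation confidence interval for n: `z * s / sqrt(n) <= rel_tol * |mean|`. This is a generic statistical technique, not necessarily the approach taken in this issue; `min_sample_size`, `rel_tol`, and `z` are hypothetical names.

```python
import math
import statistics
from typing import Sequence


def min_sample_size(pilot: Sequence[float], rel_tol: float = 0.01, z: float = 1.96) -> int:
    """Smallest n whose approximate confidence half-width on the mean is within rel_tol.

    Normal approximation: z * s / sqrt(n) <= rel_tol * |mean|, so
    n >= (z * s / (rel_tol * |mean|)) ** 2. z = 1.96 gives roughly 95% coverage.
    Requires at least two pilot values and a nonzero pilot mean.
    """
    mean = statistics.mean(pilot)
    if mean == 0:
        raise ValueError("a relative tolerance is undefined for a zero mean")
    s = statistics.stdev(pilot)  # sample standard deviation of the pilot run
    n = (z * s / (rel_tol * abs(mean))) ** 2
    # Never recommend fewer than 2 draws, so a spread can still be estimated.
    return max(2, math.ceil(n))


if __name__ == "__main__":
    pilot = [9.0, 11.0]  # toy pilot values, for illustration only
    # (1.96 * sqrt(2) / (0.01 * 10)) ** 2 is about 768.3, rounded up to 769
    print(min_sample_size(pilot))
```

Tail statistics such as the maximum and minimum keep shifting as n grows, so a size chosen this way only targets stability of the mean.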

tjlca commented 10 months ago
Evaluating Wind Turbine

Sample count  Maximum               Minimum               Mean                  Median
2             0.05000163793496892   0.03930898348836945   0.04673141511154751   0.047295373990512854
102           0.05074675103140825   0.03921125267261145   0.047102480332126134  0.04736886642500573
202           0.0511012432768915    0.03938420211860227   0.0470669057146053    0.04738120646491502
302           0.05130641971005632   0.038911519232402866  0.047062919839617044  0.04738236860035083
402           0.05144503853535604   0.039086028608004995  0.04709264628160621   0.04738846193549302
502           0.05179165990422748   0.038710934528358804  0.04705942933975828   0.047336310874603535
602           0.051396795703752135  0.039011913038007724  0.04706472045440291   0.047340776834623864
702           0.0515436962889054    0.03890275155955834   0.04705099849697472   0.04732222800067597
802           0.051599654984733546  0.03894584304500875   0.047073738619565327  0.047360723405527355
902           0.05153122601926441   0.03884336758901155   0.04706970863901063   0.04738398337558049
2             0.05009954496999655   0.04311308662525879   0.04703201735956385   0.04684127876243059
1002          0.05143032852602267   0.03920714299413558   0.04706686316443779   0.04735554628879526
2002          0.05166260301886622   0.038780538927572286  0.04707645044347494   0.04737317100927385
3002          0.05157089781174794   0.038471216317050784  0.04707885884691293   0.04736952898802004
4002          0.05169310054428345   0.038636731546347766  0.047079561202335904  0.04736988502500272
5002          0.05158003782511042   0.03837598793852433   0.0470726621148148    0.0473722165303716
6002          0.051589366675181256  0.03839646204441669   0.04707352957592817   0.047373704877490944
7002          0.05157564651707769   0.038401844232053846  0.04707340147832284   0.04735733351731602
8002          0.05169668162757991   0.03862965491007416   0.04706520775064487   0.047358853973410844
9002          0.05158837596216941   0.03839905155584951   0.047068120569336375  0.04736668340164845

rjhanes commented 10 months ago

I'm not seeing anything in the above results to cause concern. Even between sample count = 2 and sample count = 1000+, the difference in the mean is within rounding distance: it stays between about 0.0467 and 0.0471 in every run.
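The comparison can be made concrete with the mean values reported in the Wind Turbine sweep above. This sketch treats the largest run (sample count 9002) as the reference, which is an assumption, and computes each reported mean's relative deviation from it. `REPORTED_MEANS` and `relative_deviations` are hypothetical names; the numbers are copied from the sweep.

```python
# Mean values copied from the "Evaluating Wind Turbine" sweep above,
# as (sample_count, mean) pairs; both sample-count-2 runs are kept.
REPORTED_MEANS = [
    (2, 0.04673141511154751),
    (102, 0.047102480332126134),
    (202, 0.0470669057146053),
    (302, 0.047062919839617044),
    (402, 0.04709264628160621),
    (502, 0.04705942933975828),
    (602, 0.04706472045440291),
    (702, 0.04705099849697472),
    (802, 0.047073738619565327),
    (902, 0.04706970863901063),
    (2, 0.04703201735956385),
    (1002, 0.04706686316443779),
    (2002, 0.04707645044347494),
    (3002, 0.04707885884691293),
    (4002, 0.047079561202335904),
    (5002, 0.0470726621148148),
    (6002, 0.04707352957592817),
    (7002, 0.04707340147832284),
    (8002, 0.04706520775064487),
    (9002, 0.047068120569336375),
]


def relative_deviations(pairs, reference):
    """Return (sample_count, |mean - reference| / |reference|) for each pair."""
    return [(n, abs(m - reference) / abs(reference)) for n, m in pairs]


if __name__ == "__main__":
    # Assumption: the largest-sample run is the best available estimate.
    ref_n, ref_mean = max(REPORTED_MEANS, key=lambda pair: pair[0])
    devs = relative_deviations(REPORTED_MEANS, ref_mean)
    worst_n, worst = max(devs, key=lambda pair: pair[1])
    print(f"Reference: sample count {ref_n}, mean {ref_mean:.6f}")
    print(f"Largest relative deviation: {worst:.2%} (sample count {worst_n})")
```

With these values the largest deviation belongs to the first sample-count-2 run, at roughly 0.7% of the reference mean; every other run is within 0.1%.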

@tjlca Did you see cause for concern in any other decision contexts? If not, I'll close this issue. We might want to add guidance on choosing sample sizes to the documentation, recommending that at least 100 draws be used per simulation.

ETA: Pressed the wrong button and closed accidentally! Re-opening until we're sure this isn't a problem.

rjhanes commented 9 months ago

To address: add a warning message when sample_count < 100.
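A minimal sketch of the proposed warning, using Python's standard `warnings` module. `check_sample_count` and `MIN_RECOMMENDED_SAMPLES` are hypothetical names; where such a check would live in tyche, and whether non-positive values should raise instead of warn, are assumptions.

```python
import warnings

# Threshold proposed in this issue.
MIN_RECOMMENDED_SAMPLES = 100


def check_sample_count(sample_count: int) -> int:
    """Validate a user-supplied sample count, warning if it is below the threshold.

    Hypothetical helper; tyche's actual entry point for sample_count may differ.
    """
    if not isinstance(sample_count, int) or sample_count < 1:
        raise ValueError(f"sample_count must be a positive integer, got {sample_count!r}")
    if sample_count < MIN_RECOMMENDED_SAMPLES:
        warnings.warn(
            f"sample_count={sample_count} is below {MIN_RECOMMENDED_SAMPLES}; "
            "results may be unstable. Consider using at least "
            f"{MIN_RECOMMENDED_SAMPLES} samples.",
            UserWarning,
            stacklevel=2,  # report the warning at the caller's line
        )
    return sample_count
```

Using a `UserWarning` rather than an exception keeps small sample counts available for quick tests while still flagging them; users can silence or escalate it with the standard `warnings` filters.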