ggtracker / ggtrackerstack

Project to run the whole ggtracker stack in vagrant

Economic stats and benchmarks are wrong #44

Open dsjoerg opened 8 years ago

dsjoerg commented 8 years ago

It used to be that 640 mineral income was a fully-saturated base. Now it's 896.

Every place in the system that uses 640 or a multiple of 640 should be updated. For example: http://ggtracker.com/economy_stats#?race=protoss&vs_race=zerg

And the parallel changes should be made for gas.

And all the same goes for TheStaircase-related code.

nickelsen commented 7 years ago

I've been hovering around this for a while, but it feels close to the core and like a big task.

However, when I searched for references to 640 in the code, I only found these:

ggpyjobs: https://github.com/dsjoerg/ggpyjobs/search?utf8=%E2%9C%93&q=640
ggtracker: https://github.com/dsjoerg/ggtracker/search?utf8=%E2%9C%93&q=640

@dsjoerg What do you think - would those be the main entrypoints for the fix or is there something closer to the core that I'm missing?

I changed 640 to 896 in https://github.com/dsjoerg/ggpyjobs/blob/master/sc2parse/sc2reader_to_esdb.py#L1266 and ran the tests, but nothing failed.

dsjoerg commented 7 years ago

I think those are the right entrypoints for the fix. However I'm not sure that the 896 is actually correct.

Once we have the correct number, changing it in the code is trivial, but the old stats should then either be thrown out or recomputed on a bunch of old games.

I suppose the lazy good-enough thing would be to update the number in the code to whatever the new number should be, wait a week, and then recompute the economic stats using only the new data. During the Week of Badness I can let people know that the numbers are being adjusted.

So, the remaining question is, what should the number be? The best way is to play a test game vs AI, saturate a single base and then watch the replay to see what the income number is. Hopefully GGTracker reports the same income number and then we can use that.

However, in the past the numbers have been different on different maps, because some bases have slight variations in mineral patch positioning. So better still would be to run this experiment on each map in the current pool.

Another way to go is to look through a bunch of existing replays where a player stayed on one base and observe the max mineral income rate.

Can I leave to you @nickelsen the task of determining the right max mineral income rate to use?

nickelsen commented 7 years ago

Yes, I'll give it a go. Thanks for the insights and suggestions - very useful.

I also found this awesome post about the theoretical saturation rates: http://www.teamliquid.net/forum/starcraft-2/501306-comprehensive-lotv-production-spreadsheet which I'll compare the empirical results to.

I'll try to do it in small steps. I think this back-shifting also has to be changed, which might have a further-reaching impact. https://github.com/dsjoerg/ggpyjobs/blob/f21829954f16e5e838957a4b30c6a2e611c429b1/sc2parse/sc2reader_to_esdb.py#L220

dsjoerg commented 7 years ago

Ah, good old TL. Speaking of TL, maybe it's best to ask the community who are actively trying to use this economic benchmarking system. I posted the question on TL here: http://www.teamliquid.net/forum/sc2-strategy/374400-thestaircase-an-alternative-improvement-method?page=76#1511

I'll check in tomorrow and see if they have any concrete suggestions.

Regarding the code that you identified (https://github.com/dsjoerg/ggpyjobs/blob/f21829954f16e5e838957a4b30c6a2e611c429b1/sc2parse/sc2reader_to_esdb.py#L220), it might be easier to leave that code untouched and instead adjust the number that we use for "one base mineral income" so that it is in the same scale/units.

dsjoerg commented 7 years ago

@nickelsen JaKaTaK, maker of TheStaircase (the system that uses the economic benchmarks heavily) would love to speak with you, his email is jakataksc2 at gmail.com.

nickelsen commented 7 years ago

Awesome - I'll reach out to him. Thanks a lot for the connection!

nickelsen commented 7 years ago

I recorded a bunch of replays with 16 workers on one-base minerals (2 on each patch) and 6 gas workers (3 on each) on each map in the current 1v1 ladder pool.

Looking at the rates in-game, the benchmarks appear to be 870 minerals/minute (640 * 1.36) and 310 gas/minute (228 * 1.36). As far as I can infer, the HotS benchmarks were lower bounds, since the actual rates vary quite a bit (from 617 to 740 minerals per minute). I guess lower bounds make sense for the benchmarks, since we need to find the first time the income rates reflect worker saturation, and that should not be delayed by variability in the worker movement patterns.
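To make the arithmetic above easy to re-check, here is a minimal sketch. The constant and function names are made up for illustration and do not exist in ggpyjobs; 228 is assumed to be the old gas benchmark by analogy with 640 for minerals.

```python
# Illustrative only: the numbers come from this thread, the names are hypothetical.
HOTS_MINERAL_BENCHMARK = 640  # minerals/minute, one saturated base before the change
HOTS_GAS_BENCHMARK = 228      # gas/minute (assumed to be the old gas benchmark)
LOTV_SCALE = 1.36             # approximate ratio seen in the recorded replays


def scale_benchmark(old_rate, scale=LOTV_SCALE):
    """Scale an old income benchmark and round to the nearest whole unit."""
    return int(round(old_rate * scale))


lotv_mineral = scale_benchmark(HOTS_MINERAL_BENCHMARK)
lotv_gas = scale_benchmark(HOTS_GAS_BENCHMARK)
print(lotv_mineral, lotv_gas)  # 870 310
```

The 1.36 factor is only the ratio observed in these test replays, so it should be treated as an estimate rather than a fixed game constant.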

However, I keep wondering - if we don't touch the back-shifting code (https://github.com/dsjoerg/ggpyjobs/blob/f21829954f16e5e838957a4b30c6a2e611c429b1/sc2parse/sc2reader_to_esdb.py#L220), do we need to re-scale the numbers used to find the saturation benchmarks (i.e. the numbers here: https://github.com/dsjoerg/ggpyjobs/blob/f21829954f16e5e838957a4b30c6a2e611c429b1/sc2parse/sc2reader_to_esdb.py#L1266)? The nightly calculated benchmarks are based on the avg saturation times which are based on the time the income rate hits the one-base income rate goal - and since the income rates are already scaled (to HotS time by the back-shifting), the one-base income goals should stay in HotS time, right?
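A toy sketch of the units question, in case it helps the discussion. It assumes, purely for illustration, that the back-shifting behaves like a uniform 1/1.36 factor on the income rates, which may not match what the linked code actually does. The sample values and all names are hypothetical and not taken from ggpyjobs.

```python
# Toy sketch: sample values, names and the uniform-factor assumption are hypothetical.
HOTS_GOAL = 640.0                # one-base mineral goal in the old (HotS) scale
SCALE = 1.36                     # approximate ratio observed in the test replays
LOTV_GOAL = HOTS_GOAL * SCALE    # the same goal expressed in the new scale


def first_saturation_time(samples, goal):
    """Return the first timestamp whose income rate reaches the goal, else None.

    samples is an iterable of (seconds, income_per_minute) ordered by time.
    """
    for seconds, rate in samples:
        if rate >= goal:
            return seconds
    return None


# Made-up income samples in the new scale, then "back-shifted" to the old scale.
raw_lotv = [(60, 300.0), (120, 600.0), (180, 850.0), (240, 900.0), (300, 950.0)]
back_shifted = [(t, r / SCALE) for t, r in raw_lotv]

same_units_hots = first_saturation_time(back_shifted, HOTS_GOAL)  # old rates, old goal
same_units_lotv = first_saturation_time(raw_lotv, LOTV_GOAL)      # new rates, new goal
mixed_units = first_saturation_time(back_shifted, LOTV_GOAL)      # old rates, new goal
print(same_units_hots, same_units_lotv, mixed_units)  # 240 240 None
```

Under that assumption, keeping rates and goal in the same scale gives the same saturation time either way, while comparing back-shifted rates against a re-scaled goal never reaches saturation in this example.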

I think we need to change the numbers on the pages on economic benchmarks and the income graphs (data series), but not the goals.

Let me know what you think.

dsjoerg commented 7 years ago

With the replays you recorded with 16 workers on one base, the mineral income rate was around 870 minerals/minute? I wonder whether we should count "saturation" as 16 workers or 24. That would be better answered by JaKaTaK. If he says 16, then I agree with your final conclusion in your comment above.

nickelsen commented 7 years ago

Yes, mineral income rate varied from 867 to 965 when 16 workers mined on one base.

JaKaTaK already answered your question on TL, and his answer is what I used for the benchmarking, so everything is in line. :-) http://www.teamliquid.net/forum/sc2-strategy/374400-thestaircase-an-alternative-improvement-method?page=76#1513

I think I have the changes ready for the description texts in the front end - I'll make a pull request for those.