Open paulxiep opened 1 year ago
Long version of what was already done.
Step 1. I extracted Vic3 state data from the game files. For each state I have its pops (pop is the target variable), arable resources, capped resources, state region, buildings, terrains, and number of coastal provinces. These are the features I've tried feeding into a linear regression model so far. In the end, feeding in only arable resources, fish and logging, and region gives the best (most predictive) result so far. (I could still try other features, such as state traits, impassability, latitude, actual area of the state, etc. I could also try more of what is called 'feature engineering'. I'll try them when I have time.)
This CSV is an example of the model's input: sample_df.csv. The equation I'll give you needs input in a similar form. The possible arable resource and region columns need to be multiplied by the state's arable land.
Step 2: I did what is called 'leave-one-out cross-validation'. Basically I withhold 1 state, train the model on the rest, and test it by predicting the state that was left out. This is repeated for every state. Each time I record whether the prediction is 'correct', 'too high', or 'too low'. By 'correct' I mean correct within an acceptable threshold. For example, if the model predicts a pop of 100k, a threshold of 20% counts the prediction as correct if the real Vic3 pop is between 80k and 120k.
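The procedure in Step 2 can be sketched roughly as follows. This is a minimal pure-Python illustration, not the code actually used: it fits a single-feature least-squares line instead of the multi-feature model, and the state records, the `arable_land` field, and all function names are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def classify(predicted, actual, threshold):
    """Label a prediction relative to a band around the predicted value."""
    lower = predicted * (1 - threshold)
    upper = predicted * (1 + threshold)
    if actual < lower:
        return "too high"   # prediction overshoots the real pop
    if actual > upper:
        return "too low"    # prediction undershoots the real pop
    return "correct"


def leave_one_out(states, threshold):
    """Hold out each state in turn, fit on the rest, classify the held-out prediction."""
    results = {}
    for held_out in states:
        train = [s for s in states if s is not held_out]
        slope, intercept = fit_line(
            [s["arable_land"] for s in train],
            [s["pop"] for s in train],
        )
        predicted = slope * held_out["arable_land"] + intercept
        results[held_out["name"]] = classify(predicted, held_out["pop"], threshold)
    return results


# Made-up states, only to show the call shape (not Vic3 data).
example_states = [
    {"name": "A", "arable_land": 10, "pop": 100_000},
    {"name": "B", "arable_land": 20, "pop": 210_000},
    {"name": "C", "arable_land": 30, "pop": 290_000},
    {"name": "D", "arable_land": 40, "pop": 900_000},
]
example_results = leave_one_out(example_states, threshold=0.2)
```

Note that the band is taken around the predicted value (80k to 120k for a 100k prediction at 20%), matching the description above.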
For each threshold value, I overlay the prediction results on provinces.png in \Victoria 3\game\map_data. A state is colored grey for a correct prediction, white when the actual Vic3 pop is higher than the prediction, and black when the actual Vic3 pop is lower.
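As a rough sketch of the coloring step, assuming the map has already been loaded into a grid of RGB tuples and that a lookup from province color to state name is available. The grid, both lookups, the exact grey value, and the function name are hypothetical; real code would read provinces.png with an imaging library.

```python
GREY = (128, 128, 128)   # assumed shade; any mid-grey would do
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

# "too low" means the prediction is too low, i.e. the actual pop is higher (white).
RESULT_COLORS = {"correct": GREY, "too low": WHITE, "too high": BLACK}


def recolor(pixels, province_to_state, state_results):
    """Return a new grid where each pixel takes the color of its state's result.

    Pixels whose province or state is unknown keep their original color.
    """
    recolored = []
    for row in pixels:
        new_row = []
        for color in row:
            state = province_to_state.get(color)
            label = state_results.get(state)
            new_row.append(RESULT_COLORS.get(label, color))
        recolored.append(new_row)
    return recolored


# Tiny made-up 2x2 "map" with three province colors.
pixels = [
    [(10, 0, 0), (20, 0, 0)],
    [(30, 0, 0), (99, 99, 99)],
]
province_to_state = {(10, 0, 0): "S1", (20, 0, 0): "S2", (30, 0, 0): "S3"}
state_results = {"S1": "correct", "S2": "too low", "S3": "too high"}
overlay = recolor(pixels, province_to_state, state_results)
```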
Basically these map images give us a clue as to when state population deviates from the baseline equation. Unsurprisingly, empire centers like London, Paris, and Beijing have higher pop than their potential food resources would suggest. I'll be giving you these maps at different thresholds, so you can decide for yourself which factors from the EU4 save or base geography should influence the deviation from the norm. (Obvious deviation factors are development, being the center of an empire, being an early/late colony, and being impassable in EU4. Less obvious ones are lying along the Rhine, the Indus, or the Congo. You can decide which factors you want to include or exclude.)
Example of one such map, at 20% threshold
The baseline equation will be given in the next comment. More result images will also follow.
The coefficients and intercept of the equation (the best one so far; a better one will come if I manage it) are given here in JSON format. (GitHub doesn't seem to support uploading .json files, so I saved it as .txt, but the content is actually JSON.) vic3_baseline_pop_equation.json.txt
So the way to use it is you make your code generate something like this:
baseline_pop = building_subsistence_farms*3378 + building_subsistence_orchards*2123 + ... - 21281
(The last term is called the 'intercept', which is where the line crosses the Y-axis. It is not multiplied by anything.)
Taking the first entry in sample_df.csv above as an example, the equation becomes
libya_baseline_pop = 36*3378 + 36*174 + 36*0 + 8*1082 + 36*12494 - 21281
This evaluates to 565,031, which is correct within the 30% threshold.
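To double-check the arithmetic in the Libya example, here is a small sketch that evaluates the same sum. The (value, coefficient) pairs are copied from the line above; the helper name `baseline_pop` is just for illustration, and the feature names behind most of the coefficients are not spelled out here, so the pairs are left unnamed.

```python
INTERCEPT = -21281  # added once, not multiplied by any feature


def baseline_pop(terms, intercept=INTERCEPT):
    """Sum value * coefficient over all terms, then add the intercept."""
    return sum(value * coefficient for value, coefficient in terms) + intercept


# (feature value, coefficient) pairs from the Libya example.
libya_terms = [(36, 3378), (36, 174), (36, 0), (8, 1082), (36, 12494)]
libya_baseline_pop = baseline_pop(libya_terms)
print(libya_baseline_pop)  # 565031
```

At the 30% threshold, this prediction counts as correct if the real Vic3 pop lies between 70% and 130% of that value.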
Here are the maps of prediction results for the 10%-50% thresholds: world.zip
Btw, Old World predictions are way more reliable than New World ones. As you can see, the New World is mostly white and black.
The code (Python) used to produce these maps and the equation will be uploaded to my GitHub later. (Give me a couple of days to clean it up.)
I did some linear regression on vanilla Vic3 data and found that vanilla Vic3 population is mostly dictated by a state's potential food and logging resources, with a couple of edge cases when the state is an empire capital or is impassable in EU4.
So I have for you an equation for the baseline pop count a state/substate should have. This baseline can be further adjusted (added to or multiplied) by factors such as development from the EU4 save, being a country capital, and more. This should allow you to better split population between substates within the same state, and to better distribute population across states too, while staying physically plausible and without being bound by vanilla Vic3 pop.
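One possible way to use the baseline for that, sketched under my own assumptions: apply a multiplier and/or a flat bonus to each substate's baseline, then split a state's population in proportion to the adjusted values. The function names, the proportional split, the clamping of negative baselines to zero, and the example numbers are all hypothetical, not something the equation itself prescribes.

```python
def adjusted_baseline(baseline, multiplier=1.0, bonus=0.0):
    """Scale and/or shift a baseline, e.g. for development or capital status."""
    return baseline * multiplier + bonus


def split_population(state_pop, substate_baselines):
    """Split state_pop between substates in proportion to their baselines.

    Negative baselines (possible because the intercept is negative) are
    treated as zero; this is an assumption made for the sketch.
    """
    weights = {name: max(value, 0.0) for name, value in substate_baselines.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all substate baselines are zero or negative")
    return {name: state_pop * weight / total for name, weight in weights.items()}


# Made-up substates: "east" holds a capital and gets a 1.5x multiplier.
baselines = {
    "west": adjusted_baseline(100_000),
    "east": adjusted_baseline(200_000, multiplier=1.5),
}
shares = split_population(800_000, baselines)
```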
This is the intro of what this issue is about. I'll add more details and files later.