LABSS / PROTON-OC

Simulation of recruitment to organized crime
MIT License

Figure out the relationships between socio-economic status, school and work #22

Open nicolaspayette opened 6 years ago

nicolaspayette commented 6 years ago

Niccolo from UCSC wrote to us about this on Sept. 12.

He has a proposal that seems to make sense on a general level, but it's not couched in terms that are easily translatable into the ABM. It also doesn't take into account the fact that we already have a job attribution algorithm (roughly based on the Lavezzi paper: https://onlinelibrary.wiley.com/doi/full/10.1111/j.1467-999X.2009.04085.x).

So there is a bit of work to do in adapting this into something that we can use but will still satisfy them. It should be fairly high on my list of priorities, as they've been waiting for a reaction from me for a while, and we may have to request more data from them depending on how we end up structuring this.

Here is what Niccolo wrote:

I thought about what we talked about yesterday and I devised a relatively compact way to model socio-economic status and work networks. Francesco seems to like it, so here it is.

The three necessary variables are:

(e) education: 1 = primary, 2 = secondary, 3 = tertiary+
(ws) work status: 1 = unemployed, 2 = blue collar, 3 = white collar, 4 = management
(w) wealth: 1, 2, 3, 4, each one corresponding to specific real-world personal wealth intervals. (We could also keep euro brackets, as you suggested yesterday.)

Each variable determines the next, in this way:

First of all, agents take specific positions within work networks based on their education level. If they are unemployed, they are obviously not part of any work network.

e=1 => ws = 1, 2;  p(ws=1 | e=1) + p(ws=2 | e=1) = 1
e=2 => ws = 1, 2, 3;  p(ws=1 | e=2) + p(ws=2 | e=2) + p(ws=3 | e=2) = 1
e=3 => ws = 2, 3, 4;  p(ws=2 | e=3) + p(ws=3 | e=3) + p(ws=4 | e=3) = 1
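A minimal Python sketch of this first step, assuming each education level e maps to a table of conditional probabilities p(ws | e). Only the supports (which ws values are possible for each e) follow the scheme above; the numeric probabilities in `WS_GIVEN_E` are hypothetical placeholders, not Niccolo's data. The row check enforces the "sums to 1" constraint written next to each line.

```python
import random

# Hypothetical p(ws | e) tables: e -> {ws: probability}.
# Supports follow the proposal; the numbers are placeholders.
WS_GIVEN_E = {
    1: {1: 0.6, 2: 0.4},
    2: {1: 0.4, 2: 0.4, 3: 0.2},
    3: {2: 0.2, 3: 0.5, 4: 0.3},
}

def check_rows_sum_to_one(table, tol=1e-9):
    """Raise if any conditional distribution does not sum to 1."""
    for key, dist in table.items():
        total = sum(dist.values())
        if abs(total - 1.0) > tol:
            raise ValueError(f"row {key} sums to {total}, not 1")

def sample_work_status(e, rng):
    """Draw ws for an agent with education level e from p(ws | e)."""
    dist = WS_GIVEN_E[e]
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

check_rows_sum_to_one(WS_GIVEN_E)
rng = random.Random(42)
draws = [sample_work_status(3, rng) for _ in range(5)]  # values drawn from {2, 3, 4}
```

Keeping each row as a small dictionary makes the restricted supports explicit: an impossible combination (for example e=1, ws=3) simply has no entry.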

Work status determines agents' wealth:

ws=1 => w=1
ws=2 => w=2
ws=3 => w=3
ws=4 => w=4

w is also influenced by what financial/material assets the family possesses (+1 if assets are larger than a specific level) and by how many people are in the family (-1 if the ratio of family network members to members with a salary is higher than a specific level).
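The wealth rule can be written as a small function. This is a sketch under stated assumptions: `asset_threshold` and `dependency_threshold` are hypothetical parameters standing in for the unspecified "specific level"s, clamping w to 1..4 is an assumption (the proposal does not say what happens at the ends of the scale), and a family with no salaried member is assumed to count as over the dependency threshold.

```python
def wealth_level(ws, family_assets, n_members, n_earners,
                 asset_threshold, dependency_threshold):
    """Wealth w derived from work status ws plus the two family adjustments.

    Base: w = ws (ws=1 -> w=1, ..., ws=4 -> w=4).
    +1 if family_assets > asset_threshold.
    -1 if family members per salaried member > dependency_threshold.
    The result is clamped to 1..4 (assumption, not stated in the proposal).
    """
    w = ws
    if family_assets > asset_threshold:
        w += 1
    # Assumption: a family with no salaried member counts as over the threshold.
    if n_earners == 0 or n_members / n_earners > dependency_threshold:
        w -= 1
    return max(1, min(4, w))
```

For example, `wealth_level(2, family_assets=250_000, n_members=5, n_earners=1, asset_threshold=200_000, dependency_threshold=3)` returns 2: the asset bonus and the dependency penalty cancel out.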

In the next generation, children's wealth is initially equal to that of their wealthier parent. Their wealth determines their education:

w=1 => e = 1, 2;  p(e=1 | w=1) + p(e=2 | w=1) = 1
w=2 => e = 1, 2, 3;  p(e=1 | w=2) + p(e=2 | w=2) + p(e=3 | w=2) = 1
w=3 => e = 2, 3;  p(e=2 | w=3) + p(e=3 | w=3) = 1
w=4 => e = 3
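The generational step (child wealth from the wealthier parent, then education drawn from p(e | w)) can be sketched as follows. The supports in `E_GIVEN_W` follow the four lines above; the probabilities themselves are hypothetical placeholders.

```python
import random

# Hypothetical p(e | w): supports follow the proposal, numbers are placeholders.
E_GIVEN_W = {
    1: {1: 0.7, 2: 0.3},
    2: {1: 0.3, 2: 0.5, 3: 0.2},
    3: {2: 0.5, 3: 0.5},
    4: {3: 1.0},
}

def child_initial_wealth(parent_wealths):
    """A child's wealth starts equal to that of the wealthier parent."""
    return max(parent_wealths)

def sample_education(w, rng):
    """Draw the child's education level e from p(e | w)."""
    dist = E_GIVEN_W[w]
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

rng = random.Random(7)
w_child = child_initial_wealth([2, 4])      # the wealthier parent has w = 4
e_child = sample_education(w_child, rng)    # always 3 when w = 4
```

`max` also covers single-parent families, since it works on a one-element list.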

Children "complete their education" and enter the workforce during their early adulthood (ages 20-30). Their education level and the presence (or lack) of open positions in the work networks determine their work position ws. Once an agent reaches 30, ws cannot change anymore: agents keep the same position (or same-level positions, which amounts to the same thing for us) for the rest of their career. That includes the unemployed: from 30 on they will only work odd jobs or be on welfare, and their wealth will be low as a consequence. From then on (similar to what you suggested yesterday), the individual's wealth cannot change anymore, short of changes in family assets or dependent family members. Once the agent reaches a certain age (say 60), they leave any work network they may have been part of (ws does not need to change, as it no longer has any influence), and one of the adequately educated unemployed agents in the 20-30 age range takes their place.
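A minimal sketch of the retirement-and-replacement rule, assuming agents are simple records with age, education and ws. `MIN_EDUCATION_FOR_WS` is a hypothetical stand-in for "adequately educated", and choosing uniformly at random among eligible candidates is also an assumption, since the proposal does not say how the replacement is chosen.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

RETIREMENT_AGE = 60                  # "say 60" in the proposal
ENTRY_MIN_AGE, ENTRY_MAX_AGE = 20, 30

@dataclass(eq=False)                 # identity comparison: two agents are never "equal"
class Agent:
    age: int
    education: int
    ws: int = 1                      # 1 = unemployed
    in_network: bool = False

# Hypothetical reading of "adequately educated": minimum e for each ws level.
MIN_EDUCATION_FOR_WS = {2: 1, 3: 2, 4: 3}

def fill_vacancy(retiree: Agent, agents: List[Agent], rng: random.Random) -> Optional[Agent]:
    """Take a retiring worker out of its network and hand the slot to an eligible job seeker.

    The retiree keeps its ws; only network membership changes. Returns the new
    hire, or None if nobody qualifies (or the retiree held no position).
    """
    if not retiree.in_network:
        return None
    retiree.in_network = False
    level = retiree.ws
    candidates = [a for a in agents
                  if not a.in_network and a.ws == 1
                  and ENTRY_MIN_AGE <= a.age < ENTRY_MAX_AGE   # ws is frozen from 30 on
                  and a.education >= MIN_EDUCATION_FOR_WS[level]]
    if not candidates:
        return None
    hire = rng.choice(candidates)    # assumption: uniform pick among eligible agents
    hire.ws = level
    hire.in_network = True
    return hire

def yearly_retirements(agents, rng):
    """Apply the replacement rule to every worker who has reached RETIREMENT_AGE."""
    return [fill_vacancy(a, agents, rng) for a in agents
            if a.in_network and a.age >= RETIREMENT_AGE]
```

Because vacancies are only created by retirements, the number of positions at each ws level stays constant over the run in this sketch.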

This way, the simulation is simplified because incomes disappear completely, and education and work position don't change across the agent's lifespan. Wealth needs to be updated first when the agent gets a work position, and then only when other agents in their family networks are born, marry or die. (This implies that a family's assets are largely inherited from generation to generation. That tends to be the rule in Italy, but I'd want to check first. I admit assets are one of the weaker points of this model; for instance, should they change in size over time? I have some data to check on it, though.) The central variable for socio-economic status is wealth w, in accordance with yesterday's discussion: it completely determines education and influences homophily and criminal propensity.

The model also contains a mechanism for social mobility. E.g., a wealth level w=2 can lead to an education level e=1, 2 or 3, so possibly higher or lower than the previous generation's. We could even widen the range of possibilities, e.g. with any w level allowing for any e level, but I'd rather check the stats to see how likely that is before going there.

Through some sort of miracle, I found a large-scale individual-level survey by the Bank of Italy that appears to encompass most of what we need to know for this model. It has a good sample of Sicilians and even of Palermo city residents. Paradoxically, the requirement/education distribution within work networks still seems to be the most elusive part of all this. So I'll work on it and ask for your help if there is something you know better than I do.

nicolaspayette commented 6 years ago

Oh, and here is the follow-up from Niccolo on Sept. 14:

I've started to run the first numbers on the effects of education on work status, and the results go in the expected direction: the higher your level of education, the higher your odds of reaching a high work position, and vice versa. The picture is not as clear-cut as I hoped, though, with something like 40% of college-educated people in Sicily still being (for our purposes) unemployed.

Therefore, as I hypothesized, it would make sense to include a complete matrix of effects in the SES mechanism, so that each e level has a probability for each ws level, and so forth.

I also divided secondary-education agents into middle school and high school-educated, as both are large categories and they seem to create different employment patterns.

That creates a 4x4 matrix to distribute work status based on education, and (unless w categories grow beyond four, and hopefully that won't happen) another 4x4 matrix for the distribution of education based on wealth. That seems a really, really economical solution to me.
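The two 4x4 matrices compose: pushing a wealth distribution through p(e | w) gives the education distribution, and pushing that through p(ws | e) gives the work status distribution, i.e. p(e) = Σ_w p(w) p(e | w). A sketch in plain Python, which could for instance be used to check conditional tables against marginal ones. All matrix entries and the uniform wealth distribution are hypothetical placeholders; only the 0.4 unemployment share for the top education level echoes the figure quoted above.

```python
def marginal(prior, conditional):
    """Push a distribution through a row-stochastic matrix.

    prior[i] is p(x = i); conditional[i][j] is p(y = j | x = i).
    Returns p(y = j) = sum_i p(x = i) * p(y = j | x = i).
    """
    n_out = len(conditional[0])
    return [sum(prior[i] * conditional[i][j] for i in range(len(prior)))
            for j in range(n_out)]

# Hypothetical 4x4 matrices (each row sums to 1). Index 0..3 stands for level 1..4.
edu_given_wealth = [
    [0.4, 0.4, 0.2, 0.0],
    [0.2, 0.4, 0.3, 0.1],
    [0.1, 0.2, 0.4, 0.3],
    [0.0, 0.1, 0.3, 0.6],
]
ws_given_edu = [
    [0.7, 0.3, 0.0, 0.0],
    [0.5, 0.4, 0.1, 0.0],
    [0.4, 0.3, 0.3, 0.0],
    [0.4, 0.1, 0.3, 0.2],   # ~0.4 of tertiary-educated agents unemployed
]

wealth_dist = [0.25, 0.25, 0.25, 0.25]
edu_dist = marginal(wealth_dist, edu_given_wealth)   # approx. [0.175, 0.275, 0.3, 0.25]
ws_dist = marginal(edu_dist, ws_given_edu)
```

Since every row sums to 1, each resulting distribution also sums to 1, which is a cheap sanity check on data loaded from the annex.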

One way to further reduce calculations in the "employment window" in the 20-30 age range would be to give agents an "ideal ws" (iws), drawn from the e distribution matrix when they turn 20. That way they are "primed for life" for a specific ws: they may not be able to find an open position equal to their ideal ws before they turn 30, but they are going to ignore openings with a different ws regardless. For a good many agents iws=1, meaning they are destined for unemployment (poor guys). These agents are not going to be considered at all when there's an opening in the work networks, and their iws immediately becomes their real, lifelong ws. Only agents with iws>1 stay "in the game", and their unemployment only becomes permanent when they turn 30.

I can't write code, but I can describe what I just said in logical steps if that can help:

Agent turns 20. Their ws goes from undetermined to 1 (i.e. criminal propensity increases).
Based on the education-to-work status distribution matrix, they obtain an iws score.
    If iws=1, then ws=iws=1, w=1, and the agent cannot enter work networks anymore.
    If iws>1, w stays equal to the parent's w.
An open agent (ws=1, iws>1) is selected to join a work network only when another agent of the same iws turns 65 and leaves the network. This way there's no need for all open agents to check for open positions at every tick. At that point ws is updated (ws=iws) and w is determined as a consequence.
Agent with ws=1 and iws>1 turns 30:
iws is updated (iws=1), now equaling ws. w is updated (w=1), and the agent does not enter work networks anymore.

Note that w is updated only when agent ws=iws.
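The steps above amount to an event-driven scheme: job seekers wait in a queue per iws level, and only retirements and 30th birthdays touch those queues. A minimal Python sketch, assuming the `iws_given_edu` table is supplied as {e: {iws: probability}} and that w simply equals ws once hired (the family adjustments are left out). Picking uniformly from the queue is an assumption, and the class and method names are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass(eq=False)  # compare by identity so list.remove() drops the right agent
class Person:
    age: int
    education: int
    parent_wealth: int
    ws: int = 0          # 0 = undetermined (younger than 20)
    iws: int = 0         # ideal work status, drawn at 20
    w: int = 0
    in_network: bool = False

class LabourMarket:
    """Job seekers wait in per-iws queues instead of polling for openings every tick."""

    def __init__(self, iws_given_edu, rng):
        self.iws_given_edu = iws_given_edu   # {e: {iws: probability}}
        self.rng = rng
        self.queues = {}                      # iws level -> agents waiting for that level

    def turn_20(self, p):
        p.ws = 1                              # undetermined -> unemployed
        dist = self.iws_given_edu[p.education]
        p.iws = self.rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        if p.iws == 1:
            p.w = 1                           # ws == iws == 1 for life
        else:
            p.w = p.parent_wealth             # keeps the parent's w while waiting
            self.queues.setdefault(p.iws, []).append(p)

    def turn_30(self, p):
        if p.ws == 1 and p.iws > 1:           # still waiting: unemployment becomes permanent
            self.queues[p.iws].remove(p)
            p.iws = 1
            p.w = 1

    def retire(self, worker):
        """Worker turns 65: one waiting agent whose iws equals the worker's ws takes the slot."""
        worker.in_network = False
        waiting = self.queues.get(worker.ws, [])
        if not waiting:
            return None
        hire = waiting.pop(self.rng.randrange(len(waiting)))
        hire.ws = hire.iws
        hire.w = hire.ws                      # w = ws; family adjustments omitted here
        hire.in_network = True
        return hire
```

The design keeps the cost of a tick independent of the number of job seekers: work is done only when an agent turns 20, turns 30 or retires.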

Hopefully that all makes sense to you. Next week I'll provide you with the distribution matrices I mentioned and something more detailed on the financial assets and dependent family members mechanics.

Honestly, I'm not sure if we can use that part. I still need to wrap my head around what he's saying, but at the first reading, it feels like a very different model from what we have. (Or at least from what I have in mind.)

nicolaspayette commented 6 years ago

Note that even if we don't go with Niccolo's proposed scheme, there are a couple of issues we still need to think about:

mariopaolucci commented 5 years ago

Some rotation in the job market makes sense. How do people lose jobs?

mariopaolucci commented 5 years ago

We now have four tables from Niccolò. They are:

wealth,gender,edu_level -> rate
work_status,gender,wealth -> rate
wealth,gender,edu_level,rate -> rate

Plus one on a component of c which we will deal with later.
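Tables in this layout can be kept as flat rows and grouped so that each combination of the leading keys gets its own list of (outcome, rate) pairs, which is the shape a weighted draw needs. A minimal Python sketch; the rows below are made-up placeholders in the wealth,gender,edu_level -> rate layout, and `group_rates` is a hypothetical helper, not part of the model code.

```python
from collections import defaultdict

def group_rates(rows):
    """Group rows (key_1, key_2, outcome, rate) into {(key_1, key_2): [(outcome, rate), ...]}."""
    grouped = defaultdict(list)
    for key_1, key_2, outcome, rate in rows:
        grouped[(key_1, key_2)].append((outcome, rate))
    return dict(grouped)

# Hypothetical rows: (wealth, gender, edu_level, rate); values are placeholders.
rows = [
    (1, "male", 1, 0.5), (1, "male", 2, 0.3), (1, "male", 3, 0.2),
    (1, "female", 1, 0.4), (1, "female", 2, 0.4), (1, "female", 3, 0.2),
]
edu_by_wealth_gender = group_rates(rows)
# edu_by_wealth_gender[(1, "male")] == [(1, 0.5), (2, 0.3), (3, 0.2)]
```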

For the first one, the idea should be that one has the chance of reaching an education level at the (cumulative) rate of his or her own category. This modifies school enrolment as follows:

mariopaolucci commented 5 years ago

Implementation of SES

Agents’ wealth score at birth equals their parents’. Parents’ education and wealth determine the agents’ educational attainment across their life (in accordance with the probability distributions available in the Excel annex). If e=2, the agent exits the education system at age [16-18]; if e=3, the agent exits the education system at 19-20; e= the agent exits the system at age [23-27].

We have loaded the distributions from UCSC, both relative and marginal, with the names:

  edu_by_wealth_lvl
  work_status_by_edu_lvl
  wealth_quintile_by_work_status
  criminal_propensity_by_wealth_quintile
  edu
  work_status
  wealth_quintile
  criminal_propensity

We start from the education level and then we move on to generate the other characteristics of individuals as a consequence:

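  ; draw the maximum education level from the "edu" distribution for the agent's sex
  set max-education-level pick-from-pair-list table:get group-by-first-of-three read-csv "edu" male?
  set education-level max-education-level
  limit-education-by-age ; adjust education-level for the agent's age (defined elsewhere in the model)
  ifelse age > 16 [
    ; job level conditioned on education and sex, then wealth conditioned on job level and sex
    set job-level pick-from-pair-list table:get work_status_by_edu_lvl list education-level male?
    set wealth-level pick-from-pair-list table:get wealth_quintile_by_work_status list job-level male?
  ] [
    set job-level 1
    set wealth-level 1 ; this will be updated by family membership
  ]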
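For readers who don't use NetLogo, the setup order above (education first, then job level, then wealth for agents over 16) can be mirrored in Python. `pick_from_pair_list` here is a hypothetical analogue of the model's own `pick-from-pair-list` reporter, whose definition is not shown in this thread: it assumes a list of (value, weight) pairs and draws by walking the cumulative weights. The `limit-education-by-age` step is left out, and the tables in the usage example are placeholders.

```python
import random
from itertools import accumulate

def pick_from_pair_list(pairs, rng):
    """Weighted draw from [(value, weight), ...] by walking cumulative weights."""
    values = [v for v, _ in pairs]
    cumulative = list(accumulate(w for _, w in pairs))
    x = rng.random() * cumulative[-1]
    for value, bound in zip(values, cumulative):
        if x < bound:
            return value
    return values[-1]   # guards against floating-point round-off

def setup_person(age, male, edu_table, job_table, wealth_table, rng):
    """Education first; job and wealth only for agents older than 16."""
    education = pick_from_pair_list(edu_table[male], rng)
    if age > 16:
        job = pick_from_pair_list(job_table[(education, male)], rng)
        wealth = pick_from_pair_list(wealth_table[(job, male)], rng)
    else:
        job, wealth = 1, 1   # wealth is later overwritten by the household head's
    return education, job, wealth
```

Keying the conditional tables by (level, male) tuples plays the role of `list education-level male?` in the NetLogo `table:get` calls.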

This sets up a population prior to the creation of the households. In the household creation, the wealth of the family head spreads to the other members:

  ; the household head is the first member of hh-members
  let family-wealth-level [ wealth-level ] of item 0 hh-members
  ....
  set hh-members turtle-set hh-members
  ask hh-members [ create-family-links-with other hh-members set wealth-level family-wealth-level ]

When agents exit the education system, they enter the workforce. They are initially unemployed (ws = 1). Depending on their achieved education score, agents will have different probabilities of finding jobs within a given ws category (in accordance with the probability distributions available in the Excel annex). Agents stay in the workforce until age [60-65], with increasing probabilities of retiring. Once retired, agents keep their last w and c_econ_propensity.
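The text fixes the retirement window at [60-65] with increasing probabilities, but not the shape of the increase. This sketch assumes a hypothetical linear yearly probability that reaches 1 at 65, so nobody works past 65; as stated above, w and c_econ_propensity are left untouched on retirement. The dictionary keys are illustrative names, not the model's variables.

```python
import random

RETIRE_MIN_AGE, RETIRE_MAX_AGE = 60, 65

def retirement_probability(age):
    """Yearly probability of retiring: rises linearly from 1/6 at 60 to 1 at 65 (hypothetical shape)."""
    if age < RETIRE_MIN_AGE:
        return 0.0
    if age >= RETIRE_MAX_AGE:
        return 1.0
    return (age - RETIRE_MIN_AGE + 1) / (RETIRE_MAX_AGE - RETIRE_MIN_AGE + 1)

def yearly_update(person, rng):
    """Possibly retire an employed person; retirees keep w and c_econ_propensity unchanged."""
    if person["employed"] and rng.random() < retirement_probability(person["age"]):
        person["employed"] = False
        person["retired"] = True
    return person
```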

And we still have to set up the firms in the new approach.

to setup-employers-jobs
  output "Setting up employers"
  let job-counts reduce sentence csv:from-file (word data-folder "employer_sizes.csv")
  foreach job-counts [ n ->
    create-employers 1 [
      hatch-jobs n [
        create-position-link-with myself
        set education-level-required random (num-education-levels - 1) ; TODO: use a realistic distribution
        set salary max (list 10000 (random-normal 30000 1000))         ; TODO: use a realistic distribution
        set label self
      ]
      set label self
    ]
  ]
end

Here, we have to replace salary by just three levels:

(1 = unemployed/inactive; 2 = blue collar worker; 3 = white collar worker; 4 = manager).

mariopaolucci commented 5 years ago

90919e7

This kinda works now. The jobs are created to respect the general unemployment level:

observer> show count persons with [ job-level != 1 ]
observer: 709
observer> show count jobs
observer: 710

BUT since these are extracted from two different distributions (well, three: the size of companies from @pausa9, the levels inside companies by size, and the distribution of job positions conditioned on education levels), the levels themselves do not match:

observer> show map [ n -> count jobs with [ job-level = n ]][1 2 3 4]
observer: [0 443 266 1]
observer> show map [ n -> count persons with [ job-level = n ]][1 2 3 4]
observer: [2709 536 135 38]
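The mismatch in the two outputs above can be quantified level by level. This Python check is plain arithmetic on the printed counts: the totals agree (710 jobs for 709 employed persons), while the split across levels does not. A negative gap means more persons than positions at that level.

```python
# Counts copied from the observer output above; index 0 is job level 1.
jobs_by_level = [0, 443, 266, 1]
persons_by_level = [2709, 536, 135, 38]

def level_gaps(jobs, persons):
    """Jobs minus persons for each working level (2-4); level 1 means no job, so it is skipped."""
    return {level: jobs[level - 1] - persons[level - 1] for level in (2, 3, 4)}

gaps = level_gaps(jobs_by_level, persons_by_level)
employed = sum(persons_by_level[1:])
print(gaps, sum(jobs_by_level), employed)
# {2: -93, 3: 131, 4: -37} 710 709
```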

What to do?