Open Hussein-Mahfouz opened 2 months ago
In the current SPC dataset, the working-age population in West Yorkshire is 1,496,784, of which 597,873 people have a workplace assigned.
For West Yorkshire, the total employment recorded in the Business Register and Employment Survey (by MSOA) is 1,025,985, which should be the target number for assigned workplaces.
Possible reason for unmatching
pwkstat. 32% of their pwkstat is Employee FT, 16% is Student, and 16% is self-employed.
- 359,578 out of 913,918 people have the salary_hourly and salary_yearly attributes, which means they should have been assigned a workplace.
- sic1d2007; 69% of them are Students, which is reasonable.

I guess the main reason is that, when generating the 'job market', the proportions of sic1d2007 do not match the numbers in the Business Register and Employment Survey. This leaves part of the jobs in each sector unmatched even though the overall number of jobs is similar. I plotted the number of jobs in each sector in the Business Register and Employment Survey against the number in SPC, and the figure appears to support this. I believe this could be the main reason for the unmatched workplaces.
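To make the sector-mismatch argument concrete, here is a minimal sketch of why equal job totals can still leave workers unmatched when the split across sectors differs. The counts are made up for illustration and are not taken from SPC or the Business Register and Employment Survey; the function names are hypothetical.

```python
def sector_gaps(spc_counts, bres_counts):
    """Per-sector difference in counts (SPC workers minus BRES jobs)."""
    sectors = sorted(set(spc_counts) | set(bres_counts))
    return {s: spc_counts.get(s, 0) - bres_counts.get(s, 0) for s in sectors}


def max_matchable(spc_counts, bres_counts):
    """Upper bound on SIC-matched workers: each sector can fill
    at most min(workers in sector, jobs in sector)."""
    return sum(min(spc_counts.get(s, 0), n) for s, n in bres_counts.items())


# Made-up counts keyed by SIC section (illustration only).
spc = {"C": 300, "G": 500, "Q": 200}
bres = {"C": 150, "G": 450, "Q": 400}

gaps = sector_gaps(spc, bres)          # positive = more SPC workers than BRES jobs
matchable = max_matchable(spc, bres)   # workers that can be placed within their own sector

print(gaps)
print(matchable, "of", sum(spc.values()), "workers can be matched by sector")
```

In this toy case both datasets contain 1,000 jobs, yet only 800 workers can be placed in a job of their own sector, because sectors C and G are over-represented on the SPC side.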
Thanks @BZ-BowenZhang for the update on this, it's very helpful to see the distributions of the two datasets.
Notes from today's meeting:
Thanks for adding this @Hussein-Mahfouz.
Adding notes from discussion with @BZ-BowenZhang for options with increasing complexity:
duresmc [Government Office Regions and former Metropolitan Counties] and dgorpaf [Government Office Regions], see TUS). This is more complicated, sits further upstream in the current pipeline, and may have a small sample size, with the other characteristics involved in the matching reducing the variation in time use for a given region.
@Hussein-Mahfouz for reference
Update on 17th July:
The new SPC dataset without SIC code assignment has been tested, and the matching results are slightly improved:
Previous: 597,873 assigned, 898,911 unassigned
Now: 656,296 assigned, 840,488 unassigned
There is still a gap between the current number and the target number from the Business Register and Employment Survey (1,025,985). The mismatches in the SIC code have not been resolved, so further checks of the matching process may be needed.
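For a quick check of the figures quoted in this thread, the arithmetic can be reproduced as below. Only the numbers already stated here are used; the variable and function names are my own.

```python
working_age = 1_496_784   # working-age population in West Yorkshire (SPC)
bres_target = 1_025_985   # total employment in BRES (by MSOA)
assigned = {"previous": 597_873, "now": 656_296}


def summarise(n_assigned):
    """Derive unassigned count and distance to the BRES target."""
    return {
        "unassigned": working_age - n_assigned,
        "gap_to_target": bres_target - n_assigned,
        "share_of_target": n_assigned / bres_target,
    }


summary = {label: summarise(n) for label, n in assigned.items()}
improvement = assigned["now"] - assigned["previous"]

for label, row in summary.items():
    print(label, row)
print("additional people assigned:", improvement)
```

The new run assigns 58,423 more people, which narrows the gap to the target from 428,112 to 369,689.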
In the SPC, not all people with a job are assigned to a workplace. As a result, not all people with a job have a "commute" trip. The SIC codes could be very useful for assigning people to workplaces in our model - is there an issue with the SIC codes, and do they need editing?
Enriched Spenser <> TUS matching logic:
Individuals from Spenser are matched to the TUS based on age35g, sex, and nssec: see the findTUSmatch() function
Attributes, including sic1d2007 and sic2d2007, are then taken from the matched TUS individual
If a person in the enriched Spenser dataset has a combination of age35g, sex, and nssec that does not exist in the Time Use Survey, they are then matched on age35g and sex only.

Things to think about:
age35g column was relaxed. Breaking down the age into 35 groups is very granular. Fewer groups = higher matching

SPC commuting location assignment logic
workplaces are assigned based on SIC codes which are obtained from the time use survey
In the SPC commuting flows logic, workers are assigned to jobs based on SIC. Not all workers are matched, and it seems like there is an acceptable threshold for matching before ignoring the SIC
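Since the SIC codes used here come from the TUS match, it may help to see the tiered matching described above as code. This is only a sketch of my understanding, not the actual findTUSmatch() implementation: the function names are hypothetical, and drawing a random candidate within a tier is an assumption, since the thread does not say how ties are resolved.

```python
import random

# Match keys tried in order: full key first, then the relaxed key.
TIERS = [("age35g", "sex", "nssec"), ("age35g", "sex")]


def find_tus_match_sketch(person, tus_rows, rng=random):
    """Hypothetical stand-in for findTUSmatch(): return (match, keys_used)."""
    for keys in TIERS:
        candidates = [row for row in tus_rows
                      if all(row[k] == person[k] for k in keys)]
        if candidates:
            # Assumption: pick one candidate at random within the tier.
            return rng.choice(candidates), keys
    return None, None


def enrich(person, tus_rows, rng=random):
    """Copy the SIC attributes from the matched TUS individual, if any."""
    match, keys = find_tus_match_sketch(person, tus_rows, rng)
    enriched = dict(person)
    if match is not None:
        enriched["sic1d2007"] = match["sic1d2007"]
        enriched["sic2d2007"] = match["sic2d2007"]
    enriched["matched_on"] = keys
    return enriched


# Toy data, for illustration only.
tus = [{"age35g": 10, "sex": 1, "nssec": 3, "sic1d2007": "G", "sic2d2007": 47}]
full = enrich({"age35g": 10, "sex": 1, "nssec": 3}, tus)
relaxed = enrich({"age35g": 10, "sex": 1, "nssec": 5}, tus)
nobody = enrich({"age35g": 12, "sex": 2, "nssec": 3}, tus)
```

Recording which tier produced the match (matched_on here) would make it easier to see how many people only match after the key is relaxed, and how many end up with no SIC code at all.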
Our approach to workplace assignment (TODO)
We can use SIC codes as done in SPC, but with fallback logic if a SIC code does not exist
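A rough sketch of what this could look like, to help the discussion. Everything here is an assumption: the data layout, the function name, and especially the fallback rule (take any remaining vacancy), which stands in for whatever rule we agree on, for example one based on commuting flows or distance.

```python
from collections import defaultdict, deque


def assign_workplaces(workers, vacancies):
    """workers: dicts with 'id' and an optional 'sic1d2007'.
    vacancies: dicts with 'workplace_id' and 'sic1d2007', one per open job.
    Returns {worker id: (workplace_id, 'sic' or 'fallback')}."""
    open_by_sic = defaultdict(deque)
    for v in vacancies:
        open_by_sic[v["sic1d2007"]].append(v["workplace_id"])

    assignments, unmatched = {}, []

    # Pass 1: place workers in a vacancy with the same SIC code.
    for w in workers:
        queue = open_by_sic.get(w.get("sic1d2007"))
        if queue:
            assignments[w["id"]] = (queue.popleft(), "sic")
        else:
            unmatched.append(w)

    # Pass 2 (assumed fallback): ignore SIC and take any vacancy still open.
    remaining = deque(wp for q in open_by_sic.values() for wp in q)
    for w in unmatched:
        if remaining:
            assignments[w["id"]] = (remaining.popleft(), "fallback")
    return assignments


# Toy data, for illustration only.
vacancies = [
    {"workplace_id": "wp1", "sic1d2007": "G"},
    {"workplace_id": "wp2", "sic1d2007": "Q"},
    {"workplace_id": "wp3", "sic1d2007": "Q"},
]
workers = [
    {"id": 1, "sic1d2007": "G"},
    {"id": 2, "sic1d2007": "G"},  # sector G is already full
    {"id": 3},                    # no SIC code available
]
result = assign_workplaces(workers, vacancies)
```

Running the SIC pass over all workers before the fallback pass keeps fallback assignments from taking vacancies that a later worker could have matched by SIC.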
@sgreenbury could you please take a look and edit if it doesn't make sense?