ActivitySim / activitysim

An Open Platform for Activity-Based Travel Modeling
https://activitysim.github.io
BSD 3-Clause "New" or "Revised" License

Adjusting employment data in workplace location model allowing to factor in WFH, in-commuting, etc. #609

Closed: aletzdy closed 1 year ago

aletzdy commented 2 years ago

We want to build functionality in ActivitySim to factor the input employment data down by a set of scaling factors that account for workers who work from home and workers who in-commute, with the scaling factors varying by zone/district. There are multiple ways this can be achieved:

  1. This approach is faster to implement, but less flexible: we can factor the total size terms (or the total employment in the simulation-based method) by the scaling factors to correct for in-commuting, etc. However, this approach does not have the flexibility of applying the scaling factors only to certain types of employment (in case we do not want, for instance, retail jobs to be changed).

  2. This approach requires minimal code change, but requires a bit more work on the side of the model developer: we may define the scaled employment columns in the land use preprocessor file, then modify the size term calculation UECs to use the scaled employment columns in the workplace location model. All the other models would be unaffected.

  3. This approach requires more code change, but little work on the side of the model developer/user: the user sets the employment columns that should be scaled in the yaml file, and ActivitySim does the calculations itself.
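A minimal sketch of the idea behind the column-scaling approaches, in plain Python. Everything named here is an assumption for illustration: the settings keys (`factor_column`, `columns_to_scale`, `suffix`), the column names, and the function are hypothetical and are not existing ActivitySim options or APIs. It only shows how a per-zone factor could be applied to selected employment columns while leaving others (for instance retail) and the original columns untouched.

```python
# Hypothetical settings; these keys are illustrative, not real ActivitySim options.
SCALING_SETTINGS = {
    "factor_column": "emp_scale_factor",                   # per-zone factor in [0, 1]
    "columns_to_scale": ["emp_office", "emp_industrial"],  # retail is left unscaled
    "suffix": "_scaled",
}


def add_scaled_employment(land_use, settings):
    """Return copies of the zone records with scaled employment columns added."""
    factor_col = settings["factor_column"]
    scaled = []
    for zone in land_use:
        factor = zone[factor_col]
        if not 0.0 <= factor <= 1.0:
            raise ValueError(f"zone {zone['zone_id']}: factor {factor} outside [0, 1]")
        record = dict(zone)  # original columns stay intact for all other models
        for col in settings["columns_to_scale"]:
            record[col + settings["suffix"]] = zone[col] * factor
        scaled.append(record)
    return scaled


# Made-up land use records for two zones.
land_use = [
    {"zone_id": 1, "emp_retail": 100, "emp_office": 400,
     "emp_industrial": 50, "emp_scale_factor": 0.9},
    {"zone_id": 2, "emp_retail": 30, "emp_office": 200,
     "emp_industrial": 300, "emp_scale_factor": 0.5},
]
scaled_land_use = add_scaled_employment(land_use, SCALING_SETTINGS)
```

Under this sketch, only the workplace location size terms would need to point at the `*_scaled` columns; any model that reads the unscaled columns sees the same inputs as before.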

DavidOry commented 2 years ago

Apologies for diving in but not attending the call. I think this is an interesting issue, but do not fully understand from the above description. A few questions:

  1. Employees that work from home exclusively are often not represented in the employment data. By work from home, do you mean episodic telecommuting in which the worker does not commute on the simulation day? Or those that work exclusively from home? Is it assumed that the input employment data represents jobs held by those that work from home exclusively? If so, is it assumed that the work location for these workers is their home location?

  2. What is the context in which this scaling occurs? I assume it is the usual work location model. Is that right?

  3. Can you talk a little bit more about why this is necessary? Provided employment > number of workers, why does it matter if there is excess employment when running a usual work location model (if that is the context)? Is this for cosmetic purposes, i.e., to have workers and jobs match exactly? Or to improve the spatial match?

  4. Where would these scaling factors come from? Presumably the factors for in-commuters would be higher at the boundaries of the region. Would the user be given guidance as to how to derive these? Depending on the definition of work from home and employment, it may be very challenging to develop those scaling factors, with important implications (i.e., applying uniform reductions for work from home across the region could give strange model results when work from home shares are large).

jpn-- commented 2 years ago

@DavidOry no need to apologize for not being on a call, we're trying to get better about collaborating asynchronously and not needing all our collaborators to join meetings constantly.

I do think the WFH concerns here are not that big a deal; there might be spatial variation in WFH but it's probably mostly explainable from modeled factors like income. I think the in-commute is more the target motivation for this, especially for regions that for political reasons have been stuck with a model region that includes significant population or employment just outside the model boundary.

bettinardi commented 2 years ago

I really like David Ory's number 4 point which is linked to point 1.

My thoughts and opinion, which is a model user perspective, not a developer -

  1. While the interactions outside of the model are interesting and important, they are out of the scope of the current tool. So my understanding is that model users want to input the "correct" employment by zone by employment type, but want to be able to have an input factor to adjust that "real" number down to the number of workers by industry within the model boundary.
  2. Similarly, we want to be able to flag someone working from home on a given day and remove them from the total employment size term so that there is an internal 1-1 match when the model is trying to assign workers to jobs.
  3. After typing this, I would suggest that the user might want to just set the number of external workers entering from each external zone by sector and having the code automatically build the reduced size terms. Similarly, the code should likely be responsible for reviewing the syn pop for records that are working from home on the travel day, and internally adjust down the size term "total" employment by sector temporary field.
  4. After writing that, I think we need to think through what the user is inputting and what is assumed about those inputs in the code - I think this needs a side bar meeting.

I started to type my preferred way forward, but I don't think I have a good proposal yet - without a discussion and a back and forth plan development

DavidOry commented 2 years ago

It may be useful to separate the issue of in-commuters from episodic telecommuters (which I think is what you both mean by WFH). It may be that a single solution is best for both problems, but I think it would be best to start by assuming this is not the case rather than assuming it is.

Re: @jpn-- , spatial patterns of work from home. I think this depends on (a) your definition of work from home (specifically whether or not people who work from home are consuming a job in the socioeconomic data that is not located at their home, which given @bettinardi's response I assume the answer is yes) and (b) the context of the scenario (5 percent telecommuting on the simulation day versus 50 percent). Removing 50 percent of your jobs uniformly over space, even if done by industry, is almost certainly wrong in the long term, as it assumes there is exactly no response from the commercial real estate market to a large decline in demand.

bettinardi commented 2 years ago

If I could just echo your last 5% vs 50% comment - this is where I was starting to get stuck too, and then ultimately couldn't suggest a best path forward without kicking it around with a group of smart people, like this thread. But I don't think we will efficiently solve it in issue comments.

jfdman commented 2 years ago

Thought it would be useful to clarify a few things:

  1. Work from home has a specific meaning in ActivitySim. It refers to workers who regularly work from home and do not have a regular out-of-home work location. It is not referring to workers who may telecommute on any given day instead of going to work. In the past, most folks who regularly worked from home were sole proprietors, tradespersons, and fully remote workers. Typically they would show up in the workers per household numbers/synthetic population but not be included in the employment data at the workplace end, though it would vary somewhat from region to region. Now the share of work-from-home has increased due to the significant increase in a fully remote workforce. The observed share of workers who work from home can be derived from household survey data and/or from Census Journey-to-work data (work from home is tabulated in usual mode to work). Often the share of workers who work from home varies by residence county or district. It tends to be indirectly related to accessibility to work, which is why the work location choice logsum is a key term in the model. I recognize that we are at a point where many employers are 'right-sizing' their commercial space to accommodate an increase in fully remote work-force, and so one must be careful to ensure that the employment data going into the model is an accurate picture of jobs at the workplace rather than remote workers. However, these workers do not choose a workplace location so to ensure apples-to-apples comparisons between work location choice outputs and observed data we recommend giving users the ability to remove workers who work from home from the employment data, at their discretion.

Because work from home is a modeled choice, we could implement some version of Alex's suggestion (3 above) - summarize the shares of workers who work from home, make the assumption that the workers who work from home should be reduced in the same county or district as the workers' home location, and apply that percentage reduction. It would be more challenging to do this by sector in an automated fashion, since a) the employment sectors vary between implementations, and b) employment sectors are not necessarily known for the workers or mappable between worker industry/occupation and employment data. But it is also likely that very few workers who work in the retail sector, for example, work from home - so there's an argument for allowing the user to specify these factors by employment type. Just not sure if it is worth the trade-off. Also doing this in an automated fashion implies that the model system must have a work-from-home choice model, so it breaks the modularity of the model system. Not necessarily a deal-breaker, but a consideration.

  2. As Jeff points out, removing workers who in-commute from employment data input to the model is a lot more important than WFH, because it has much more significant spatial variation. For some regions the in-commuting workers can gobble up half or more of the employment in a county or district. Again, the observed share of workers who commute from outside the region can be easily calculated from Census JTW data for the base year or something close to it. Those shares can be held constant into the future, or adjusted as modelers see fit. Ignoring the issue is definitely worse than holding base-year shares constant. So again the idea is to allow the user to specify a reduction in the work location choice size term (and total employment used for work location choice constraint) to ensure that the model doesn't send too many internal workers to workplace zones with a high share of in-commuting workers. I think it is probably harder to identify the in-commuting workers by employment type without resorting to analysis of PUMS data, which can be complicated.
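As a rough illustration of how in-commuting shares could be tabulated from journey-to-work style flows, here is a small sketch in plain Python. The flow tuples, district labels, and the function name are made up for the example; real Census JTW products have their own layouts, and this is not an ActivitySim utility.

```python
from collections import defaultdict


def in_commute_share(flows, internal_districts):
    """Share of workers at each internal workplace district who live outside the region."""
    total = defaultdict(float)
    external = defaultdict(float)
    for home, work, workers in flows:
        if work not in internal_districts:
            continue  # jobs outside the model region are not in the land use data
        total[work] += workers
        if home not in internal_districts:
            external[work] += workers
    return {d: external[d] / total[d] for d in total if total[d] > 0}


# Made-up flows: (home district, workplace district, workers)
flows = [
    ("A", "A", 800),
    ("B", "A", 100),
    ("X", "A", 100),  # X lies outside the model region
    ("B", "B", 300),
    ("X", "B", 300),
    ("A", "X", 50),   # out-commuters, ignored here
]
shares = in_commute_share(flows, internal_districts={"A", "B"})
```

With these invented numbers, district A has a 10 percent in-commute share and district B a 50 percent share, which is the kind of spatial variation described above.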

I guess after this long-winded reply, I think keeping everything simple and just specifying one factor by TAZ to apply uniformly to all employment (and size terms) in the work location choice model to account for in-commuting and/or WFH shares is my preference.

DavidOry commented 2 years ago

Thanks @jfdman for clarifying the work from home definition. Building on this, we can then define employment in the socioeconomic data as: a count of people working outside the home by work location zone on a typical weekday. Is that right? If so, giving the user the ability to modify this input in response to a large increase in work from home share makes me nervous. I think many regional planning efforts are going to explore 20 or 30 percent work from home shares (or even higher). This will have a meaningful impact on the commercial real estate market that will have important transportation outcomes (e.g., uniform reductions will decrease transit ridership; land value-based reductions may increase transit ridership). Even with the caveats you note ("at their discretion", "careful"), I think providing this functionality is unwise. In my view, this work should be done carefully and thoughtfully outside the travel model.

guyrousseau commented 2 years ago

Thanks everyone for contributing to this very relevant WFH topic within the context of our post-pandemic regional activity-based travel demand forecasting models, including ActivitySim of course. I would also simply add the definitional concept of a "homeworker", as characterized by the Bureau of Labor Statistics. Speaking of BLS, we're starting to see some preliminary results from the recent American Time Use Survey regarding WFH, see https://www.bls.gov/news.release/atus.nr0.htm

jfdman commented 1 year ago

@DavidOry - Ideally, employment data would exclude workers who work from home, particularly post-pandemic employment data, where the work from home shares are much higher than they were in 2019. However, this isn't always the case. Regardless, it is still necessary to reduce the work location choice size terms and constraint targets by non-resident workers, because they consume jobs but they do not consume activity opportunities for resident non-work travel. Therefore there still needs to be an internal mechanism for making this happen. My preference at this time is to keep this mechanism simple - allow the user to specify a percentage of external workers for each TAZ and apply this percentage to the calculated size term and the total employment. I believe this is the procedure currently implemented in DaySim. A user would always be able to implement a more sophisticated (and complicated) approach where they specify two sets of employment data - one set of 'reduced' employment data that could be used for work location choice and one 'full' set for other models, which would not require code changes.
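To make the simple mechanism concrete, here is a hedged sketch of applying one external-worker percentage per TAZ to both the calculated size term and the total employment. The column names, coefficients, and function are assumptions for illustration only; this is not the code from the linked pull request, nor DaySim's implementation.

```python
def work_location_terms(zone, size_coefficients, external_worker_share):
    """Return (size_term, total_employment) after removing non-resident workers.

    zone: dict of employment by type plus "emp_total" (hypothetical column names)
    size_coefficients: dict mapping employment column -> size term coefficient
    external_worker_share: fraction of the zone's jobs held by non-residents
    """
    if not 0.0 <= external_worker_share <= 1.0:
        raise ValueError("external_worker_share must be between 0 and 1")
    keep = 1.0 - external_worker_share
    # Unscaled size term: coefficient-weighted sum over employment types.
    size_term = sum(coef * zone[col] for col, coef in size_coefficients.items())
    # The same factor is applied to the size term and to the constraint target.
    return size_term * keep, zone["emp_total"] * keep


# Made-up zone and coefficients.
zone = {"emp_retail": 100, "emp_office": 400, "emp_total": 500}
coefficients = {"emp_retail": 0.5, "emp_office": 1.0}
size_term, total_emp = work_location_terms(zone, coefficients, external_worker_share=0.2)
```

Because the factor is applied only inside the work location calculation, the `zone` record itself is left unchanged, so other models would still see the full employment.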

bettinardi commented 1 year ago

Based on Joel's description above and on the call on 10-13-22, I agree with the approach he is bringing. Joel, on the side, I would be very interested to see an example input set for how the user would interface with this feature - maybe it's the DaySim example.

DavidOry commented 1 year ago

@jfdman: agree that doing this for non-resident workers makes good sense. My concerns are limited to extending this idea to account for those working from home.

I do think it would benefit the broader ActivitySim project to be explicit about the definition of employment. Even if obtaining the data is a challenge, the internal consistency of the model requires some definition.

jfdman commented 1 year ago

Closing this issue, addressed via https://github.com/ActivitySim/activitysim/pull/613