VisionEval / VisionEval-Dev

Development version of VisionEval framework
https://visioneval.github.io/
Apache License 2.0
6 stars 32 forks source link

splitInt() function in AssignLocTypes can fail to correctly allocate right number of dwelling units #144

Closed dflynn-volpe closed 1 year ago

dflynn-volpe commented 3 years ago

Here is the splitInt() function in the AssignLocTypes module:

# Define a function to do a whole number splitting according to proportions
splitInt <- function(Props_, Tot) {
  SplitAll_ <- Props_ * Tot
  SplitInt_ <- round(SplitAll_)
  Rem <- sum(SplitAll_ - SplitInt_)
  if (Rem != 0) {
    RemTab_ <- table(
      sample(1:length(Props_), abs(Rem), replace = TRUE, prob = Props_)
    )
    SplitInt_[as.numeric(names(RemTab_))] <-
      SplitInt_[as.numeric(names(RemTab_))] + sign(Rem) * RemTab_
  }
  SplitInt_
}

This works by feeding a dataframe of proportions of land types 'Lt' (urban, town, rural) by housing type 'Ht' (single family, multifamily, group quarters) for each Bzone 'Bz'. This is stored in the 3-dimensinal array Props_BzHtLt. Those proportions are then applied to the vector describing the number of housing types for each Bzone, stored in the 2-dimensional array DU_BzHt_full, here.

  # Calculate dwelling units by Bzone, housing type and location type
  DU_BzHtLt <- sweep(Props_BzHtLt, c(1,2), DU_BzHt_full, splitInt)

For one particular set of inputs I'm working with, this is now failing to correctly apply the splitInt, resulting in a mismatch between the length of the vector of names and housing IDs. The error occurs in the double loop here in AssignLocTypes.R.

Problem: The resulting DU_BzHtLt for this Bzone and housing type only sums to 984, but there are 985 dwelling units in this Bzone for the SF housing type. First, showing how it is supposed to work (and does, when manually feeding in data):

Props_ = c(0.0, 0.9, 0.1)
names(Props_) = c("Urban", "Town", "Rural")

Tot = 985

result <- splitInt(Props_, Tot)

> result
Urban  Town Rural 
    0   887    98 

sum(result) # 985

So what's the issue?

# Here are the actual inputs
> Props_BzHtLt[228,1,]
Urban  Town Rural 
  0.0   0.9   0.1 
> DU_BzHt_full[228,1]
[1] 985

# Now we feed these to splitInt()
Props_ = Props_BzHtLt[228,1,]
Tot = DU_BzHt_full[228,1]

result <- splitInt(Props_, Tot)

> result
Urban  Town Rural 
    0   886    98 

sum(result) # 984 !!!
jrawbits commented 3 years ago

It's almost certainly a rounding error. I'll add some internal cross-checking...

dflynn-volpe commented 3 years ago

Yep, here's a possible solution (using the real inputs):

# Here are the first few lines of splitInt():
SplitAll_ <- Props_ * Tot
SplitInt_ <- round(SplitAll_)
Rem <- sum(SplitAll_ - SplitInt_)

# Rem = 1, meaning that we are off by 1 and need to sample from the proportions to assign one DU to a land use type
sample(1:length(Props_), abs(Rem), replace = TRUE, prob = Props_)
# This is the problem, it produces no result:
# integer(0) 

If we add a rounding step to ensure that our remainder is an integer, this succeeds:

Rem = round(Rem, 0)

sample(1:length(Props_), abs(Rem), replace = TRUE, prob = Props_)
# [1] 2

I will push a fix to development after testing that this works.

dflynn-volpe commented 3 years ago

Slightly different. The right way to think about the inputs to splitInt() is that the 2-D array DU_BzHt_full is applied to each 2-D slice of the array Props_BzHtLt (one slice for each land use type).

The problem is that in the particular inputs I'm using, the remainder Rem for the Town slice is not an integer! It comes out to 2.5. The sample() function requires the size parameter (Rem) to be an integer. This might mean re-configuring this step so that it applies the number of dwelling units by land use type separately for each Bzone; currently it is set up to go across Bzones, within each land use type. AssignLocTypes_TestData.zip

dflynn-volpe commented 3 years ago

Fixing in patch_splitInt

dflynn-volpe commented 3 years ago

Closed by 47af0b4

jrawbits commented 1 year ago

Reopening this based on failure in H-GAC 5000 BZone model: when the probabilities are equal (0.5 each in Urban and Town) and a small number of units is presented (3), the result rounds up and yields a wrong total. The difference is small (2 units) but trashes the tapply that comes up next since the LocType_Hx vector has an inconsistent length. I'll close this again once I've fixed that (probably simplest to enforce matching the requested total and just manually increment/decrement non-zero results to get it to match up)

jrawbits commented 1 year ago

Fix is in the pipeline

jrawbits commented 1 year ago

Fixed in development-next push 2023-03-24