VisionEval / VisionEval-Dev

Development version of VisionEval framework
https://visioneval.github.io/
Apache License 2.0

Issue with AssignLocTypes #153

Open gabbyfreeman opened 2 years ago

gabbyfreeman commented 2 years ago

Using the release from 05-12, AssignLocTypes produces an error at line 372:

Error in tapply(L$Year$Household$HhSize, list(L$Year$Household$Bzone, : arguments must have same length

LocType_Hh has extra "floater" entries compared with HhId (see the attached file, LocType_Hh.xlsx, which compares the HhId and LocType_Hh names).

jrawbits commented 2 years ago

You're almost certainly enjoying both a bug and an input error. The problem you're describing doesn't seem to occur when running the sample VERSPM model, so the core code (probably) works fine with consistent inputs. But it should give you a more helpful error message when the inputs are inconsistent (that's the "bug" part).

In practice, we've squashed a few bugs where an incomplete set of inputs leads to NA values in a generated vector. Those NAs then "vanish" during operations like tapply (or, more probably, earlier, when HhSize or LocTypes are being assigned), leaving vectors that are presumed to have the same length but don't.
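A minimal R illustration of that failure mode, using made-up vectors (`hh_size`, `bzone`) rather than real VisionEval data. Which step inside AssignLocTypes actually drops the NAs isn't established here; subsetting with `!is.na()` simply stands in for it.

```r
# Made-up household data: one HhSize is NA because of an incomplete input
hh_size <- c(2, 3, NA, 1, 4)
bzone   <- c("A", "A", "B", "B", "B")

# A step that silently drops NAs shortens hh_size to 4 elements,
# while bzone still has 5
hh_size_clean <- hh_size[!is.na(hh_size)]

# tapply now sees X and INDEX of different lengths and stops with
# the same message reported in this issue
res <- tryCatch(
  tapply(hh_size_clean, list(bzone), sum),
  error = function(e) conditionMessage(e)
)
print(res)
```

The error only shows up at the tapply call, several steps away from where the NA was introduced, which is why the message alone doesn't point at the bad input.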

So I'll leave this issue open. I'd ask the issue poster to send us a zip file of the "inputs" and "defs" folders that reproduce the problem (email info@visioneval.org and it will reach the right people). Once we can reproduce the error, we can generate a more informative error message.

You should check bzone_urban-town_du_proportions.csv to make sure that the Urban and Town proportions never add up to more than one, and that the file has exactly one row for each Bzone and scenario year. A trickier problem to track down (but, in this case, a more likely one) is NA values in some of the household sizes, which can happen if the group quarters configuration is misspecified or some other household size target is incomplete.
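The two input checks above could be scripted along these lines. This is a sketch under assumptions: `check_urban_town_props` is a hypothetical helper, the column names (`Geo`, `Year`, `PropUrbanSFDU`, `PropTownSFDU`) are placeholders for whatever your copy of bzone_urban-town_du_proportions.csv actually uses, and it assumes the "no more than one" rule applies to each Urban/Town pair of proportion columns.

```r
# Hypothetical sanity check for bzone_urban-town_du_proportions.csv.
# ASSUMPTION: column names are placeholders; match them to your actual file.
check_urban_town_props <- function(props, bzones, years, prop_pairs, tol = 1e-9) {
  problems <- character(0)
  # For each Urban/Town column pair, the combined share must not exceed 1
  for (pair in prop_pairs) {
    total <- props[[pair[1]]] + props[[pair[2]]]
    bad <- which(is.na(total) | total > 1 + tol)
    if (length(bad) > 0) {
      problems <- c(problems, sprintf("%s + %s exceeds 1 (or is NA) in row(s): %s",
                                      pair[1], pair[2], paste(bad, collapse = ", ")))
    }
  }
  # Exactly one row is expected for each Bzone and scenario year
  expected <- expand.grid(Geo = bzones, Year = years, stringsAsFactors = FALSE)
  expected_keys <- paste(expected$Geo, expected$Year)
  actual_keys <- paste(props$Geo, props$Year)
  absent <- setdiff(expected_keys, actual_keys)
  dups <- unique(actual_keys[duplicated(actual_keys)])
  if (length(absent) > 0) {
    problems <- c(problems, paste("missing Bzone/year rows:", paste(absent, collapse = "; ")))
  }
  if (length(dups) > 0) {
    problems <- c(problems, paste("duplicate Bzone/year rows:", paste(dups, collapse = "; ")))
  }
  problems
}

# Made-up example: row 2 sums to 1.1, and the B2/2040 row is missing
props <- data.frame(
  Geo = c("B1", "B2", "B1"),
  Year = c(2010, 2010, 2040),
  PropUrbanSFDU = c(0.5, 0.7, 0.6),
  PropTownSFDU = c(0.3, 0.4, 0.2)
)
issues <- check_urban_town_props(props, bzones = c("B1", "B2"), years = c(2010, 2040),
                                 prop_pairs = list(c("PropUrbanSFDU", "PropTownSFDU")))
print(issues)
```

With a real file, `props` would come from `read.csv()` on the input, and `bzones`/`years` from the model's geography and run years.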

jrawbits commented 2 years ago

I'm pondering a deeper framework fix that inspects the datasets generated into a module's "L" parameter (and probably also the datasets it returns in Out_Ls) and flags any dataset that contains NA values or otherwise ends up "the wrong length" relative to the table requirements. The length check will bring us face-to-face with issue #142, because VERPAT writes datasets of two different lengths into the same table. I believe (but need confirmation) that no valid dataset will ever contain NA values, so flagging them the moment they occur will help us narrow down investigations of this sort of bug. The framework would log the error, and we could halt the model run right at that point, leaving a debuggable Datastore behind.
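A rough sketch of that kind of check as a standalone R function. `find_dataset_problems` is a hypothetical name, not part of VisionEval. It assumes L is nested as Group, then Table, then Dataset (as in `L$Year$Household$HhSize`), and, since the real table requirements aren't available here, it treats the most common dataset length in each table as the expected one. Mixed-length tables like VERPAT's (issue #142) would be flagged as well.

```r
# Hypothetical sketch, not the VisionEval API: scan a module's "L" list
# and report datasets that contain NAs or whose length differs from the
# most common length among datasets in the same table.
find_dataset_problems <- function(L) {
  problems <- character(0)
  for (group in names(L)) {
    for (tbl in names(L[[group]])) {
      datasets <- L[[group]][[tbl]]
      if (!is.list(datasets)) next  # skip entries that aren't tables
      lens <- vapply(datasets, length, integer(1))
      # ASSUMPTION: the most common length stands in for the table's length
      expected <- as.integer(names(which.max(table(lens))))
      for (ds in names(datasets)) {
        where <- paste(group, tbl, ds, sep = "$")
        n_na <- sum(is.na(datasets[[ds]]))
        if (n_na > 0) {
          problems <- c(problems, sprintf("%s has %d NA value(s)", where, n_na))
        }
        if (lens[[ds]] != expected) {
          problems <- c(problems, sprintf("%s has length %d, expected %d",
                                          where, lens[[ds]], expected))
        }
      }
    }
  }
  problems
}

# Made-up L with one NA household size and one short dataset
L <- list(Year = list(Household = list(
  HhId    = c("h1", "h2", "h3"),
  Bzone   = c("A", "A", "B"),
  HhSize  = c(2, NA, 3),
  LocType = c("Urban", "Town")
)))
problems <- find_dataset_problems(L)
print(problems)
```

In a model run, a non-empty result could be written to the log and followed by `stop()`, so the run halts while the Datastore still reflects the state in which the problem first appeared.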