dkyleward / ipfr

Generic expansion of seed distribution to marginal targets

Consider making seed/target pairs more general #3

Open dkyleward opened 5 years ago

dkyleward commented 5 years ago

Currently, you can use a primary and a secondary seed/target pair. This is well suited to the household survey work I'm familiar with, but may not be adequate for other applications. Even in household surveys, consider what happens if you want to control households by size, persons by age, and vehicles by gas mileage. With only two seed/target pairs, you would have to choose whether the secondary seed represents persons or autos. Consider the following approach, which would allow an arbitrary number of seed/target pairs.

library(tibble)  # provides tibble()

seeds <- list()
seeds$households <- tibble(
  # size, income, etc.
)
seeds$persons <- tibble(
  # age, gender, etc.
)
seeds$autos <- tibble(
  # make, model, mpg, etc.
)
targets <- list()
targets$households$size <- tibble(
  # ...
)
targets$persons$age <- tibble(
  # ...
)
targets$autos$mpg <- tibble(
  # ...
)
result <- ipu(seeds, targets)
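With an arbitrary number of pairs, the seeds and targets have to be matched by name rather than by argument position, so some up-front validation would probably be needed. The sketch below is only an illustration of that idea: `check_pairs()` is a hypothetical helper, not an existing ipfr function, and the toy data use base `data.frame()` (tibbles would behave the same way) with an assumed wide target layout.

```r
# Hypothetical helper (not part of ipfr): checks that seeds and targets line up.
check_pairs <- function(seeds, targets) {
  # Every group of targets needs a seed table with the same name.
  missing_seeds <- setdiff(names(targets), names(seeds))
  if (length(missing_seeds) > 0) {
    stop("No seed table for: ", paste(missing_seeds, collapse = ", "))
  }
  # Every controlled variable must be a column of its seed table.
  for (pair in names(targets)) {
    missing_cols <- setdiff(names(targets[[pair]]), names(seeds[[pair]]))
    if (length(missing_cols) > 0) {
      stop("Seed '", pair, "' lacks column(s): ",
           paste(missing_cols, collapse = ", "))
    }
  }
  invisible(TRUE)
}

# Toy data; the shape of each target table is an assumption for illustration.
seeds <- list(
  households = data.frame(id = 1:3, size = c(1, 2, 2)),
  persons    = data.frame(id = c(1, 2, 2, 3, 3), age = c(1, 1, 2, 2, 2))
)
targets <- list(
  households = list(size = data.frame(`1` = 10, `2` = 20, check.names = FALSE)),
  persons    = list(age  = data.frame(`1` = 25, `2` = 30, check.names = FALSE))
)

check_pairs(seeds, targets)  # passes silently; stops with a message on a mismatch
```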

The above change is pretty drastic, and would definitely be a 2.0-style revision. If going that far, consider the following layout, which keeps each seed alongside its targets and might make the pairs easier to construct and more obvious.

data <- list()
data$households$seed <- tibble(
  # ...
)
data$households$targets$size <- tibble(
  # ...
)
result <- ipu(data)
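The two layouts carry the same information, so one can be derived from the other. As a rough sketch of that equivalence, the hypothetical helper `split_data()` below (not an ipfr function) turns the nested `data` list into separate seed and target lists. It assumes every element of `data` has a `seed` entry and a `targets` entry, and uses base `data.frame()` for the toy tables.

```r
# Hypothetical helper (not part of ipfr): converts the nested layout
# into separate, name-matched lists of seeds and targets.
split_data <- function(data) {
  list(
    seeds   = lapply(data, function(pair) pair$seed),
    targets = lapply(data, function(pair) pair$targets)
  )
}

# Toy data in the nested layout; values are illustrative only.
data <- list()
data$households$seed <- data.frame(id = 1:3, size = c(1, 2, 2))
data$households$targets$size <- data.frame(`1` = 10, `2` = 20, check.names = FALSE)

parts <- split_data(data)
names(parts$seeds)               # names of the seed tables
names(parts$targets$households)  # variables controlled for households
```

Because `lapply()` preserves list names, the pair names survive the conversion, which is what would let either layout be accepted.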

The secondary_importance parameter would also have to be generalized (which might be a good thing), for example to one importance value per seed/target pair.

importance = list(
  households = 1,
  persons = 0.5,
  autos = 0.3
)
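A generalized importance list would need to be reconciled with the pair names before use. The sketch below shows one way that could look; `resolve_importance()` is a hypothetical helper, and both the default of 1 for unlisted pairs and the 0 to 1 range are assumptions suggested by the example values, not confirmed ipfr behavior.

```r
# Hypothetical helper (not part of ipfr): fills in and validates per-pair importance.
resolve_importance <- function(importance, pair_names) {
  # Reject importance entries that do not correspond to any seed/target pair.
  unknown <- setdiff(names(importance), pair_names)
  if (length(unknown) > 0) {
    stop("Importance given for unknown pair(s): ", paste(unknown, collapse = ", "))
  }
  # Assumed default: a pair without an explicit value gets full importance.
  resolved <- setNames(rep(1, length(pair_names)), pair_names)
  if (length(importance) > 0) {
    resolved[names(importance)] <- unlist(importance)
  }
  # Assumed valid range, mirroring the values used in the example.
  if (any(resolved < 0 | resolved > 1)) {
    stop("Importance values must lie between 0 and 1")
  }
  resolved
}

weights <- resolve_importance(
  list(persons = 0.5, autos = 0.3),
  c("households", "persons", "autos")
)
weights
```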