chris-prener / areal

R package for areal interpolation
https://chris-prener.github.io/areal/
GNU General Public License v3.0

Binary dasymetric method #2

Open chris-prener opened 5 years ago

chris-prener commented 5 years ago

Adding functionality for this approach is a long-term goal, and this issue will serve as a venue for discussing it and documenting progress.

plnnr commented 5 years ago

This would be a great enhancement to add. Many municipalities' taxlot or building footprint data contain the number of residential units at each location. These are effectively point-level ancillary data, and they could be used to make the weighting more accurate. The same could be done using LEHD/LODES data, which contain the number of workers at the census block level. Other tests could be performed to see whether the spatial distribution of labor force participation is heavily skewed, which might indicate whether the LODES block-level data would be insufficient.
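
As a rough illustration of the unit-weighted idea above, here is a minimal base R sketch. Everything in it is an assumption made for illustration: the `taxlots` and `tract_pop` tables are made up, the column names are hypothetical, and each taxlot point is assumed to have already been tagged with the tract and target zone it falls in (for example, by a spatial join). It is not part of areal.

```r
# Made-up taxlot points, each already tagged with its source tract and
# target zone (hypothetical data; in practice this comes from a spatial join)
taxlots <- data.frame(
  source_id = c("A", "A", "A", "B", "B"),
  target_id = c("X", "X", "Y", "Y", "Y"),
  units     = c(10, 30, 60, 5, 15)
)

# Made-up tract populations to interpolate
tract_pop <- data.frame(source_id = c("A", "B"), pop = c(1000, 400))

# Residential units per source/target pair
pairs <- aggregate(units ~ source_id + target_id, data = taxlots, FUN = sum)

# Each pair's share of the units in its source tract
pairs$weight <- pairs$units / ave(pairs$units, pairs$source_id, FUN = sum)

# Allocate each tract's population by those shares, then sum by target zone
pairs <- merge(pairs, tract_pop, by = "source_id")
pairs$pop_est <- pairs$pop * pairs$weight
estimate <- aggregate(pop_est ~ target_id, data = pairs, FUN = sum)

print(estimate)
```

Because the weights within each tract sum to 1, the total population across target zones matches the total across source tracts.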

chris-prener commented 5 years ago

Thanks @plnnr! Glad to know this would be useful to you. I'll keep you updated on our development here.

kendallfrimodig commented 4 years ago

Interesting package. I'm working in ArcMap currently but might have to use this soon. As for the assumption of an even distribution within census tracts, for example, there are arguments on both sides depending on your level of granularity. If you are working with block groups, census tracts, or larger units, you can use the block data to filter out industrial areas and parks, since their population would be 0. Census tracts have much more consistent populations than other geographies I have seen. Parcel data is more ideal, however. I would not advise assuming an even distribution for legislative boundaries, as they are manipulated and imbalanced for a reason. They do align with census block lines, however, so you can still use this method to weight the likelihood of the event being interpolated.
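
The block-filtering idea above amounts to a binary mask: area in zero-population blocks gets no weight, and the remaining area is weighted as usual. A minimal base R sketch follows, assuming the block, tract, and target-zone intersections have already been computed. The `pieces` table, its column names, and all of the numbers are made up for illustration.

```r
# Made-up intersection pieces: each row is the part of a census block that
# falls inside one source tract and one target zone (hypothetical data)
pieces <- data.frame(
  source_id = c("T1", "T1", "T1", "T2", "T2"),
  target_id = c("Z1", "Z2", "Z2", "Z2", "Z2"),
  area      = c(2, 2, 4, 3, 1),
  block_pop = c(120, 0, 80, 50, 0)
)

# Made-up tract populations to interpolate
tract_pop <- data.frame(source_id = c("T1", "T2"), pop = c(600, 200))

# Binary mask: area in zero-population blocks (parks, industrial land)
# contributes nothing to the weights
pieces$habitable <- ifelse(pieces$block_pop > 0, pieces$area, 0)

# Habitable area per source/target pair and each pair's share of its tract
pairs <- aggregate(habitable ~ source_id + target_id, data = pieces, FUN = sum)
pairs$weight <- pairs$habitable / ave(pairs$habitable, pairs$source_id, FUN = sum)

# Allocate tract population by masked-area share, then sum by target zone
pairs <- merge(pairs, tract_pop, by = "source_id")
pairs$pop_est <- pairs$pop * pairs$weight
estimate <- aggregate(pop_est ~ target_id, data = pairs, FUN = sum)

print(estimate)
```

With plain areal weighting, tract T1 would send 2/8 of its population to Z1. With the mask applied, it sends 2/6, because two of its eight area units are uninhabited.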

bransonf commented 4 years ago

@bransonf - Also make an implementation as a factory function. Interpolating, say, all the building footprints in a state is a difficult computation. A straightforward approach is to generate a matrix of source_id, target_id, and proportional_overlap, which can save time on future computations. (We could also precompute factory functions for all geographies in the US at a relatively small compressed size, making dasymetric interpolation available to users without the memory needed to load building footprints.)

e.g.

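MO_Tract_Building_Grid <- aw_dasymetric(source, sid, dasy, target, tid, factory = TRUE)

> MO_Tract_Building_Grid
function(source_data, source_id)
{
  # Assumes dplyr is loaded for left_join(), mutate(), and %>%
  # Crosswalk of precomputed overlap proportions, created with dput()
  df <- data.frame(
    source_id = c(1, 2, ...),
    target_id = c(1, 1, ...),
    factor = c(0.8, 0.2, ...)
  )

  # Pair each source value with its ID so it can be joined to the crosswalk
  values <- data.frame(source_id = source_id, source_data = source_data)

  # Scale each source value by its overlap proportion
  joined <- left_join(df, values, by = 'source_id') %>%
    mutate(value = source_data * factor)

  # Sum the scaled values within each target feature
  summary <- group_by(joined, target_id) %>%
    summarise(value = sum(value))

  return(summary)
}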
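
For readers who want to try the factory idea, here is a minimal runnable sketch in base R. It makes several assumptions: `make_interpolator()` is a hypothetical helper, not an areal function; the crosswalk values are made up; and the proposed `aw_dasymetric(..., factory = TRUE)` call is not implemented here, since the sketch starts from a crosswalk that has already been computed.

```r
# Hypothetical helper (not part of areal): wraps a precomputed crosswalk of
# source_id, target_id, and factor in a closure, so the expensive spatial
# overlay only has to be run once
make_interpolator <- function(crosswalk) {
  stopifnot(all(c("source_id", "target_id", "factor") %in% names(crosswalk)))

  function(source_data, source_id) {
    # Pair each source value with its ID, then join to the crosswalk
    values <- data.frame(source_id = source_id, value = source_data)
    joined <- merge(crosswalk, values, by = "source_id", all.x = TRUE)

    # Scale by the overlap proportion and sum within each target feature
    joined$value <- joined$value * joined$factor
    aggregate(value ~ target_id, data = joined, FUN = sum)
  }
}

# Made-up crosswalk: source 1 lies entirely in target 1, while source 2 is
# split 25% / 75% between targets 1 and 2
crosswalk <- data.frame(
  source_id = c(1, 2, 2),
  target_id = c(1, 1, 2),
  factor    = c(1.0, 0.25, 0.75)
)

interpolate <- make_interpolator(crosswalk)
result <- interpolate(source_data = c(100, 40), source_id = c(1, 2))

print(result)
```

The returned closure carries only the crosswalk, so it could be saved with `saveRDS()` and reused without reloading the source, target, or building geometries.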

plnnr commented 2 years ago

I think @walkerke began to implement this here.

walkerke commented 2 years ago

@plnnr Yes it's already in tidycensus: https://walker-data.com/tidycensus/reference/interpolate_pw.html. I also have a section about it in Chapter 7 of my book if you are interested.

chris-prener commented 2 years ago

Nice @walkerke - I think the pandemic threw off my and @bransonf's plans for expansion here. Glad to see this functionality out in the world!

plnnr commented 2 years ago

Awesome! And I can see there have already been bug fixes. Very exciting, thank you immensely for being such a high-volume contributor. I was chatting with someone from MTC (San Fran MPO) and they were rejoicing over how useful your packages have been.

walkerke commented 2 years ago

Thanks for the kind words @plnnr! I've been using tidycensus::interpolate_pw() in my projects for a while now and it looks to be working as I expect - and yes, we've already made some improvements to it. If you have any suggestions when you try it out, let me know!