Open chris-prener opened 5 years ago
This would be a great enhancement to add. Many municipalities' taxlot or building footprint data contain the number of residential units at a location. These are effectively point-level ancillary data, and they could be used to improve the accuracy of the weighting. The same could be done using LEHD/LODES data, which contains the number of workers at the census block level. Other tests could be performed to see if the spatial distribution of labor force participation is heavily skewed, which might inform whether the LODES block-level data would be insufficient.
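As a rough sketch of the idea, assuming each taxlot point has already been tagged with the source zone and target zone it falls in (for example via a spatial join), the unit counts can stand in for area as the weight. All column names and values below are made up for illustration, and this is not how areal implements anything:

```r
# Hypothetical taxlot points, each tagged with the source and target zone it falls in
lots <- data.frame(
  source_id = c("A", "A", "A", "B", "B"),
  target_id = c("X", "X", "Y", "Y", "Y"),
  units     = c(10, 30, 60, 5, 15)
)

# Hypothetical source-zone populations to be reallocated
pop <- data.frame(source_id = c("A", "B"), population = c(1000, 400))

# Residential units per source/target pair
pairs <- aggregate(units ~ source_id + target_id, data = lots, FUN = sum)

# Each pair's share of the units in its source zone (shares sum to 1 per source)
pairs$weight <- pairs$units / ave(pairs$units, pairs$source_id, FUN = sum)

# Allocate each source population by weight, then total by target zone
pairs <- merge(pairs, pop, by = "source_id")
pairs$allocated <- pairs$population * pairs$weight
result <- aggregate(allocated ~ target_id, data = pairs, FUN = sum)
```

Because the weights sum to 1 within each source zone, the allocated totals add back up to the original population, which is the property you would want for an extensive variable.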
Thanks @plnnr! Glad to know this would be useful to you. I'll keep you updated on our development here.
Interesting package. I'm working in ArcMap currently but might have to use this soon. As for the even-distribution assumption within census tracts, for example, there are arguments on both sides depending on your level of granularity. If you are working with block groups, census tracts, or above, you can use the block data to filter out industrial areas or parks, since their population would be 0. Census tracts have much more consistent populations than other geographies I have seen. Parcel data is more ideal, however. I would not advise making an even-distribution assumption for legislative boundaries, as they are manipulated and imbalanced for a reason. They do align with census block lines, however, so you can use this method to weight the likelihood of the event being interpolated.
@bransonf - Also consider implementing this as a factory function. Interpolating, say, all the building footprints in a state is an expensive computation. There is a straightforward approach: generate a matrix of source_id, target_id, proportional_overlap, which can save time on future computations. (Factory functions could also be precomputed for all geographies in the US at a relatively small compressed size, making dasymetric interpolation available to users with less than the requisite memory to load building footprints.)
e.g.
MO_Tract_Building_Grid <- aw_dasymetric(source, sid, dasy, target, tid, factory = TRUE)
> MO_Tract_Building_Grid
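function(source_data, source_id)
{
  # Requires dplyr (left_join, mutate, group_by, summarise, %>%)
  # Lookup table of precomputed overlaps, created with dput()
  df <- data.frame(
    source_id = c(1, 2, ...),
    target_id = c(1, 1, ...),
    factor = c(0.8, 0.2, ...)
  )
  # Pair each source value with its id so it can be joined to the lookup table
  values <- data.frame(source_id = source_id, source_data = source_data)
  joined <- left_join(df, values, by = 'source_id') %>%
    mutate(value = source_data * factor)
  # Sum the weighted values within each target feature
  summary <- group_by(joined, target_id) %>%
    summarise(value = sum(value))
  return(summary)
}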
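For anyone who wants to try the pattern without the `...` placeholders, here is a minimal, runnable sketch in base R. It assumes the overlap table already exists; `make_interpolator` is a hypothetical helper name, not something areal exports, and the `factory = TRUE` argument above is a proposal rather than an existing option.

```r
# Build a reusable interpolator from a precomputed overlap table.
# `make_interpolator` is a hypothetical helper, not part of areal.
make_interpolator <- function(weights) {
  # weights: data.frame with columns source_id, target_id, factor
  force(weights)
  function(source_data, source_id) {
    values <- data.frame(source_id = source_id, source_data = source_data)
    # Inner join: sources without a supplied value are dropped
    joined <- merge(weights, values, by = "source_id")
    joined$value <- joined$source_data * joined$factor
    # Sum the weighted values within each target feature
    aggregate(value ~ target_id, data = joined, FUN = sum)
  }
}

# Toy overlap table: source 1 is split 80/20 between targets 1 and 2,
# source 2 falls entirely within target 2
weights <- data.frame(
  source_id = c(1, 1, 2),
  target_id = c(1, 2, 2),
  factor    = c(0.8, 0.2, 1.0)
)

interpolate <- make_interpolator(weights)
out <- interpolate(source_data = c(100, 50), source_id = c(1, 2))
```

The expensive geometry work happens once, when the overlap table is built. The returned closure only needs the table, so it can be saved and reused on new attribute data without loading the footprints again.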
@plnnr Yes it's already in tidycensus: https://walker-data.com/tidycensus/reference/interpolate_pw.html. I also have a section about it in Chapter 7 of my book if you are interested.
Nice @walkerke - I think the pandemic threw off my and @bransonf's plans for expansion here. Glad to see this functionality out in the world!
Awesome! And I can see there were already bug fixes. Very exciting, thank you immensely for being such a high-volume contributor. I was chatting with someone from MTC (San Fran MPO) and they were rejoicing over how useful your packages have been.
Thanks for the kind words @plnnr! I've been using tidycensus::interpolate_pw() in my projects for a while now and it looks to be working as I expect, and yes, we've already made some improvements to it. If you have any suggestions when you try it out, let me know!
Adding functionality for this approach is a long-term goal, and this issue will serve as a venue for discussing and documenting progress on it.