Closed jtanwk closed 2 years ago
Hi Jonathan,
Thanks for the feedback. Yes, you are correct: we could most definitely make it more efficient, even though it does work fine for most use cases (the amount of data geocoded this way per city/county has not been huge so far). Also, since the assignment of census tract by zip code is probabilistic rather than deterministic, we'll have to rejigger some other parts of the code. Again, that is not complex by any means, and it is likely something that can be addressed in the next release of the tool (New America and/or DataKind can shed more light on when that might be).
@manusharma50 thanks for the quick response! Yes, this is definitely in the nice-to-have category with respect to urgency. But anything that reduces processing time on a local machine is a win in my book.
Ok my friend @jtanwk, your wish has been granted, as per this PR: https://github.com/datakind/new-america-housing-loss-public/pull/16. :)
@manusharma50 super exciting! My DK project team will be very excited by this.
Hello! When I ran the FEAT tool recently, I noticed the Zip-To-Census-Tract lookup repeated the lookup process for the same zip codes multiple times (truncated output below):
Each lookup takes a decent amount of time (a few minutes per zip code at least). From looking at `append_zip_to_tract_data()`, it looks like we're doing the lookup for all rows without a GEOID. It might be more efficient if we got a list of all unique zip codes from those rows and only looked each up once, then merged the result back afterwards.
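To make the suggestion concrete, here is a rough sketch of the "look up each unique zip code once, then merge back" idea. It is not the FEAT implementation: `fill_missing_geoids`, the `lookup_tract` callable, and the `"zip"` key are hypothetical names chosen for illustration, and rows are modelled as plain dicts rather than whatever structure `append_zip_to_tract_data()` actually uses. It also assumes one tract per zip code; if the assignment is probabilistic, the cached per-zip result would presumably be a set of candidate tracts, with the draw still done per row.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def fill_missing_geoids(
    rows: List[Row],
    lookup_tract: Callable[[str], Optional[str]],  # hypothetical stand-in for the slow lookup
) -> List[Row]:
    """Return copies of `rows` with missing GEOIDs filled in,
    calling `lookup_tract` once per unique zip code."""
    # 1. Collect the unique zip codes among rows that have no GEOID yet.
    zips_needed = {
        row["zip"]
        for row in rows
        if row.get("GEOID") is None and row.get("zip") is not None
    }

    # 2. Run the expensive lookup exactly once per unique zip code.
    zip_to_geoid = {z: lookup_tract(z) for z in sorted(zips_needed)}

    # 3. Merge the cached results back onto the rows that needed them.
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is not mutated
        if new_row.get("GEOID") is None:
            new_row["GEOID"] = zip_to_geoid.get(new_row.get("zip"))
        filled.append(new_row)
    return filled
```

With this shape, the number of slow lookups scales with the number of distinct zip codes rather than the number of rows missing a GEOID. If the data lives in a pandas DataFrame, the same pattern would likely be a `unique()` on the zip column followed by a `merge` on the lookup results.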