Hack4Eugene / SpeedUpAmerica

Crowd-sourced internet speed tests using M-Lab data and user tests on a website, with charts, maps, and raw data downloads.
MIT License

Add Texas and improve documentation for the process (3SP) #195

Open ryanrolds opened 5 years ago

ryanrolds commented 5 years ago

Is your feature request related to a problem? Please describe.

We need to add states slowly and monitor performance. It's been a month since we added WA and ID, and we should maintain that pace. At this time we have only a very basic section on loading boundaries in our vector tileset documentation.

Describe the solution you'd like

Additional context

If you have any questions, ask Ryan. You shouldn't need the BigQuery service key, as you won't be loading data from M-Lab; that will be another ticket.

mattsayre commented 5 years ago

After talking with Ryan we have identified an opportunity to prioritize Mississippi. The reason for this is that US Senator Roger Wicker heads the Senate Committee where broadband mapping is being actively discussed ahead of federal legislation.

mattsayre commented 3 years ago

2021 update: Texas is interested in being the next state added to the national map.

webaissance commented 2 years ago

Hi @mattsayre, @ryanrolds, and team, I want to let you know that I've been working steadily on this task (adding Texas) for more than a week, and I'm making progress.

I think a few challenges are making it take a while, but I'm thinking of a strategy to expedite adding states moving forward. The main challenge is that there is a LOT of data, and as we add states the data keeps growing. Following the document linked above on Vector Tilesets and Boundaries, I've been working with the populate_boundaries.rake method. While it's a cool method, it hasn't been working very well for me. What has worked better for me is loading a dataset from a .sql file such as sua_20191022.sql and sua_lane_20191022.sql.

So I have an idea: create a rake file that generates a set of .sql files, one for each state we want to add, plus a few more for the ancillary data that's needed (zips, counties, etc.), and then build another rake file that loads all of these .sql files into the db.

There are a few advantages to this. For one thing, we could inspect the .sql files and make sure the data is all there and formatted correctly. Each one could also be reused, so the rake would only have to be run once per state, and it could be updated in future years as needed. It would also be more robust, because the current populate_boundaries method can break, which breaks the whole process. The .sql file for each state would be a manageable size, whereas even the current combined file sua_20191022.sql for Oregon, Washington and Idaho (referenced in README.md) is too large to manage well, at 4.3 GB.
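To make the plan above a bit more concrete, here is a minimal sketch of how the loader side could decide which per-state files to load and check that they exist before touching the db. Everything in it is an assumption for illustration: the `sua_<name>.sql` naming scheme, the state list, the ancillary names, and the helper names `dump_path`, `load_order`, and `missing_dumps` are hypothetical and not part of the current repo. It deliberately stops short of the actual database import.

```ruby
# Hypothetical sketch: file naming, state list, and helper names are
# assumptions for illustration, not existing project code.
STATES = %w[OR WA ID TX].freeze
ANCILLARY = %w[zips counties].freeze

# Path of the per-state (or ancillary) dump, e.g. "dumps/sua_tx.sql".
def dump_path(dir, name)
  File.join(dir, "sua_#{name.downcase}.sql")
end

# Ancillary tables first, then one file per state, in a fixed order.
def load_order(dir, states: STATES, ancillary: ANCILLARY)
  (ancillary + states).map { |name| dump_path(dir, name) }
end

# Files that are absent or empty, so a loader can stop before importing.
def missing_dumps(paths)
  paths.reject { |path| File.file?(path) && File.size(path) > 0 }
end
```

A loader rake could call `missing_dumps(load_order("dumps"))` first and abort with the list of missing files, so one bad state file would not break a whole run.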

I plan to use the 2021 Census Bureau data for the tract and tabblock files here:
https://www2.census.gov/geo/tiger/TIGER2021/TRACT/
https://www2.census.gov/geo/tiger/TIGER2021/TABBLOCK20/
(this document matches the numbers to the states)
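As a rough sketch of how the per-state downloads could be addressed: the numbers in those directories are state FIPS codes (Texas is 48, Mississippi is 28). The file name pattern below (`tl_2021_<fips>_tract.zip` and `tl_2021_<fips>_tabblock20.zip`) is an assumption based on the directory listings and should be checked against the actual index pages. The helper name `tiger_urls` and the small `STATE_FIPS` table are hypothetical.

```ruby
# Hypothetical sketch: the file name pattern is an assumption to verify
# against the Census directory listings before relying on it.
TIGER_BASE = "https://www2.census.gov/geo/tiger/TIGER2021"

# State FIPS codes for the states mentioned in this thread.
STATE_FIPS = {
  "OR" => "41", "WA" => "53", "ID" => "16", "TX" => "48", "MS" => "28"
}.freeze

# Build the tract and tabblock download URLs for one state abbreviation.
def tiger_urls(state)
  fips = STATE_FIPS.fetch(state) { raise ArgumentError, "unknown state: #{state}" }
  {
    tract: "#{TIGER_BASE}/TRACT/tl_2021_#{fips}_tract.zip",
    tabblock: "#{TIGER_BASE}/TABBLOCK20/tl_2021_#{fips}_tabblock20.zip"
  }
end
```

Keeping the FIPS lookup in one table means adding a state is a one-line change, and an unknown abbreviation fails loudly instead of producing a bad URL.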

So I want to let you know that I'm making progress and also propose this plan of action. Let me know your feedback on this plan. I will start building it in the meantime - and make adjustments based on your feedback. -Dave @webaissance