Open brockfanning opened 6 years ago
@Kali2017SDG @SmithersA @philipashlock I've put up an ongoing proof-of-concept using D3 on my fork, using the test data that Kali provided for 8-1-1. You can see it by clicking on the "Map" tab after going here.
A few notes:
Let me know if run into any trouble testing it. As always, feedback is welcome.
Hi @brockfanning just wondering what did you use to produce the test map for 8.1.1.?
@AnnCorp That was done with D3. I have more work to do to make it truly nation-agnostic, but eventually it should be theoretically usable in the UK platform. It relies on the same data format (tidy) that you use. If you'd like a sneak peak at the code, the relevant files would be:
Hi @brockfanning just wondering how things are going with this? Hoping the UK NRP developers will be exploring this kind of thing very soon and so planned to point them at this ticket and specifically your proof of concept http://brock.tips/sdg-indicators/8-1-1/. Anything else you think should be flagged up at the moment? Any further advice or information would be very welcome - thank you!
Hi @AnnCorp it's going well. I've got the proof-of-concept more abstracted now, so that the country-specific code is decoupled from the general mapping code. At this point I've stopped working on the functionality and have turned to getting more subnational data for the US. Here are the indicators done so far:
One statistical/tech note I wanted to mention about these - on some I noticed that extreme outliers could skew the coloring system in a way that made the choropleth map less useful. One example was North Dakota in 2012 on 8-1-1, and another example was District of Columbia for most of the years on 3-3-1. Because these outliers were so much higher than all the other regions, the map pretty much turned into only 2 colors: one for the outlier, and one for everything else.
The approach I've got in place on these proofs of concept is, for the purposes of the coloring system, to ignore any regions whose values are outside of 3 standard deviations from the mean. Here is the code that does that. The outliers are still displayed accurately on the map, but their values aren't included in the color legend.
Explanation of North Dakota outlier (Survey of Current Business, July 2013. p.116... note: statistics have been revised since this initial release via incorporating available source data): Mining has increased in importance in North Dakota’s economy as a result of the oil boom due to the recovery of oil from the Bakken region’s shale formation; in 2009, mining accounted for 3.5 percent of North Dakota’s current-dollar GDP, and in 2012, mining’s share had nearly tripled, accounting for 9.6 percent of the state’s current-dollar GDP.
@Kali2017SDG Ah, good to know!
@Kali2017SDG @philipashlock This is a write-up after some investigation into open-source options for adding subnational features to this platform. No action needed, just putting this up to help start the conversation, in case you'd like to consider an open-source vendor-less solution.
The requirements
Any subnational solution (as far as I can conceptualize it) needs to minimally satisfy 3 requirements:
1. Data collection
Requirement 1 above is the area I have the least insights into, so for the purposes of this write-up, I’m going to naively assume that we have access to all the data we need (hooray!).
2. Data management
Meeting requirement 2, a data management solution, should start with the basic question: Can we continue to use Github/Prose for data management, or does the introduction of subnational data require another approach?
This is a tough question, because it touches on administrative concerns like data provider workflows and user access and permissions; and it also touches on technical considerations like client-side performance.
Because “if it ain’t broke don’t fix it” I’m going to proceed with the assumption that we will try to continue using Github/Prose for data management, with the caveat that future requirements related to workflows, access, permissions, and client-side performance may necessitate a switch to a separate data management system.
As for the nitty gritty of the data management in Github/Prose, I’m assuming that it will done using a subfolder-based approach. For example, right now this platform uses a “data” folder, with one CSV file for each indicator. With a subfolder approach, there would be a subfolder for each subnational region (U.S. state), eg: data/state/alabama, data/state/alaska, etc. Each of these subfolders similarly have one CSV file for each indicator.
3. Data visualization
That leaves requirement 3, data visualization. For map visualizations, there seem to be 2 main types of open-source solutions out there: “vector-based”, and “tile-based”.
Tile-based
Tile-based solutions have the advantage of more choices of imagery (such as streets, terrain, satellite, etc.), but they carry an additional “moving part” by requiring the use of a tile server. There are free tile-servers for light use, like Open Street Maps, but it might be a complication.
Vector-based
Vector-based solutions outline the map and fill the regions with color. Since we will presumably be exclusively displaying “choropleth” maps, this is probably all that we need. So, between the 2, I lean towards using a vector-based solution.
Open-source mapping libraries
Here are a few well-maintained javascript libraries for both approaches:
Recommendations and next steps
Any next step would probably be to try a proof-of-concept with one or more of these libraries. As for recommendations, I’d would personally go with either Leaflet or D3, to start. Between those two I lean towards D3, because it doesn’t need a tile server. But ultimately I would recommend which ever one downloads the smallest amount assets needed to get the job done, with the most easily maintainable integration code. That wouldn’t be clear until trying them out.
As always, any feedback is welcome.