Closed ethervoid closed 7 years ago
@ethervoid, it's difficult to sort the US geometries because they don't always perfectly nest into one another. I came up with this chart below to help describe how they should fit into one another. Only column B, which contains nested TIGER Geographies, makes sense to sort. They also fully cover all of the United States.
https://docs.google.com/spreadsheets/d/1pp88VPvfRYyjAy_zUJaXnbFPLG_5yPvIF8Z6dB2Cf7o/edit#gid=0
I based it off this hierarchy chart: https://www2.census.gov/geo/pdfs/reference/geodiagram.pdf
However, if you want to sort simply based on count, then your method makes sense and is easiest to understand!
@michellemho Yes, it's really hard hehehe. I've found that they come from different departments and so on, but we need to provide some way, more or less, to sort them from top to bottom.
I don't know if it makes sense but the UI doesn't split them into different categories.
Another question I have: what is the purpose of the `weight` column? I've checked the geometries column for different countries, and sometimes it seems to be used as a sort column; in other cases (the US, for example) I don't know what purpose it has.
If we could use `weight` to sort the geometries, that would be amazing, since we'd avoid creating a new attribute for all the geometries we have.
@ethervoid, I believe the weight column is an old feature... I don't think it's actually used anywhere. I think it was a method for sorting by hardcoding the order. This might be helpful for showing them sorted in the Catalog and the UI!
@michellemho Great!! I'll start using it to hardcode the order of the geometries. Maybe you have a better suggestion for sorting them, but I think it's very hard to do automatically, because a proper sort function would need data beyond what we have.
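A minimal sketch of what hardcoding the order through `weight` could look like. The `Geometry` class, the weight values, and the convention that a larger weight means a coarser level are all assumptions made for illustration; the thread does not say how the real column is interpreted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    name: str
    weight: int  # hand-assigned; assumed here: larger weight = coarser level


# Illustrative weights only; not the values stored in the Data Observatory.
US_GEOMETRIES = [
    Geometry("Census tract", 3),
    Geometry("Block", 1),
    Geometry("State", 5),
    Geometry("Block group", 2),
    Geometry("County", 4),
]


def sort_top_to_bottom(geometries):
    """Return geometries ordered from coarsest to finest by weight."""
    return sorted(geometries, key=lambda g: g.weight, reverse=True)


if __name__ == "__main__":
    for geometry in sort_top_to_bottom(US_GEOMETRIES):
        print(geometry.weight, geometry.name)
```

Because the order lives entirely in the hand-assigned values, no extra data about the geometries is needed to produce it.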
@ethervoid, for purposes of the "slider" in the UI, we should remove all geography options that are not full coverage of the United States. Or else, there are strange and unexplained null results.
Full coverage geometries (also tagged with "interpolation_boundary" in the DO):
- State
- PUMA
- County
- Census tract
- Block group
- Block
We should talk about ZCTA. In my opinion, ZCTAs are a popular method for dividing and representing the United States. However, they are not full coverage (for example, holes exist for large bodies of water and national parks) according to this: https://www.census.gov/geo/reference/zctas.html.
If we include ZCTAs (and I think we should), we should also include the definition of these boundaries in the slider somehow.
@michellemho Good point. We could add that and mark the rest with `weight` 0, which, if I'm not wrong, removes them from the metadata.
I'll include ZCTA too and, as you said, maybe we should add an icon that shows an explanation of the possible geometries, but it'll be hard to extrapolate to all the countries.
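The idea of hiding geometries by giving them a weight of 0 can be sketched as below. This is only an illustration: the function name, the example weights, the position of ZCTA in the order, and the placeholder `example_partial_coverage_geometry` entry are hypothetical, and whether a weight of 0 really removes an entry from the metadata is unconfirmed in this thread.

```python
def slider_options(weights):
    """Return geometry names with a positive weight, highest weight first.

    Assumption: a weight of 0 means "do not offer this geometry".
    """
    kept = [(name, weight) for name, weight in weights.items() if weight > 0]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


# Illustrative values only; not taken from the real metadata.
EXAMPLE_WEIGHTS = {
    "State": 3,
    "ZCTA": 2,
    "Census tract": 1,
    "example_partial_coverage_geometry": 0,
}

if __name__ == "__main__":
    print(slider_options(EXAMPLE_WEIGHTS))
```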
In summary, we're going to remove geometries without full coverage in the United States which are:
This will help us to avoid NULLs in places where we don't have coverage for that geometry.
The special case is ZCTAs, which are very popular.
What do you think @saleiva @noguerol @kevin-reilly?
// @juanignaciosl @javitonino for awareness
Sounds good to me. I think in general we should try to have less but better data. Ensuring this works 99.9% of the time is key IMO
I think we should leave Congressional Districts if we can.
The Congressional Map fully covers the US. I'm not aware of any area that is not in a congressional district.
-- Kevin Reilly, SVP Product, CARTO
Agreed, I could not find any evidence that Congressional Districts are not full coverage of the United States. I'll add an `interpolation_boundary` tag for those as well!
https://observatory.carto.com/tables/obs_664d0ac6cc93a8f8d83553cb454738843132cb4c/public/map
I was thinking the same last night, that Congressional Districts should have full coverage too, so great! :)
@michellemho Is `interpolation_boundary` the same as `fullcoverage_boundary`? If not, what does it mean? (I'm a bit off :) ).
I think we could filter in the UI and show only the `interpolation boundaries` instead of removing the geometries from the catalog.
Also, now I begin to understand the use of the tiles and scores that John included.
@ethervoid
`cartographic_boundary` --> everything that is shoreline clipped (and looks good on a map, hence "cartographic")
`interpolation_boundary` --> clipped boundaries that are full coverage of the country/region
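Filtering in the UI by tag, rather than removing geometries from the catalog, could look roughly like the sketch below. The tag names come from this thread; the dictionary layout, the function name, the tag assignments, and the placeholder `example_partial_coverage_geometry` entry are assumptions for illustration and do not reflect the actual metadata schema.

```python
INTERPOLATION_TAG = "interpolation_boundary"

# Illustrative tag assignments only; not read from the real catalog.
EXAMPLE_TAGS = {
    "State": {"interpolation_boundary"},
    "Congressional District": {"interpolation_boundary"},
    "example_partial_coverage_geometry": {"cartographic_boundary"},
}


def full_coverage_only(tags_by_geometry):
    """Return the names of geometries tagged as interpolation boundaries."""
    return [
        name
        for name, tags in tags_by_geometry.items()
        if INTERPOLATION_TAG in tags
    ]


if __name__ == "__main__":
    print(full_coverage_only(EXAMPLE_TAGS))
```

With this approach the catalog keeps every geometry, and only the slider applies the full-coverage restriction.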
PR #196
Related CartoDB/observatory-extension#295
I've been investigating the US geometries to try to sort them and this is what I came up with:
Please @michellemho take a look, and if it makes sense I'll add a new column to make it possible to show them sorted in the UI.
Used resources:
- https://www.census.gov/geo/maps-data/data/tiger-cart-boundary.html
- https://www.wikipedia.org/
- https://github.com/CartoDB/bigmetadata/issues/166
- http://greatdata.com/pdf/Doc-CBSACodes.pdf
- https://www2.census.gov/geo/maps/metroarea/us_wall/Jul2015/cbsa_us_0715.pdf
- https://www.quora.com/How-many-census-tracts-exist-in-the-2010-U-S-census?share=1