CartoDB / bigmetadata

BSD 3-Clause "New" or "Revised" License

Make it possible to sort the geometries #193

Closed ethervoid closed 7 years ago

ethervoid commented 7 years ago

Related CartoDB/observatory-extension#295

I've been investigating the US geometries to try to sort them and this is what I came up with:

@michellemho, please take a look. If it makes sense, I'll add a new column so that they can be shown sorted in the UI.

Resources used:

- https://www.census.gov/geo/maps-data/data/tiger-cart-boundary.html
- https://www.wikipedia.org/
- https://github.com/CartoDB/bigmetadata/issues/166
- http://greatdata.com/pdf/Doc-CBSACodes.pdf
- https://www2.census.gov/geo/maps/metroarea/us_wall/Jul2015/cbsa_us_0715.pdf
- https://www.quora.com/How-many-census-tracts-exist-in-the-2010-U-S-census?share=1

michellemho commented 7 years ago

@ethervoid, it's difficult to sort the US geometries because they don't always perfectly nest into one another. I came up with this chart below to help describe how they should fit into one another. Only column B, which contains nested TIGER Geographies, makes sense to sort. They also fully cover all of the United States.

https://docs.google.com/spreadsheets/d/1pp88VPvfRYyjAy_zUJaXnbFPLG_5yPvIF8Z6dB2Cf7o/edit#gid=0

I based it off this hierarchy chart: https://www2.census.gov/geo/pdfs/reference/geodiagram.pdf

However, if you want to sort simply based on count, then your method makes sense and is easiest to understand!
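As a rough illustration of the count-based ordering mentioned above, here is a minimal Python sketch. It assumes the ordering means "levels with fewer features come first"; the level names and rows are toy values invented for the example, not real Census counts.

```python
from collections import Counter


def order_levels_by_count(rows):
    """Return level names ordered from fewest to most features.

    `rows` is an iterable of (level_name, geometry_id) pairs.
    """
    counts = Counter(level for level, _geom_id in rows)
    # Sort by feature count, then by name so that ties are deterministic.
    return sorted(counts, key=lambda level: (counts[level], level))


# Toy rows for illustration only; these are not real Census figures.
toy_rows = [
    ("state", "s1"),
    ("county", "c1"), ("county", "c2"),
    ("tract", "t1"), ("tract", "t2"), ("tract", "t3"),
]

ordered_levels = order_levels_by_count(toy_rows)
print(ordered_levels)
```

This only reflects how many features each level has; it says nothing about whether the levels nest into one another.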

ethervoid commented 7 years ago

@michellemho Yes, it's really hard hehehe. I've found that they come from different departments and so on, but we need to provide some way, more or less, to sort them from top to bottom.

I don't know if it makes sense, but the UI doesn't split them into different categories.

I have another question: what is the purpose of the weight column? I've checked the geometry columns for different countries; sometimes it seems to be used as a sort column, and in other cases (the US, for example) I don't know what purpose it has.

If we could use weight to sort the geometries, that would be amazing, since we'd avoid creating a new attribute for all the geometries we have.

michellemho commented 7 years ago

@ethervoid, I believe the weight column is an old feature... I don't think it's actually used anywhere. I think it was a method for sorting by hardcoding the order. This might be helpful for showing them sorted in the Catalog and the UI!

ethervoid commented 7 years ago

@michellemho Great!! I'll start using it to hardcode the order of the geometries. Maybe you have a better suggestion for sorting them, but I think it's very hard to do automatically, because a proper sort function would need data beyond what we have.
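A minimal Python sketch of what hardcoding the order through a weight could look like. The identifiers and weight values are hypothetical, and the convention that a higher weight sorts first is an assumption for the example, not something confirmed in this thread.

```python
# Hypothetical identifiers and weights, for illustration only.
# Assumption: a higher weight means the geometry is listed earlier.
HARDCODED_WEIGHTS = {
    "state": 6,
    "puma": 5,
    "county": 4,
    "census_tract": 3,
    "block_group": 2,
    "block": 1,
}


def sort_by_weight(geometry_ids, weights):
    """Order geometry ids by descending weight; unknown ids get weight 0."""
    return sorted(geometry_ids, key=lambda gid: weights.get(gid, 0), reverse=True)


sorted_ids = sort_by_weight(["block", "state", "county"], HARDCODED_WEIGHTS)
print(sorted_ids)
```

Keeping the weights in a single mapping means the order is changed in one place, at the cost of maintaining it by hand for every country.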

michellemho commented 7 years ago

@ethervoid, for purposes of the "slider" in the UI, we should remove all geography options that do not fully cover the United States. Otherwise, there are strange and unexplained null results.

Full coverage geometries (also tagged with "interpolation_boundary" in the DO):

- State
- PUMA
- County
- Census tract
- Block group
- Block

We should talk about ZCTA. In my opinion, ZCTAs are a popular method for dividing and representing the United States. However, they are not full coverage (for example, holes exist for large bodies of water and national parks) according to this: https://www.census.gov/geo/reference/zctas.html.

If we include ZCTAs (and I think we should), we should also include the definition of these boundaries in the slider somehow.

ethervoid commented 7 years ago

@michellemho Good point. We could add that and mark the rest with weight 0, which, if I'm not wrong, removes them from the metadata.

I'll include ZCTA too and, as you said, maybe we should add an icon that shows an explanation of the possible geometries, but it'll be hard to extrapolate to all the countries.

In summary, we're going to remove geometries without full coverage in the United States which are:

This will help us to avoid NULLs in places where we don't have coverage for that geometry.

The ZCTAs are a special case because they are very popular.

What do you think @saleiva @noguerol @kevin-reilly?

// @juanignaciosl @javitonino for awareness

saleiva commented 7 years ago

Sounds good to me. I think in general we should try to have less but better data. Ensuring this works 99.9% of the time is key IMO.

kevin-reilly commented 7 years ago

I think we should leave Congressional Districts if we can.

The Congressional Map fully covers the US. I'm not aware of any area that is not in a congressional district.

michellemho commented 7 years ago

Agreed, I could not find any evidence that Congressional Districts are not full coverage of the United States. I'll add an interpolation_boundary tag for those as well!

https://observatory.carto.com/tables/obs_664d0ac6cc93a8f8d83553cb454738843132cb4c/public/map

ethervoid commented 7 years ago

I was thinking the same last night, that Congressional Districts should have full coverage too, so great! :)

@michellemho is interpolation_boundary the same as fullcoverage_boundary? If not, what does it mean? (I'm a bit off :) ).

I think we could filter in the UI and show only the interpolation boundaries instead of removing the geometries from the catalog.

Also, now I'm beginning to understand the use of the tiles and scores that John included.

michellemho commented 7 years ago

@ethervoid
cartographic_boundary --> everything that is shoreline clipped (and looks good on a map, hence "cartographic")
interpolation_boundary --> clipped boundaries that are full coverage of the country/region
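To make the tag distinction concrete, here is a minimal Python sketch of filtering by tag in the UI instead of removing geometries from the catalog. The `Geometry` class, the `slider_options` helper, and the partial-coverage entry are hypothetical and only illustrate the idea; they are not the actual bigmetadata or Data Observatory API.

```python
from dataclasses import dataclass, field


@dataclass
class Geometry:
    """Hypothetical stand-in for a catalog geometry and its tags."""
    name: str
    tags: set = field(default_factory=set)


def slider_options(geometries, required_tag="interpolation_boundary"):
    """Keep only the geometries carrying the full-coverage tag."""
    return [g.name for g in geometries if required_tag in g.tags]


# Illustrative catalog; the second entry is an invented partial-coverage case.
catalog = [
    Geometry("State", {"interpolation_boundary"}),
    Geometry("Example partial-coverage geometry", {"cartographic_boundary"}),
    Geometry("County", {"interpolation_boundary"}),
]

options = slider_options(catalog)
print(options)
```

With this approach every geometry stays in the catalog, and only the slider hides the ones that could return NULLs for uncovered areas.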

ethervoid commented 7 years ago

PR #196