CartoDB / bigmetadata

BSD 3-Clause "New" or "Revised" License
43 stars 11 forks source link

[Data Request] MRLI Data for Cincinnati #539

Closed hannahblue closed 6 years ago

hannahblue commented 6 years ago

Extract of MRLI data for Cincinnati going back 5 years at the Block level. Can be just a CSV output. Alternatively, if we can give Stuart access to test out the Tilesets API he can work with it himself.

cc @stuartlynn @antoniocarlon @ethervoid

hannahblue commented 6 years ago

Hi @juanignaciosl, what's your estimated timing for this. Let me know if you have any questions/concerns. This is for a demo for a sales opp.

juanignaciosl commented 6 years ago

If there are time constraints it's interesting for us to know. Could you please share them with us? Maybe this issue can be moved to a private repo.

EDIT: in fact, as this is a data request, maybe a public repo is not convenient at all. In a private repo I can attach the resulting file if it's not huge.

juanignaciosl commented 6 years ago

Ok, this is what I've done (for future reference and validation):

  1. There aren't US city geometries at DO, so I got Cincinnati geometry from 2013_us_ua10_550k from Data Library (and copied it with st_astext(the_geom)).
  2. create table public.tmp_cincinnati (the_geom geometry(geometry, 4326)); and inserted Cincinnati geometry: insert into public.tmp_cincinnati (the_geom) values ( st_geomfromtext('MULTIPOLYGON...', 4326) );.
  3. Located the DO table with block group geometries to get a geoid (as a first approximation to the blocks):
gis=# select id, tablename from observatory.obs_table where id like 'us.census%tiger%block_group%';
-[ RECORD 1 ]----------------------------------------------------------
id        | us.census.tiger.shoreline_clip_block_group_2015_3fe72a3f5a
tablename | obs_33cabe86372ca1577bc7d80917f68f8e90f17fe0
  1. Intersected Cincinnati to get its block groups:
    gis=# create table public.tmp_cincinnati_block_groups as (select ot.the_geom, ot.geoid, ot.aland from observatory.obs_33cabe86372ca1577bc7d80917f68f8e90f17fe0 ot, public.tmp_cincinnati c where st_intersects(ot.the_geom, c.the_geom) order by ot.geoid);

screenshot from 2018-08-01 11-58-58

Block groups geoid length is 12:

gis=# select distinct length(geoid) from public.tmp_cincinnati_block_groups ;
 length 
--------
     12
  1. Finally, joined those block groups with the groups to get the data:
gis=# create table public.tmp_cincinnati_mrli as (select * from "us.mastercard".mc_block mcb inner join public.tmp_cincinnati_block_groups cbg on left(mcb.region_id, 12) = cbg.geoid);
  1. Create an export including the block geometries:
gis=# create table public.tmp_cincinnati_mrli_blocks as (select * from tmp_cincinnati_mrli_2 cm inner join observatory.obs_06b3cddafd8b3e04b10fc99b06f82f31691adc1f ob on ob.geoid = cm.region_id);

@hannahblue I'll send you the data in a moment.

juanignaciosl commented 6 years ago

Alternative way, without block groups:

  1. Get just Cincinnati blocks:
gis=# create table public.tmp_cincinnati_blocks as (select og.the_geom, og.geoid, og.aland from observatory.obs_06b3cddafd8b3e04b10fc99b06f82f31691adc1f og, public.tmp_cincinnati c where st_intersects(og.the_geom, c.the_geom));
  1. Intersect with Mastercard data:
gis=# create table public.tmp_cincinnati_blocks_2 as (select cb.*, mcb.* from public.tmp_cincinnati_blocks cb inner join "us.mastercard".mc_block mcb on cb.geoid = mcb.region_id);

This "direct" approach returns slightly less results than the approach through block groups:

gis=# select count(1) from public.tmp_cincinnati_mrli_blocks;
-[ RECORD 1 ]
count | 68704

gis=# select count(1) from public.tmp_cincinnati_blocks_2;
-[ RECORD 1 ]
count | 68242

Some of the blocks with the indirect approach are not at the direct one:

gis=# select geoid from public.tmp_cincinnati_blocks_2 where geoid not in (select geoid from public.tmp_cincinnati_mrli_blocks);
(0 rows)

That's because getting the block groups will also add some blocks that are near Cincinnati but not inside. So, this dataset should be more accurate.

I'll also send this result.

hannahblue commented 6 years ago

Thanks @juanignaciosl! You're right, in the future we'll make these requests in a private repo

juanignaciosl commented 6 years ago

BTW, I just realized that there's some 2012 data, is that a problem?

Please close this if data is ok.

juanignaciosl commented 6 years ago

Asked by @hannahblue through Slack:

Could you please generate the data set for block groups instead? Also, instead of csv you could put it in the PDL since the data set isn't too big.

Ok. Should I add it to any user?

juanignaciosl commented 6 years ago

1.- Query: create table public.tmp_cincinnati_block_groups_mrli as (select cbg.*, mcbg.* from public.tmp_cincinnati_block_groups cbg inner join "us.mastercard".mc_block_group mcbg on cbg.geoid = mcbg.region_id);

2.- Added cincinnati_block_groups_mrli_012012_04_2018 table to "pdl-internal" user account.

3.- Added a date column based on the month:

alter table cincinnati_block_groups_mrli_012012_04_2018 add column "date" date; 
update cincinnati_block_groups_mrli_012012_04_2018 set date = to_date(month, 'MM/DD/YYYY');

As the table contains the geometries it's not small (209MB), but I assumed it'd be more useful that way (and existing Cleveland table also has the geometries).

hannahblue commented 6 years ago

Thanks @juanignaciosl!! Can you please grant access to carto users: stuartlynn, hcurtis, and athompson?

juanignaciosl commented 6 years ago

@hannahblue done!

Note: I've used the same API key for you three. Actual customers should use a different API key each.

hannahblue commented 6 years ago

You rock. Thanks @juanignaciosl