MAPC / datacommon

MAPC's Data Portal
https://datacommon.mapc.org/

Missing datasets #168

Open arouault opened 6 years ago

arouault commented 6 years ago

The following datasets are currently missing from the site. @mzagaja, what are the steps to resolve this? Is this a Susan issue? Should we load the data or delete the record?

I think we ought to remove the option to click on a dataset in the interface if its data is not loaded.

Patterns: data seems to be most commonly missing for datasets labeled 'blocks' and 'block groups', though some are labeled 'tracts'. Mostly ACS; also Building Permits.

This is not an exhaustive list, but here are some of the missing tables:

Data for towns, many transportation datasets, and MSA data appear to 'load', but the table contains no data: https://staging.datacommon.mapc.org/browser/datasets/82 https://staging.datacommon.mapc.org/browser/datasets/83 https://staging.datacommon.mapc.org/browser/datasets/311

mzagaja commented 6 years ago

Spot-checking a few from the top list suggests most of these are permission errors. They all need to be cataloged and their table names sent to Susan so she can individually update the permissions on each dataset to make them public. She also needs to add a line to the loading scripts for those datasets so they won't have permission errors going forward. She probably doesn't know about these yet, because from her login it looks like the dataset is already loaded. We might be able to run a one-off script to fix them in a batch, but the individual scripts still need to be updated so that future updates keep working.
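If the permissions do need fixing, the one-off batch fix could be as simple as generating one GRANT per affected table. A minimal sketch in Python, assuming the tables live in PostgreSQL under the tabular schema (as the query URLs later in this thread suggest) and that the public site reads through a single database role; PUBLIC_ROLE and grant_statements are hypothetical names, not part of the existing loading scripts:

```python
import re

# Hypothetical role name: the thread doesn't say which database role the
# public site queries as.
PUBLIC_ROLE = "public_reader"

# Conservative identifier check; the table names in this thread are all
# lowercase letters, digits, and underscores.
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def grant_statements(tables, schema="tabular", role=PUBLIC_ROLE):
    """Return one GRANT SELECT statement per table.

    Raises ValueError for any name that isn't a plain identifier, so a typo
    or stray character can't turn into arbitrary SQL.
    """
    for name in (schema, role, *tables):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return [f"GRANT SELECT ON {schema}.{table} TO {role};" for table in tables]


if __name__ == "__main__":
    # Example table names taken from the nginx log further down this thread.
    for statement in grant_statements([
        "b25002_b25003_hu_occupancy_by_tenure_race_acs_bg",
        "b19013_b19113_b19202_mhi_fam_acs_bg",
    ]):
        print(statement)
```

The printed statements would be reviewed and then run by someone who owns the tables. If a loading script drops and re-creates a table, its grants go with it, which is why the per-dataset scripts would still need the same line.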

The bottom ones appear to be empty because the datasets have not yet been loaded. She might already know about those but worth double checking.

mzagaja commented 6 years ago

OK, I was logged in with the wrong account, so I've confirmed the permissions are fine on the top ones after all. I think the problem might be that it's trying to sort by a nonexistent "years" column. That's a fix I can take a look at.
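If a missing sort column is the cause, one guard is to add ORDER BY only when the table actually has that column. A minimal sketch, assuming the front end can look up each table's column names (in PostgreSQL, for example, from information_schema.columns); build_select and CANDIDATE_SORT_COLUMNS are hypothetical names:

```python
# Sort columns that appear in this thread's queries. Which tables carry
# which column is an assumption; the real list would come from the database.
CANDIDATE_SORT_COLUMNS = ("acs_year", "years", "quarter")


def build_select(table, columns, candidates=CANDIDATE_SORT_COLUMNS):
    """Build a SELECT that only sorts by a column the table actually has.

    `columns` is the set of column names for `table`; if none of the
    candidates is present, the ORDER BY is dropped instead of erroring.
    """
    sort_column = next((c for c in candidates if c in columns), None)
    sql = f"select * from {table}"
    if sort_column is not None:
        sql += f" order by {sort_column} ASC"
    return sql + ";"


if __name__ == "__main__":
    # "muni" and "pop" are hypothetical column names for illustration.
    print(build_select("tabular.demo_general_demographics_b", {"muni", "years"}))
    print(build_select("tabular.some_table_without_years", {"muni", "pop"}))
```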

mzagaja commented 6 years ago

Susan is fixing the ones that weren't loaded yet by marking them as inactive in the Data Browser table. I'm still investigating the ones that are loaded but not displaying; the years hypothesis turned out to be incorrect as well. I think PRQL might be timing out on them. I'm getting the following errors in the nginx log:

2018/08/28 17:33:44 [error] 22887#22887: *52814 upstream prematurely closed connection while reading response header from upstream, client: 104.207.192.42, server: prql.mapc.org, request: "GET /?query=select%20*%20from%20tabular.b25002_b25003_hu_occupancy_by_tenure_race_acs_bg%20order%20by%20acs_year%20ASC;&token=16a2637ee33572e46f5609a578b035dc HTTP/1.1", upstream: "http://127.0.0.1:1999/?query=select%20*%20from%20tabular.b25002_b25003_hu_occupancy_by_tenure_race_acs_bg%20order%20by%20acs_year%20ASC;&token=16a2637ee33572e46f5609a578b035dc", host: "prql.mapc.org", referrer: "https://staging.datacommon.mapc.org/browser/datasets/130"
2018/08/28 17:41:30 [error] 22887#22887: *52823 upstream prematurely closed connection while reading response header from upstream, client: 104.207.192.42, server: prql.mapc.org, request: "GET /?query=select%20*%20from%20tabular.census2010_p12a_whi_race_by_age_gender_b%20order%20by%20years%20ASC;&token=16a2637ee33572e46f5609a578b035dc HTTP/1.1", upstream: "http://127.0.0.1:1999/?query=select%20*%20from%20tabular.census2010_p12a_whi_race_by_age_gender_b%20order%20by%20years%20ASC;&token=16a2637ee33572e46f5609a578b035dc", host: "prql.mapc.org", referrer: "https://staging.datacommon.mapc.org/browser/datasets/270"
2018/08/28 17:43:24 [error] 22887#22887: *52830 upstream prematurely closed connection while reading response header from upstream, client: 104.207.192.42, server: prql.mapc.org, request: "GET /?query=select%20*%20from%20tabular.b19013_b19113_b19202_mhi_fam_acs_bg%20order%20by%20acs_year%20ASC;&token=16a2637ee33572e46f5609a578b035dc HTTP/1.1", upstream: "http://127.0.0.1:1999/?query=select%20*%20from%20tabular.b19013_b19113_b19202_mhi_fam_acs_bg%20order%20by%20acs_year%20ASC;&token=16a2637ee33572e46f5609a578b035dc", host: "prql.mapc.org", referrer: "https://staging.datacommon.mapc.org/browser/datasets/114"
2018/08/28 17:47:26 [error] 22887#22887: *52836 upstream prematurely closed connection while reading response header from upstream, client: 104.207.192.42, server: prql.mapc.org, request: "GET /?query=select%20*%20from%20tabular.demo_general_demographics_b%20order%20by%20years%20ASC;&token=16a2637ee33572e46f5609a578b035dc HTTP/1.1", upstream: "http://127.0.0.1:1999/?query=select%20*%20from%20tabular.demo_general_demographics_b%20order%20by%20years%20ASC;&token=16a2637ee33572e46f5609a578b035dc", host: "prql.mapc.org"
2018/08/28 17:47:42 [error] 22887#22887: *52838 upstream prematurely closed connection while reading response header from upstream, client: 104.207.192.42, server: prql.mapc.org, request: "GET /?query=select%20*%20from%20tabular.census2010_p12a_whi_race_by_age_gender_b%20order%20by%20years%20ASC;&token=16a2637ee33572e46f5609a578b035dc HTTP/1.1", upstream: "http://127.0.0.1:1999/?query=select%20*%20from%20tabular.census2010_p12a_whi_race_by_age_gender_b%20order%20by%20years%20ASC;&token=16a2637ee33572e46f5609a578b035dc", host: "prql.mapc.org", referrer: "https://staging.datacommon.mapc.org/browser/datasets/270"
2018/08/28 17:47:48 [error] 22887#22887: *52838 upstream prematurely closed connection while reading response header from upstream, client: 104.207.192.42, server: prql.mapc.org, request: "GET /?query=select%20*%20from%20tabular.census2010_p12b_black_race_by_age_gender_b%20order%20by%20years%20ASC;&token=16a2637ee33572e46f5609a578b035dc HTTP/1.1", upstream: "http://127.0.0.1:1999/?query=select%20*%20from%20tabular.census2010_p12b_black_race_by_age_gender_b%20order%20by%20years%20ASC;&token=16a2637ee33572e46f5609a578b035dc", host: "prql.mapc.org", referrer: "https://staging.datacommon.mapc.org/browser/datasets/274"
2018/08/28 17:48:06 [error] 22887#22887: *52846 upstream prematurely closed connection while reading response header from upstream, client: 104.207.192.42, server: prql.mapc.org, request: "GET /?query=select%20*%20from%20tabular.demo_general_demographics_b%20order%20by%20years%20ASC%3B&token=16a2637ee33572e46f5609a578b035dc%27 HTTP/1.1", upstream: "http://127.0.0.1:1999/?query=select%20*%20from%20tabular.demo_general_demographics_b%20order%20by%20years%20ASC%3B&token=16a2637ee33572e46f5609a578b035dc%27", host: "prql.mapc.org"
2018/08/28 17:48:44 [error] 22887#22887: *52838 upstream prematurely closed connection while reading response header from upstream, client: 104.207.192.42, server: prql.mapc.org, request: "GET /?query=select%20*%20from%20tabular.census2010_p20_hh_with_kids_by_hhtype_b%20order%20by%20years%20ASC;&token=16a2637ee33572e46f5609a578b035dc HTTP/1.1", upstream: "http://127.0.0.1:1999/?query=select%20*%20from%20tabular.census2010_p20_hh_with_kids_by_hhtype_b%20order%20by%20years%20ASC;&token=16a2637ee33572e46f5609a578b035dc", host: "prql.mapc.org", referrer: "https://staging.datacommon.mapc.org/browser/Housing/Household%20Demographics"
2018/08/28 17:48:46 [error] 22887#22887: *52852 upstream prematurely closed connection while reading response header from upstream, client: 104.207.192.42, server: prql.mapc.org, request: "GET /?query=select%20*%20from%20tabular.demo_general_demographics_b%20order%20by%20years%20ASC%3B&token=16a2637ee33572e46f5609a578b035dc HTTP/1.1", upstream: "http://127.0.0.1:1999/?query=select%20*%20from%20tabular.demo_general_demographics_b%20order%20by%20years%20ASC%3B&token=16a2637ee33572e46f5609a578b035dc", host: "prql.mapc.org"
2018/08/28 17:49:21 [error] 22887#22887: *52838 upstream prematurely closed connection while reading response header from upstream, client: 104.207.192.42, server: prql.mapc.org, request: "GET /?query=select%20*%20from%20tabular.census2010_p12d_asian_race_by_age_gender_b%20order%20by%20years%20ASC;&token=16a2637ee33572e46f5609a578b035dc HTTP/1.1", upstream: "http://127.0.0.1:1999/?query=select%20*%20from%20tabular.census2010_p12d_asian_race_by_age_gender_b%20order%20by%20years%20ASC;&token=16a2637ee33572e46f5609a578b035dc", host: "prql.mapc.org", referrer: "https://staging.datacommon.mapc.org/browser/Demographics/Race%20(Individual%20Race)"
2018/08/28 17:50:29 [error] 22887#22887: *52858 upstream prematurely closed connection while reading response header from upstream, client: 104.207.192.42, server: prql.mapc.org, request: "GET /?query=select%20*%20from%20tabular.trans_mavc_public_summary_bg%20order%20by%20quarter%20ASC;&token=16a2637ee33572e46f5609a578b035dc HTTP/1.1", upstream: "http://127.0.0.1:1999/?query=select%20*%20from%20tabular.trans_mavc_public_summary_bg%20order%20by%20quarter%20ASC;&token=16a2637ee33572e46f5609a578b035dc", host: "prql.mapc.org", referrer: "https://staging.datacommon.mapc.org/browser/Transportation/Massachusetts%20Vehicle%20Census%20(2009-14)"
2018/08/28 17:50:43 [error] 22887#22887: *52859 upstream prematurely closed connection while reading response header from upstream, client: 104.207.192.42, server: prql.mapc.org, request: "GET /?query=select%20*%20from%20tabular.trans_mavc_public_summary_ct%20order%20by%20quarter%20ASC;&token=16a2637ee33572e46f5609a578b035dc HTTP/1.1", upstream: "http://127.0.0.1:1999/?query=select%20*%20from%20tabular.trans_mavc_public_summary_ct%20order%20by%20quarter%20ASC;&token=16a2637ee33572e46f5609a578b035dc", host: "prql.mapc.org", referrer: "https://staging.datacommon.mapc.org/browser/Transportation/Massachusetts%20Vehicle%20Census%20(2009-14)"
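To catalog which tables are affected, the failing table names can be pulled straight out of log lines like these. A minimal sketch in Python using only the standard library; failing_tables is a hypothetical helper, and it assumes the lines keep the request: "GET ... HTTP/1.1" format shown above:

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Request path inside an nginx error line, e.g. request: "GET /?query=... HTTP/1.1"
_REQUEST = re.compile(r'request: "GET (?P<path>\S+) HTTP/1\.1"')
# Schema-qualified table name following "from" in the decoded SQL.
_TABLE = re.compile(r"\bfrom\s+(?P<table>[\w.]+)", re.IGNORECASE)


def failing_tables(log_lines):
    """Count how often each table appears in PRQL requests that nginx logged as failed."""
    counts = Counter()
    for line in log_lines:
        if "upstream prematurely closed connection" not in line:
            continue
        request = _REQUEST.search(line)
        if not request:
            continue
        # parse_qs percent-decodes the value, so %20 becomes a space again.
        params = parse_qs(urlsplit(request.group("path")).query)
        sql = params.get("query", [""])[0]
        table = _TABLE.search(sql)
        if table:
            counts[table.group("table")] += 1
    return counts
```

Passing an open file handle for the nginx error log (its location on the PRQL server isn't stated here) and calling .most_common() on the result would list the most frequently failing tables first.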

I think I need some help from @ericyoungberg on that.

mzagaja commented 6 years ago

We discovered it is an issue in PRQL that @ericyoungberg is going to fix (and can do so pretty quickly).

The request that doesn't work is: select * from tabular.demo_general_demographics_b order by years ASC LIMIT 32768;

mzagaja commented 6 years ago

Further research reveals that many of these large datasets are causing issues with Eric's PRQL solution. Eric is going to set aside some time (though not until next week) to try to surmount them. However, research also reveals that these datasets are typically much larger than what CartoDB would have supported (over 250 columns), so using Carto would not have worked for these either. A few other options might be:

  1. We can constrain our server queries to only pull down a single year to segment the dataset.
  2. We can also offer options to constrain queries by geography from the front end.
  3. We might have to side load large datasets as .zip files for download.
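Options 1 and 2 both come down to sending several narrow queries instead of one select * over the whole table. A minimal sketch, assuming the front end keeps sending raw SQL strings to PRQL as in the log above (so values are escaped by hand rather than passed as bound parameters); segmented_queries, the year labels, and the geography column name are hypothetical:

```python
import re

# Optionally schema-qualified identifiers, e.g. tabular.b19013_b19113_b19202_mhi_fam_acs_bg
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*(\.[a-z_][a-z0-9_]*)?")


def _literal(value):
    """Render a value as a SQL string literal, doubling any single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def segmented_queries(table, year_column, years, geo_column=None, geo_value=None):
    """One query per year (option 1), optionally limited to one geography (option 2)."""
    if (geo_column is None) != (geo_value is None):
        raise ValueError("geo_column and geo_value must be given together")
    idents = [table, year_column] + ([geo_column] if geo_column is not None else [])
    for ident in idents:
        if not _IDENTIFIER.fullmatch(ident):
            raise ValueError(f"unexpected identifier: {ident!r}")
    queries = []
    for year in years:
        where = f"{year_column} = {_literal(year)}"
        if geo_column is not None:
            where += f" and {geo_column} = {_literal(geo_value)}"
        queries.append(f"select * from {table} where {where};")
    return queries


if __name__ == "__main__":
    # Year labels and the "municipal" column are placeholders, not real schema.
    for query in segmented_queries("tabular.b19013_b19113_b19202_mhi_fam_acs_bg",
                                   "acs_year", ["2012-16", "2013-17"],
                                   geo_column="municipal", geo_value="Boston"):
        print(query)
```

Identifiers are checked against a strict pattern and values are quoted because the SQL text travels in the request URL rather than through a database driver's parameter binding.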

TimReardon commented 6 years ago

I received an error when attempting to retrieve income by race by tract (https://datacommon.mapc.org/browser/datasets/139). Is this the same error, @mzagaja?

[Screenshot: 2018-09-14 at 4:34:18 PM]

mzagaja commented 5 years ago

@TimReardon Yeah, I can confirm that it's timing out on the large dataset. We need to look into pagination.
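A minimal sketch of LIMIT/OFFSET pagination in Python, assuming a hypothetical fetch callable that sends one SQL string to PRQL and returns its rows as a list; paginate and page_size are likewise illustrative names, not existing PRQL features:

```python
from typing import Callable, Iterator, List, Sequence

Row = Sequence[object]


def paginate(fetch: Callable[[str], List[Row]], base_query: str,
             page_size: int = 1000) -> Iterator[Row]:
    """Yield rows page by page using LIMIT/OFFSET.

    base_query must include a deterministic ORDER BY (ideally on a unique
    key); otherwise pages can overlap or skip rows.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    base = base_query.rstrip().rstrip(";")
    offset = 0
    while True:
        rows = fetch(f"{base} LIMIT {page_size} OFFSET {offset};")
        yield from rows
        # A short (or empty) page means there is nothing left to fetch.
        if len(rows) < page_size:
            break
        offset += page_size
```

Each request then stays small enough to finish before the upstream connection is dropped, at the cost of more round trips. For very large tables, OFFSET itself gets slower as it grows, because the skipped rows still have to be read; keyset pagination (WHERE key > last_seen ORDER BY key LIMIT n) is a common alternative.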