NYCPlanning / labs-factfinder-api

Search API for labs-nyc-factfinder

Issue/16067 PFF Updates: Update PFF constants to use the 2022 ACS Data #274

Closed horatiorosa closed 6 months ago

horatiorosa commented 7 months ago

Ticket

Issue 16067 Workflow branch strategy

Description

We need to update the constants to point to the current version of the ACS data so that we can run migrations and ensure PFF uses the 2022 ACS data, following these directions: Data Updates and Migrations

* Or the most recent version available when we drill down into the most recent year in DO for ACS and Decennial.
Note: PR for reference.

horatiorosa commented 7 months ago

@pratishta please do this one. I started and ran into conflicts, along with the weird single vs. double quote VS Code linting madness.

pratishta commented 7 months ago

@horatiorosa Is there a reason we want to do 2024-02-13 and not 2024-02-15? I realize this update is from yesterday but want to check in anyway

horatiorosa commented 7 months ago

> @horatiorosa Is there a reason we want to do 2024-02-13 and not 2024-02-15? I realize this update is from yesterday but want to check in anyway

Ah, Finn mentioned he found issues with the prior data and was going to re-run the pipelines. We should use the latest date in DO for ACS and Decennial. At this point, it looks like main/decennial/2020 and main/acs/2022 were both updated past what's referenced in this issue. Great catch. 🕵🏽‍♀️
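For anyone repeating this update, the "use the latest date" rule can be sketched as a small helper. This is only an illustration: the `latest_version` function and the `"latest"` entry are hypothetical, and it assumes the version folders are named as `YYYY-MM-DD` dates (as the two dates in this thread are) and that the folder listing has already been fetched by some other means.

```python
from datetime import date


def latest_version(folder_names):
    """Return the most recent YYYY-MM-DD folder name, skipping non-date names."""
    dated = []
    for name in folder_names:
        try:
            dated.append((date.fromisoformat(name), name))
        except ValueError:
            # Not a date-named folder (hypothetical example: "latest"); ignore it.
            continue
    if not dated:
        raise ValueError("no date-named version folders found")
    # Tuples sort by the parsed date first, so max() picks the newest folder.
    return max(dated)[1]


# Hypothetical listing of one dataset's version folders.
versions = ["2024-02-13", "2024-02-15", "latest"]
print(latest_version(versions))
```

Picking the newest folder only says which version was published last, not that its contents load cleanly, so it is still worth running the migration and checking the tables afterwards.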

pratishta commented 7 months ago

Updated issue/16067-pff with these changes in commit https://github.com/NYCPlanning/labs-factfinder-api/commit/7b182c5ad2422b36b588d6d8b5b7b864ee3522a0

pratishta commented 7 months ago

I'm unable to pull the data locally using 2024-02-15 folders. I can run the migration successfully but I'll have an empty database.

I switched the ACS version constants to 2024-02-13 and was successful with the ACS data but not the decennial data. I'm not sure exactly why, though. At first glance, there's a size difference in metadata.json between 2024-02-15 (297.07kb) and 2024-02-13 (418.41kb).

I'm not sure if this is what may be causing migration errors for @horatiorosa but something to look into.

horatiorosa commented 7 months ago

I have empty tables for ACS 2010 and 2020, and Decennial 2010, 2020. Error log as follows:

ERROR:  invalid input syntax for type double precision: "c"
CONTEXT:  COPY tmp, line 453741, column c: "c"
STATEMENT:  COPY  tmp FROM STDIN WITH DELIMITER ',' CSV HEADER;
ERROR:  invalid input syntax for type double precision: "c"
CONTEXT:  COPY tmp, line 418564, column c: "c"
STATEMENT:  COPY  tmp FROM STDIN WITH DELIMITER ',' CSV HEADER;
ERROR:  invalid input syntax for type double precision: "value"
CONTEXT:  COPY tmp, line 15085642, column value: "value"
STATEMENT:  COPY  tmp FROM STDIN WITH DELIMITER ',' CSV HEADER;
ERROR:  invalid input syntax for type double precision: "value"
CONTEXT:  COPY tmp, line 15085642, column value: "value"
STATEMENT:  COPY  tmp FROM STDIN WITH DELIMITER ',' CSV HEADER;

horatiorosa commented 7 months ago

A further note on the error above: when I run the migration on the develop branch, I do get data for ACS 2010 and 2021, while 2022 is empty; the Decennial 2010 and 2020 tables both contain data.

pratishta commented 6 months ago

It turns out there was some misformatting in the CSVs, which data engineering fixed for us. The new version folders, with correct data and formatting, are under 2024-02-20. I changed the constants to reflect that in commit a37cb65.
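For future updates, a quick pre-check of a CSV before running the migration might catch this kind of problem early. The thread does not say exactly what the misformatting was. One reading of the errors above (the value "c" in column c, "value" in column value, far past line 1) is that a header row was repeated in the middle of the file, and that is the assumption this sketch makes. The `find_stray_headers` helper and the sample columns are hypothetical.

```python
import csv
import io


def find_stray_headers(lines):
    """Yield 1-based line numbers of rows that repeat the header after line 1."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty input, nothing to check
    for row in reader:
        if row == header:
            # line_num counts physical lines read so far from the source.
            yield reader.line_num


# Hypothetical sample: a header row repeated on line 3.
sample = io.StringIO("a,b,c\n1,2,3.5\na,b,c\n4,5,6.0\n")
print(list(find_stray_headers(sample)))
```

For a real file, the same function can be passed an open file handle, for example `open(path, newline="")`, since `csv.reader` accepts any iterable of lines.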