GreenInfo-Network / nyc-crash-mapper-chart-view

Chart view for NYC Crash Mapper that allows for viewing Trends, Comparing, and Ranking of various NYC geographies
http://vis.crashmapper.org
MIT License

Boroughs doubled in Select Boundary menu because of text case #74

Closed danrademacher closed 6 years ago

danrademacher commented 6 years ago

Looks like some normalization step got broken recently, or the data input has changed.

image

gregallensworth commented 6 years ago

Immediate data issue is corrected: I went into the CARTO table and updated e.g. "BRONX" to become "Bronx". This fixes the immediately obvious issue in the UI.

Second step would be to examine the ETL script and determine whether the capitalization is standardized. How this could be related to the recent changes regarding #73 I don't know, as the selfsame validation and correction code is used; all that change did was expand the set of records to be examined. Perhaps the borough names at the API were never standardized, and it went unnoticed because they had previously followed a standard; we have noted data quality issues previously.
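As a rough aid for that check, here is a minimal sketch of how capitalization variants could be spotted in a list of borough names. The function name `find_case_variants` and the sample values are hypothetical and are not taken from the ETL script; the sketch only assumes the names are available as plain strings.

```python
from collections import defaultdict


def find_case_variants(names):
    """Group names that differ only by case or surrounding whitespace.

    Returns a dict mapping the case-folded name to the sorted list of
    distinct spellings, keeping only names with more than one spelling.
    """
    groups = defaultdict(set)
    for name in names:
        groups[name.strip().casefold()].add(name)
    return {key: sorted(spellings)
            for key, spellings in groups.items()
            if len(spellings) > 1}


# Illustrative values only, not real table contents.
sample = ["Bronx", "BRONX", "Queens", "Staten Island", "STATEN ISLAND", "Brooklyn"]
variants = find_case_variants(sample)
print(variants)
```

An empty result would suggest the capitalization is already consistent; any entry points at a borough that would show up twice in the menu.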

gregallensworth commented 6 years ago

Finally back onto this after a few days on other projects. The likely cause of the borough variations is as follows:

create_sql_insert() inserts the borough names as-given from SODA. There are two followups which would correct borough names: update_borough() and normalizeBoroughSpellings().

However, both of these contain a filter by LATEST_DATE, which is the highest date found in the table before the ETL process was started. As such, a "backdated" record loaded with an earlier date would be skipped by both of these updates.
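To illustrate the suspected mechanism, here is a minimal in-memory sketch. It is not the ETL code: the table layout, the `normalize_after` helper, and the strict "later than the cutoff" comparison are assumptions made for illustration, and the real filter in update_borough() and normalizeBoroughSpellings() may be written differently.

```python
from datetime import date

# Hypothetical stand-in for the crashes table; not the real schema.
table = [
    {"id": 1, "date": date(2018, 5, 1), "borough": "Bronx"},
    {"id": 2, "date": date(2018, 5, 2), "borough": "Queens"},
]

# Highest date in the table, captured before the ETL run starts.
latest_date = max(row["date"] for row in table)

# New rows are inserted with borough names as given by the source.
# The second row is "backdated": its date is earlier than latest_date.
table.append({"id": 3, "date": date(2018, 5, 3), "borough": "BROOKLYN"})
table.append({"id": 4, "date": date(2018, 4, 15), "borough": "BRONX"})


def normalize_after(rows, cutoff):
    """Title-case borough names, but only on rows dated after the cutoff."""
    for row in rows:
        if row["date"] > cutoff:
            row["borough"] = row["borough"].title()


normalize_after(table, latest_date)
boroughs = sorted({row["borough"] for row in table})
print(boroughs)
```

In this sketch the new row dated after the cutoff is corrected to "Brooklyn", while the backdated row keeps "BRONX", so the Bronx ends up with two distinct spellings in the table.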

This work will be continued in https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script/issues/6 which is specifically for the ETL process.