Closed by danrademacher 6 years ago
The immediate data issue is corrected: I went into the CARTO table and updated e.g. "BRONX" to become "Bronx". This fixes the immediately obvious issue in the UI.
The second step would be to examine the ETL script and determine whether the capitalization is standardized. How this could be related to the recent changes regarding #73 I don't know, as the selfsame validation and correction code is used; all that change did was expand the set of records to be examined. Perhaps the borough names at the API were never standardized and had merely happened to follow a standard until now; we have noted data quality issues previously.
Finally back onto this after a few days on other projects. The likely cause of the borough variations is as follows:
create_sql_insert() inserts the borough names as-given from SODA. There are two followups which would correct borough names: update_borough() and normalizeBoroughSpellings(). However, both of these contain a filter by LATEST_DATE, which is the highest date found in the table before the ETL process was started. As such, a "backdated" record loaded with an earlier date would be skipped by both of these updates.
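To illustrate the suspected failure mode, here is a minimal sketch in Python. It is not the actual ETL code: the function names, the row layout, the `>=` comparison, and the canonical spellings are all assumptions made for the example. It shows how a cleanup step filtered by a "latest date" leaves a backdated row untouched, and how filtering by the set of rows loaded in the current run (one possible alternative, not necessarily the fix to adopt) would catch it.

```python
from datetime import date

# Hypothetical canonical spellings; the real script's list may differ.
CANONICAL_BOROUGHS = {
    "bronx": "Bronx",
    "brooklyn": "Brooklyn",
    "manhattan": "Manhattan",
    "queens": "Queens",
    "staten island": "Staten Island",
}


def normalize_borough(name):
    """Return the canonical spelling, or the input unchanged if unrecognized."""
    if name is None:
        return None
    key = " ".join(name.split()).lower()
    return CANONICAL_BOROUGHS.get(key, name)


def normalize_by_date(rows, latest_date):
    """Mimic a date-filtered cleanup: only rows dated on or after
    latest_date are normalized (the exact comparison is an assumption)."""
    return [
        {**row, "borough": normalize_borough(row["borough"])}
        if row["date"] >= latest_date
        else row
        for row in rows
    ]


def normalize_by_ids(rows, loaded_ids):
    """Alternative sketch: normalize every row loaded in this run,
    regardless of its date."""
    return [
        {**row, "borough": normalize_borough(row["borough"])}
        if row["id"] in loaded_ids
        else row
        for row in rows
    ]


# Made-up sample data: row 1 is "backdated" relative to latest_date.
latest_date = date(2018, 6, 1)
rows = [
    {"id": 1, "date": date(2018, 5, 1), "borough": "BRONX"},
    {"id": 2, "date": date(2018, 6, 2), "borough": "QUEENS"},
]

by_date = normalize_by_date(rows, latest_date)
by_ids = normalize_by_ids(rows, {1, 2})

print([r["borough"] for r in by_date])
print([r["borough"] for r in by_ids])
```

With the date filter, the backdated row keeps its as-given spelling, while the id-based variant normalizes both rows. Whether tracking loaded ids is practical depends on how the real script batches its inserts.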
This work will be continued in https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script/issues/6 which is specifically for the ETL process.
Looks like some normalization step got broken recently, or the data input has changed.