NYCPlanning / ceqr-app-data-archive

(DEPRECATED) Data pipelines for CEQR app, managed by data engineering
https://github.com/NYCPlanning/ceqr-app-data

CEQR Air Quality: Major or large emission source #13

Closed AmandaDoyle closed 5 years ago

AmandaDoyle commented 5 years ago

‘Major or large emission source’

Inputs: Title V State Permits Facility State Permits

Steps:

Odor-producing facilities are included in these data

AmandaDoyle commented 5 years ago

Reopening so that we can QA/QC the dataset:

baolingz commented 5 years ago

Comparison between Carey's file and the output generated from DEC open data

  1. Carey's shapefiles have been joined with PLUTO on bbl, which can be incorporated into our workflow as well. Only the facility address is provided.
  2. There are 174 addresses in Carey's shapefiles that could not be found in the output we generated from DEC open data (Title V and state). Some mismatches come from different spellings of street names, such as "1 Avenue" versus "first avenue". Some are due to failed geocoding, which leaves no address value in our output. Many other records simply do not exist in the source data we used.
  3. There are 186 addresses that exist only in our output. https://nbviewer.jupyter.org/github/NYCPlanning/ceqr-app-data/blob/dec_notebook/notebook/DEC_Carey_comparison.ipynb
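
To illustrate the spelling issue in point 2, here is a minimal sketch of normalizing addresses before comparing two lists. It is not the code from the linked notebook; the function names, the ordinal table, and the street-type abbreviations are all assumptions made for illustration, and a real run would need a fuller mapping.

```python
import re

# Hypothetical lookup tables (not from the notebook); extend as needed.
ORDINAL_WORDS = {
    "first": "1", "second": "2", "third": "3", "fourth": "4", "fifth": "5",
    "sixth": "6", "seventh": "7", "eighth": "8", "ninth": "9", "tenth": "10",
}
STREET_TYPES = {"avenue": "ave", "av": "ave", "street": "st",
                "boulevard": "blvd", "road": "rd"}


def normalize_address(address):
    """Lowercase, drop punctuation, and unify ordinals and street types."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    normalized = []
    for token in tokens:
        token = ORDINAL_WORDS.get(token, token)                 # "first" -> "1"
        token = re.sub(r"^(\d+)(st|nd|rd|th)$", r"\1", token)   # "1st" -> "1"
        token = STREET_TYPES.get(token, token)                  # "avenue" -> "ave"
        normalized.append(token)
    return " ".join(normalized)


def compare_addresses(left, right):
    """Split two address lists into shared, left-only, and right-only keys."""
    left_keys = {normalize_address(a) for a in left if a}
    right_keys = {normalize_address(a) for a in right if a}
    return {
        "both": left_keys & right_keys,
        "left_only": left_keys - right_keys,
        "right_only": right_keys - left_keys,
    }


if __name__ == "__main__":
    # Made-up sample addresses, only to show the shape of the result.
    result = compare_addresses(
        ["100 First Avenue", "5 Main Street", None],
        ["100 1 AVENUE", "7 Main St"],
    )
    print(result)
```

Empty address values (for example, rows where geocoding failed) are skipped rather than matched, so they would need to be counted separately.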

Comparison between the DECinfo Locator and the output generated from DEC open data

  1. There are 167 records that exist in both datasets.
  2. There are 79 records that exist only in the output we generated from DEC open data.
  3. There are 76 records that exist only in the DECinfo Locator.
  4. In addition, the data from the DECinfo Locator should match the tables listed on the DEC website (Title V and state). https://github.com/NYCPlanning/ceqr-app-data/blob/dec_notebook/notebook/DEC_Website_comparison.ipynb

baolingz commented 5 years ago

After removing permits that are outside NYC, are missing a zip code, or have an unclear or out-of-range address, 194 records were geocoded.
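
A rough sketch of that pre-geocoding filter is below. It is not the pipeline's actual code: the field names `address` and `zipcode`, the caller-supplied set of NYC zip codes, and the house-number heuristic for "unclear address" are all assumptions. Out-of-range addresses are not detected here, since that can only be known from the geocoder's response.

```python
import re

# Heuristic (assumption): a usable address starts with a house number,
# such as "120" or "45-10", followed by at least one more word.
HOUSE_NUMBER_PATTERN = re.compile(r"^\d+(-\d+)?\s+\S+")


def rejection_reason(record, nyc_zips):
    """Return why a permit record should be skipped, or None to keep it."""
    zipcode = (record.get("zipcode") or "").strip()
    address = (record.get("address") or "").strip()
    if not zipcode:
        return "missing zipcode"
    if zipcode[:5] not in nyc_zips:  # also handles ZIP+4 values
        return "outside NYC"
    if not HOUSE_NUMBER_PATTERN.match(address):
        return "unclear address"
    return None


def split_records(records, nyc_zips):
    """Separate records to geocode from dropped records and their reasons."""
    keep, dropped = [], []
    for record in records:
        reason = rejection_reason(record, nyc_zips)
        if reason is None:
            keep.append(record)
        else:
            dropped.append((record, reason))
    return keep, dropped


if __name__ == "__main__":
    # Made-up sample data; a real run would load the full NYC zip list.
    sample_zips = {"10001", "11201"}
    sample = [
        {"address": "120 Broadway", "zipcode": "10001"},
        {"address": "1 Main St", "zipcode": "12207"},
        {"address": "2 Main St", "zipcode": ""},
        {"address": "Corner of Smith & Jay", "zipcode": "11201"},
    ]
    keep, dropped = split_records(sample, sample_zips)
    print(len(keep), [reason for _, reason in dropped])
```

Keeping the rejection reason with each dropped record makes it easier to QA/QC how many permits fall out at each step.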