As GFT is put into production, we need to figure out what keeping it running operationally is going to look like. The two main parts are updating source data and QA, with some other trailing tasks as well.
Source Data
Categorizing by source type and how we update them currently
Pluto
dcp_mappluto_wi - self-explanatory
Bytes quarterly updates
These are pretty straightforward. Lion has one issue: its parquet doesn't get created without error at the moment, so I've run it manually when it's needed
dcp_boroboundaries_wi
[ ] dcp_lion - needs some investigation as to why gdal errors when creating parquet
"ceqr app" data
Each of these is sourced a bit differently under the hood, which needs some investigation. For now, we can "build" ceqr data
[ ] figure out short and long term plan for ceqr app datasets
This includes
dep_cats_permits
nysdec_state_facility_permits
nysdec_title_v_facility_permits
ArcGIS Feature Service
These can have their version determined programmatically, so they should maybe be pulled on a weekly basis
[ ] set up recurring job
Datasets here are
dcp_cscl_commonplace
dcp_cscl_complex
nysdec_freshwater_wetlands_checkzones
nysdec_freshwater_wetlands
nysdec_tidal_wetlands
nysdec_priority_lakes
nysdec_priority_estuaries
nysdec_priority_streams
nysdec_natural_heritage_communities
nysparks_historicplaces_esri
nysshpo_historic_buildings_points
nysshpo_historic_buildings_polygons
nysshpo_archaeological_buffer_areas
dcp_waterfront_access_map_wpaa
dcp_waterfront_access_map_pow
nysparks_parks_polygons
usnps_parks
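A minimal sketch of how a version could be derived for these, assuming the layer metadata returned by the ArcGIS REST API (the layer URL with `?f=json`) includes `editingInfo.lastEditDate` as epoch milliseconds. Not every service exposes that field, and the function names here are hypothetical, not existing dcpy code.

```python
from datetime import datetime, timezone
from typing import Optional


def version_from_layer_metadata(metadata: dict) -> Optional[str]:
    """Return a YYYYMMDD version from ArcGIS layer metadata, or None if unavailable.

    Assumes `metadata` is the parsed JSON of a feature layer's metadata
    and that `editingInfo.lastEditDate` is epoch milliseconds.
    """
    editing_info = metadata.get("editingInfo") or {}
    last_edit_ms = editing_info.get("lastEditDate")
    if last_edit_ms is None:
        return None
    last_edit = datetime.fromtimestamp(last_edit_ms / 1000, tz=timezone.utc)
    return last_edit.strftime("%Y%m%d")


def needs_pull(metadata: dict, archived_version: Optional[str]) -> bool:
    """Decide whether a recurring job should re-pull a dataset.

    Pulls when the version can't be determined, to stay on the safe side.
    """
    current = version_from_layer_metadata(metadata)
    return current is None or current != archived_version
```

A weekly job could call `needs_pull` for each dataset above and only archive the ones that changed.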
Bytes - unknown frequency of update
Both of these are found here. They also have the task at the bottom of this issue: they should be renamed because I gave them these horrible unreadable acronyms for some reason
[ ] dcp_wrp_rec
[ ] dcp_wrp_snwa
Socrata
Add these to the weekly socrata pull if they're not there already
[ ] dpr_forever_wild
[ ] lpc_scenic_landmarks
[ ] lpc_historic_district_areas
[ ] lpc_landmarks
[ ] dpr_parksproperties
[ ] dpr_schoolyard_to_playgrounds
[ ] dcp_edesignation_csv
Script source
[ ] usfws_nyc_wetlands - need to investigate update frequency. This comes from a script because the dataset comes either by state (approaching actual big data) or by watershed. NYC is contained in 4 watersheds, so the script pulls all 4, concatenates them, and archives them
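The concatenation step could look roughly like the sketch below. It assumes each watershed download has already been parsed as a GeoJSON FeatureCollection; the actual script may well work with a different format, and the function name is hypothetical.

```python
from typing import Dict, List


def concatenate_feature_collections(collections: List[Dict]) -> Dict:
    """Merge several GeoJSON FeatureCollections into one.

    Assumes all inputs share the same schema and coordinate system,
    which should hold for watershed extracts of the same dataset.
    """
    features = []
    for collection in collections:
        if collection.get("type") != "FeatureCollection":
            raise ValueError("expected a GeoJSON FeatureCollection")
        features.extend(collection.get("features", []))
    return {"type": "FeatureCollection", "features": features}
```

The merged result would then be archived as a single dataset, same as now.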
Manual updates
For each of these, we need to figure out both the update frequency and whether we can pull it ourselves instead
[ ] dcp_air_quality_vent_towers
[ ] dcm_arterial_highways
[ ] panynj_jfk_65db
[ ] panynj_lga_65db
[ ] dcp_beaches
[ ] dob_natural_resource_check_flags
[ ] dcp_pops
QA
This section is a stub for now, but we need to figure out what this looks like moving forward
Versioning
[ ] add logic to dcpy plan to determine version of product from one of the sources (in this case, pluto)
[ ] ensure Data Sources link in app links to a place where the version of GFT data is visible (Bytes once we start putting it there)
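A rough sketch of the first task, assuming the resolved source versions are available as a plain mapping of dataset name to version string. This is a hypothetical helper to illustrate the idea, not the actual dcpy plan interface, and the version strings in the usage are made up.

```python
from typing import Dict


def product_version(
    source_versions: Dict[str, str],
    version_source: str = "dcp_mappluto_wi",
) -> str:
    """Use the version of one designated source as the product version."""
    try:
        return source_versions[version_source]
    except KeyError:
        raise ValueError(
            f"version source '{version_source}' not found in resolved sources"
        ) from None


# Illustrative usage with made-up version strings
versions = {"dcp_mappluto_wi": "24v1", "dcp_lion": "24a"}
gft_version = product_version(versions)
```

Failing loudly when the designated source is missing seems safer than silently falling back to a date.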
Cleanup
[ ] rename dcp_wrp_rec and dcp_wrp_snwa to ditch horrible acronyms. Not sure why I did this. Long dataset names are way better than unreadable dataset names