CityOfNewYork / nyc-planimetrics

New York City Planimetrics Data

Decide how to truncate column names in shapefiles #15

Closed: mattyschell closed this issue 8 months ago

mattyschell commented 9 months ago

Planimetrics data must be provided to NYC Open Data as shapefiles. The maximum column name length in shapefiles is 10 characters.

In the past we created custom column names instead of letting the software auto-truncate them. For example:

FEATURE_CODE -> FEAT_CODE
SUB_FEATURE_CODE -> SUB_CODE

Nice. But more work.
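Part of why custom names pay off: a naive rule that keeps only the first 10 characters (an assumption for illustration; real export tools may resolve clashes differently) makes some of the long names from prior releases collide. A minimal Python sketch, where `LONG_NAMES`, `naive_truncate`, and `find_collisions` are hypothetical:

```python
from collections import defaultdict

# Long column names taken from the prior-release list in this issue.
LONG_NAMES = [
    "CONSTRUCTION_YEAR", "SUB_FEATURE_CODE", "LAST_STATUS_TYPE",
    "LAST_STATUS_DATE", "GROUND_ELEVATION", "LAST_MODIFY_DATE",
    "LAST_MODIFY_BY", "FEATURE_CODE",
]


def naive_truncate(name: str, limit: int = 10) -> str:
    """Hypothetical naive rule: keep only the first `limit` characters."""
    return name[:limit]


def find_collisions(names: list[str], limit: int = 10) -> dict[str, list[str]]:
    """Group names by their truncated form and return only groups that collide."""
    groups = defaultdict(list)
    for name in names:
        groups[naive_truncate(name, limit)].append(name)
    return {short: fulls for short, fulls in groups.items() if len(fulls) > 1}


for short, fulls in find_collisions(LONG_NAMES).items():
    print(short, "<-", ", ".join(fulls))
```

Here LAST_STATUS_TYPE and LAST_STATUS_DATE both become LAST_STATU, and LAST_MODIFY_DATE and LAST_MODIFY_BY both become LAST_MODIF, which a hand-picked mapping avoids.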

We will probably want to create some sort of custom mapping. Here's a list of columns from prior releases that exceed the 10-character limit.

Column Name         Length
CONSTRUCTION_YEAR   17
SE_ANNO_CAD_DATA    16
SUB_FEATURE_CODE    16
LAST_STATUS_TYPE    16
LAST_STATUS_DATE    16
GROUND_ELEVATION    16
LAST_MODIFY_DATE    16
LAST_MODIFY_BY      14
FEATURE_CODE        12
GEOM_SOURCE         11
HEIGHT_ROOF         11
DOB_JOB_NUM         11
DESCRIPTION         11
STREET_NAME         11
BLOCKFACEID         11

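A list like this can be regenerated from any release's schema with a simple length check. A sketch, assuming the column names are already available as a Python list; `too_long` is a hypothetical helper:

```python
SHAPEFILE_NAME_LIMIT = 10  # maximum field-name length in a shapefile


def too_long(columns, limit=SHAPEFILE_NAME_LIMIT):
    """Return (name, length) pairs for columns over `limit`, longest first."""
    offenders = [(name, len(name)) for name in columns if len(name) > limit]
    # Sort by descending length, then alphabetically within equal lengths.
    return sorted(offenders, key=lambda pair: (-pair[1], pair[0]))


print(too_long(["FEATURE_CODE", "SUB_CODE", "CONSTRUCTION_YEAR", "HEIGHT_ROOF"]))
# [('CONSTRUCTION_YEAR', 17), ('FEATURE_CODE', 12), ('HEIGHT_ROOF', 11)]
```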
mattyschell commented 9 months ago

We are proposing to output shapefiles with the same column names as the last round of planimetrics, so we'll need to go through the .shp files on NYC Open Data and build a crosswalk from full column names to shapefile column names.

FEATURE_CODE -> FEAT_CODE
SUB_FEATURE_CODE -> SUB_CODE
...etc.

We would also like the metadata here to somehow indicate that a column has one name if you have a shapefile (shp) and a different one if you have a geodatabase (gdb). Sounds tricky!
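One way to keep the metadata consistent is to derive both names from a single crosswalk. A minimal sketch, assuming the geodatabase keeps the full names and the shapefile uses the crosswalk; `SHP_NAMES`, `column_name`, and `metadata_label` are hypothetical, and only the two mappings mentioned so far are filled in:

```python
# Hypothetical crosswalk excerpt: full (geodatabase) name -> shapefile name.
SHP_NAMES = {
    "FEATURE_CODE": "FEAT_CODE",
    "SUB_FEATURE_CODE": "SUB_CODE",
}


def column_name(full_name, fmt):
    """Return the name a user sees for `full_name` in 'gdb' or 'shp' output."""
    if fmt == "gdb":
        return full_name  # assumption: geodatabase output keeps the full name
    if fmt == "shp":
        if full_name in SHP_NAMES:
            return SHP_NAMES[full_name]
        if len(full_name) <= 10:
            return full_name  # already fits the shapefile limit
        raise KeyError(f"no shapefile name defined for {full_name}")
    raise ValueError(f"unknown format: {fmt!r}")


def metadata_label(full_name):
    """Metadata text such as 'FEATURE_CODE (shapefile: FEAT_CODE)'."""
    shp = column_name(full_name, "shp")
    return full_name if shp == full_name else f"{full_name} (shapefile: {shp})"


print(metadata_label("SUB_FEATURE_CODE"))  # SUB_FEATURE_CODE (shapefile: SUB_CODE)
```

Raising on a long name with no mapping means a missing crosswalk entry shows up before the metadata is written rather than after.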

mattyschell commented 9 months ago

Pfitts says:

• At least one field you can drop/disregard: SE_ANNO_CAD_DATA is a legacy attribute added by our former SDE technology, now rebranded as an “enterprise geodatabase”. So this can be dropped from the data delivery altogether with no impact.

Thanks, my mistake. I wrote out the SQL in an Enterprise Geodatabase schema and didn't pay any attention to what I copy-pasted out.

mattyschell commented 9 months ago

We'll post a crosswalk from full column names to truncated shapefile column names here by the end of the week (we hope).

We'll review planimetrics 2014 shapefile downloads from NYC Open Data and use the same conventions.
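The existing 2014 names can be read straight from the .dbf file inside each shapefile download, without GIS software. A minimal sketch, assuming a standard dBASE III-style header (a 32-byte file header followed by 32-byte field descriptors, terminated by a 0x0D byte); `dbf_field_names` is a hypothetical helper. The 11-byte, NUL-padded name slot in each descriptor is the usual explanation for the 10-character limit.

```python
import struct


def dbf_field_names(path):
    """List the field names stored in a shapefile's .dbf attribute table."""
    with open(path, "rb") as f:
        header = f.read(32)
        # Bytes 8-9 of the file header hold the total header length (little-endian).
        (header_len,) = struct.unpack("<H", header[8:10])
        descriptors = f.read(header_len - 32)
    names = []
    # Field descriptors are 32 bytes each; a 0x0D byte ends the list.
    for offset in range(0, len(descriptors), 32):
        if descriptors[offset] == 0x0D:
            break
        # The name occupies 11 bytes, NUL-padded, so at most 10 characters.
        raw = descriptors[offset:offset + 11]
        names.append(raw.split(b"\x00", 1)[0].decode("latin-1"))
    return names
```

Called on the .dbf from a 2014 download, it returns that file's column names in order, ready to line up against the full names.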

mrahman-doitt commented 9 months ago

Following our current naming convention on NYC Open Data, I've identified custom column names for the shapefiles.

I have also proposed shapefile names for two columns in the new WATER_TANK feature class of the planimetric data.

Column Name         Length  Column Name in Shapefile  Length in Shapefile  Comments
BLOCKFACEID         11      BLOCKF_ID                 9
CONSTRUCTION_YEAR   17      CNSTRCT_YR                10
DESCRIPTION         11      DESCRIPTIO                10
DOB_JOB_NUM         11      n/a                       n/a                  We do not publish this column
FEATURE_CODE        12      FEAT_CODE                 9
GEOM_SOURCE         11      GEOMSOURCE                10
GROUND_ELEVATION    16      GROUNDELEV                10
HEIGHT_ROOF         11      HEIGHTROOF                10
LAST_MODIFY_BY      14      n/a                       n/a                  We do not publish this column
LAST_MODIFY_DATE    16      LSTMODDATE                10
LAST_STATUS_DATE    16      n/a                       n/a                  We do not publish this column
LAST_STATUS_TYPE    16      LSTSTATYPE                10
SE_ANNO_CAD_DATA    16      n/a                       n/a                  We do not publish this column
STREET_NAME         11      STREET_NAM                10
SUB_FEATURE_CODE    16      SUB_CODE                  8
BASE_ELEVATION      14      BASE_ELEV                 9                    New for WATER_TANK feature class
TOP_ELEVATION       13      TOP_ELEV                  8                    New for WATER_TANK feature class

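Writing the crosswalk down as data makes it easy to check before export: every shapefile name must fit in 10 characters and no two columns may share one. A sketch in Python; `CROSSWALK` and `check_crosswalk` are hypothetical names, `None` marks the unpublished columns, and one flat mapping is assumed even though the last two entries apply only to WATER_TANK:

```python
# Crosswalk proposed above: full column name -> shapefile column name.
# None marks columns that are not published on NYC Open Data.
CROSSWALK = {
    "BLOCKFACEID": "BLOCKF_ID",
    "CONSTRUCTION_YEAR": "CNSTRCT_YR",
    "DESCRIPTION": "DESCRIPTIO",
    "DOB_JOB_NUM": None,
    "FEATURE_CODE": "FEAT_CODE",
    "GEOM_SOURCE": "GEOMSOURCE",
    "GROUND_ELEVATION": "GROUNDELEV",
    "HEIGHT_ROOF": "HEIGHTROOF",
    "LAST_MODIFY_BY": None,
    "LAST_MODIFY_DATE": "LSTMODDATE",
    "LAST_STATUS_DATE": None,
    "LAST_STATUS_TYPE": "LSTSTATYPE",
    "SE_ANNO_CAD_DATA": None,
    "STREET_NAME": "STREET_NAM",
    "SUB_FEATURE_CODE": "SUB_CODE",
    "BASE_ELEVATION": "BASE_ELEV",  # new for WATER_TANK
    "TOP_ELEVATION": "TOP_ELEV",    # new for WATER_TANK
}


def check_crosswalk(crosswalk, limit=10):
    """Return a list of problems: shapefile names over the limit or reused."""
    problems = []
    seen = {}
    for full, short in crosswalk.items():
        if short is None:
            continue  # unpublished column, nothing to check
        if len(short) > limit:
            problems.append(f"{short} ({full}) is {len(short)} characters")
        if short in seen:
            problems.append(f"{short} is used by both {seen[short]} and {full}")
        seen[short] = full
    return problems


print(check_crosswalk(CROSSWALK))  # []
```

An empty list means the table above is safe to export as written; any later edit that reuses or overlengthens a name shows up here.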
jwileman1 commented 8 months ago

I added some text to the General Attributes section and included the truncated names throughout, which will hopefully address this in full.

By the way, I removed any attribute that NYC does not publish.
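Dropping unpublished attributes and renaming the rest can happen in one pass per feature. A minimal sketch, assuming each feature's attributes arrive as a Python dict and the crosswalk uses `None` for unpublished columns; `to_shapefile_record` is hypothetical:

```python
def to_shapefile_record(record, crosswalk):
    """Rename attributes for shapefile export and drop unpublished ones.

    Columns mapped to None are dropped; columns missing from the crosswalk
    are kept unchanged only if they already fit in 10 characters.
    """
    out = {}
    for name, value in record.items():
        if name in crosswalk:
            short = crosswalk[name]
            if short is None:
                continue  # not published on NYC Open Data
            out[short] = value
        elif len(name) <= 10:
            out[name] = value
        else:
            raise KeyError(f"no shapefile name for {name}")
    return out
```

Failing loudly on an unmapped long name is a deliberate choice here: it avoids handing the column to whatever truncation the export software would otherwise apply.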

mattyschell commented 8 months ago

We are truncating column names and replacing the shapefiles on NYC Open Data wherever possible. The 2022 shapefile column names will match the 2014 column names on Open Data.
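A final check that the 2022 shapefiles conform to the 2014 columns is a set difference between the two releases' field names; new columns, such as those proposed for WATER_TANK, would show up on the 2022 side. A minimal sketch; `compare_names` is hypothetical and the lists in the example are illustrative, not the actual releases:

```python
def compare_names(names_2014, names_2022):
    """Report shapefile column names that differ between two releases."""
    old, new = set(names_2014), set(names_2022)
    return {
        "missing_in_2022": sorted(old - new),  # 2014 names no longer present
        "new_in_2022": sorted(new - old),      # names added in 2022
    }


print(compare_names(["FEAT_CODE", "SUB_CODE"],
                    ["FEAT_CODE", "SUB_CODE", "BASE_ELEV", "TOP_ELEV"]))
# {'missing_in_2022': [], 'new_in_2022': ['BASE_ELEV', 'TOP_ELEV']}
```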