NYCPlanning / data-engineering

Primary repository for NYC DCP's Data Engineering team

ZTL - fix versioning, align s3 folder layout with other repos #11

Closed: fvankrieken closed this issue 1 year ago

fvankrieken commented 1 year ago

ZTL uses %Y/%m/01 as the date format for versioning, as written here (if the link is broken, the changes have been merged; it's one of the only lines in the file). In s3, the slashes in this format create unnecessary subfolders for month and day. The same version format is used in EDM_DATA for the archives that the ztl build uses to generate qaqc outputs. I would propose

Beyond that, it may be worth moving to the branch/date format we use for most repos, though that can probably wait for another issue where we aim to align all data products, since many differ only slightly (output/no output, whether latest is in main or one level above it, etc.).
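To illustrate the subfolder problem described above, here is a minimal Python sketch. The %Y/%m/01 format comes from this issue; the dashed %Y-%m-01 alternative matches the YYYY-MM-01 pattern mentioned later in the thread. The s3_key helper, the example date, and the file name example.csv are hypothetical and only there for illustration; the actual build code may construct keys differently.

```python
from datetime import date

# Arbitrary example date (an assumption, not taken from a real build).
build_date = date(2023, 5, 17)

# Format described in the issue: the slashes become s3 "folders".
slash_version = build_date.strftime("%Y/%m/01")
# Dashed alternative: the whole version stays a single path segment.
dash_version = build_date.strftime("%Y-%m-01")


def s3_key(version: str, filename: str) -> str:
    """Hypothetical helper that builds an object key under the product prefix."""
    return f"db-zoningtaxlots/{version}/{filename}"


slash_key = s3_key(slash_version, "example.csv")
dash_key = s3_key(dash_version, "example.csv")

print(slash_key)
print(dash_key)
```

With the slashed format, the year, month, and day each become their own level in the bucket; with the dashed format, each version is a single folder.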

athursland commented 1 year ago

Looks like the run instructions in the README for ZTL are outdated too - will update in the PR for this issue

damonmcc commented 1 year ago

@fvankrieken @athursland

with Ali's PR https://github.com/NYCPlanning/data-engineering/pull/109 merged, do we wanna drop the tasks around changing old files and tables and close this?

fvankrieken commented 1 year ago

I think we should update s3 folders to align with this

damonmcc commented 1 year ago

renamed and moved all folders in edm-publishing/db-zoningtaxlots to have the pattern YYYY-MM-01/output/...
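For reference, the key mapping behind a rename like this could be sketched as follows. This is not the script that was actually used. It assumes the old keys looked like product/YYYY/MM/01/... and that files without an output/ segment should gain one; both are assumptions, and migrate_key and the example file name are hypothetical.

```python
import re

# Assumed old layout: <product>/<YYYY>/<MM>/01/<rest of key>
OLD_LAYOUT = re.compile(
    r"^(?P<product>[^/]+)/(?P<year>\d{4})/(?P<month>\d{2})/01/(?P<rest>.+)$"
)


def migrate_key(old_key: str) -> str:
    """Map an old-style key to the YYYY-MM-01/output/... pattern.

    Keys that do not match the assumed old layout are returned unchanged,
    so running the mapping twice is harmless.
    """
    match = OLD_LAYOUT.match(old_key)
    if match is None:
        return old_key
    rest = match["rest"]
    # Assumption: add the output/ level only when it is missing.
    if not rest.startswith("output/"):
        rest = f"output/{rest}"
    return f"{match['product']}/{match['year']}-{match['month']}-01/{rest}"


print(migrate_key("db-zoningtaxlots/2023/05/01/example.csv"))
```

Since s3 has no true rename, applying a mapping like this would in practice mean copying each object to its new key and deleting the old one.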