NYCPlanning / data-engineering

Primary repository for NYC DCP's Data Engineering team

Tweak `dcpy` folder structure #1153

Open fvankrieken opened 2 months ago

fvankrieken commented 2 months ago

We've discussed that it would be nice to tweak the dcpy structure just a bit to make our repo a tad cleaner.

There are two main accepted structures for the layout of a package - src layout and flat layout. Right now, we're effectively doing a flat layout in the root folder of our repo. This isn't quite ideal - mainly, we don't like that pyproject.toml is serving many different purposes right now, and it'd be nice to have all the packaging/dependency information for dcpy within its own folder.

We briefly discussed having something like

dcpy
- pyproject.toml
- src
  - models
  - ...
- test

However, this has one main issue: having dcpy as both a top-level folder and the name of the package messes up Python's import search order. The local folder takes priority, so an import of dcpy resolves to that folder, and based solely on the folder structure dcpy no longer has a models submodule (models now sits under dcpy/src).
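To make the shadowing concrete, here is a minimal sketch that rebuilds the layout above in a temporary directory and checks how imports resolve. It assumes Python is started from the repo root (mimicked here by putting the temp directory first on sys.path) and that no other dcpy is already importable in the environment; probe_layout is a hypothetical helper name, not something in the repo.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def probe_layout() -> dict:
    """Build the proposed layout in a temp dir and report how imports resolve."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Proposed layout: dcpy/pyproject.toml, dcpy/src/models/, dcpy/test/
        (root / "dcpy" / "src" / "models").mkdir(parents=True)
        (root / "dcpy" / "src" / "models" / "__init__.py").write_text("")
        (root / "dcpy" / "test").mkdir()
        (root / "dcpy" / "pyproject.toml").write_text("")

        # Assumption: this mimics launching Python from the repo root.
        sys.path.insert(0, str(root))
        importlib.invalidate_caches()
        try:
            # The bare folder is picked up as a namespace package named dcpy.
            pkg = importlib.import_module("dcpy")
            try:
                importlib.import_module("dcpy.models")
                has_models = True
            except ModuleNotFoundError:
                has_models = False
            return {
                "is_namespace_package": getattr(pkg, "__file__", None) is None,
                "has_models": has_models,
            }
        finally:
            # Undo the sys.path and sys.modules changes made for the probe.
            sys.path.remove(str(root))
            stale = [m for m in sys.modules if m == "dcpy" or m.startswith("dcpy.")]
            for name in stale:
                del sys.modules[name]


if __name__ == "__main__":
    print(probe_layout())
```

In this sketch the top-level folder is importable as dcpy, but dcpy.models is not found, because models only exists under dcpy/src. How this interacts with an installed or editable copy of dcpy depends on the install method, so treat it as an illustration of the folder-only case.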

To solve this, we could rename the top-level folder - something sort of inelegant like dcpy_pkg would work. Or we could go more in the direction that the setuptools docs suggest, where you have one top-level "packages" folder that has packages within (either in src or flat layout). I still honestly don't quite love this - I sort of like the simplicity of one folder per package, with everything needed inside (as opposed to multiple packages with their own source folders and test folders, all sharing one pyproject.toml).

damonmcc commented 2 days ago

Noting that this came up in sprint retro today, mainly the desire to review the current/intended folder structure and CLI targets in dcpy.