drivendataorg / cookiecutter-data-science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
https://cookiecutter-data-science.drivendata.org/
MIT License
8.23k stars 2.45k forks

V2 Modernize boilerplate #354

Closed jayqi closed 5 months ago

jayqi commented 6 months ago

Here is a stab at modernizing the boilerplate.

Open to opinions here. I also considered just getting rid of all of the CLI example modules.

jayqi commented 5 months ago

A little on the fence, with a slight preference for dataset.py, features.py, and plot.py over the ones that start with some_. It removes the rename step, and people can still customize the names if they want, but the default doesn't require it.

The reason I like the some_ prefix is that it's a signpost that the name is not special or expected to be final, and that people should pick a real name for it.

I don't really like having a default generic name. If someone were really going to just have a single dataset module, then having data.dataset, features.features, etc. is weird and we should just get rid of the directory layers.

render[bot] commented 5 months ago

Your Render PR Server URL is https://cookiecutter-data-science-pr-354.onrender.com.

Follow its progress at https://dashboard.render.com/static/srv-coaurquv3ddc73edhebg.

jayqi commented 5 months ago

Well okay. Since we've got some real stuff in the boilerplate now with imports like

from {{ cookiecutter.module_name }}.config import FIGURES_DIR, PROCESSED_DATA_DIR

this breaks linting and formatting.
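
A minimal sketch of why the templated import trips up tools that parse Python source: until cookiecutter renders it, the line is not valid Python. The standard library's ast.parse stands in here for the parsing step of a linter or formatter, and the relative-import variant is only an illustration (the number of leading dots is an assumption about package depth, not taken from the PR).

```python
import ast

# The templated import quoted above, before cookiecutter renders it.
TEMPLATED = (
    "from {{ cookiecutter.module_name }}.config "
    "import FIGURES_DIR, PROCESSED_DATA_DIR\n"
)

# A relative-import equivalent. The two leading dots are an assumption
# about how deep the importing module sits inside the package.
RELATIVE = "from ..config import FIGURES_DIR, PROCESSED_DATA_DIR\n"


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python, False on a SyntaxError."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print("templated import parses:", is_valid_python(TEMPLATED))
    print("relative import parses:", is_valid_python(RELATIVE))
```

The relative form parses because it contains no template syntax, which is the only reason it would sidestep the problem.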

Options:

  1. Use relative imports to avoid the template syntax. I don't like relative imports (and PEP 8 recommends absolute imports too) so I don't like this.
  2. Move linting into tests so we lint the actual rendered projects. This would work for linting, but wouldn't help with formatting. This is my leaning.
  3. Get rid of the boilerplate so we don't have to deal with this.

pjbull commented 5 months ago

Agreed about #2 for linting.
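
A rough sketch of what checking the rendered project (rather than the raw template) could look like. Everything here is a stand-in under stated assumptions: plain string replacement substitutes for cookiecutter's Jinja rendering, ast.parse substitutes for a real linter, and the one-file TEMPLATE and the helper names are hypothetical. An actual test would render the template with cookiecutter and run the project's own linter on the output.

```python
import ast
import tempfile
from pathlib import Path

PLACEHOLDER = "{{ cookiecutter.module_name }}"

# Hypothetical one-file template: relative path -> file contents.
TEMPLATE = {
    PLACEHOLDER + "/plots.py": (
        "from " + PLACEHOLDER + ".config "
        "import FIGURES_DIR, PROCESSED_DATA_DIR\n"
    ),
}


def render_tree(files: dict[str, str], module_name: str, dest: Path) -> None:
    """Write `files` under `dest`, filling the placeholder in paths and contents.

    Plain string replacement is a stand-in for real template rendering.
    """
    for rel_path, text in files.items():
        target = dest / rel_path.replace(PLACEHOLDER, module_name)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text.replace(PLACEHOLDER, module_name))


def files_with_syntax_errors(root: Path) -> list[str]:
    """Return relative paths of .py files under `root` that fail to parse."""
    bad = []
    for path in sorted(root.rglob("*.py")):
        try:
            ast.parse(path.read_text())
        except SyntaxError:
            bad.append(str(path.relative_to(root)))
    return bad


def check_rendered_project(module_name: str = "my_project") -> list[str]:
    """Render the template into a temp dir and report unparseable files."""
    with tempfile.TemporaryDirectory() as tmp:
        render_tree(TEMPLATE, module_name, Path(tmp))
        return files_with_syntax_errors(Path(tmp))


if __name__ == "__main__":
    print("files with syntax errors:", check_rendered_project())
```

Because the check runs on rendered files, the template itself can keep absolute imports with the placeholder in them.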

jayqi commented 5 months ago

@pjbull Should be good to go! I opened these two issues to capture some of the items we're punting on: