UCL-ARC / python-tooling

Python package template for new research software projects
http://github-pages.arc.ucl.ac.uk/python-tooling/
MIT License
43 stars, 2 forks

A data science cookiecutter #415

Open sfmig opened 5 months ago

sfmig commented 5 months ago

Is Your Feature Request Related to a Problem? Please Describe

We recently chatted with @samcunliffe and Niko from SWC (cannot tag him) about a cookiecutter for data science / scientific analysis / exploratory Python projects.

The idea would be to have lighter requirements than for a fully-fledged Python package, and maybe some data-science-specific additions (like for example functionality for formatting and checking notebooks). This could be useful for many researchers, and may be a good entry point into good software practices.
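
To make "checking notebooks" a bit more concrete, here is a minimal sketch of one possible check: flagging code cells that still carry outputs or execution counts before a notebook is committed. The helper name `cells_with_outputs` is hypothetical and not part of this template or of any existing tool; it only assumes the usual `.ipynb` JSON layout (a top-level `cells` list whose code cells have `outputs` and `execution_count` keys).

```python
import json
from pathlib import Path


def cells_with_outputs(notebook_path):
    """Return the indices of code cells that still hold outputs or execution counts.

    Hypothetical helper for illustration only; assumes the standard
    .ipynb JSON layout with a top-level "cells" list.
    """
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    dirty = []
    for index, cell in enumerate(notebook.get("cells", [])):
        # Markdown and raw cells have no outputs, so only code cells are checked.
        if cell.get("cell_type") != "code":
            continue
        if cell.get("outputs") or cell.get("execution_count") is not None:
            dirty.append(index)
    return dirty
```

A check like this could in principle be run from a pre-commit hook or a CI step; that wiring is left out here.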

Describe the Solution You'd Like

If we find this could be useful, we could have it:

Alternatively, we can just point to a good cookiecutter for this purpose if that already exists.

Describe Alternatives You've Considered

There are some examples of research cookiecutters:

Additional Context

No response

samcunliffe commented 5 months ago

A minimal solution would be to add a couple of trusted/tested data science templates to The Templates Page. I'd propose a section above "Community-specific..."

sfmig commented 5 months ago

this website and the associated cookiecutter may be a nice one to try out

paddyroddy commented 5 months ago

> this website and the associated cookiecutter may be a nice one to try out

Does feel a shame to be recommending a rival 🤔

samcunliffe commented 5 months ago

> this one seems more recent and quite well documented. It may be a good one to point to if we decide that doing one ourselves is out of scope

I heard good things about ccds. Never actually used it. Should we ask in #datascience on Slack? If we're making a recommendation under the ARC logo, perhaps they should be consulted.

dstansby commented 5 months ago

> (like for example functionality for formatting and checking notebooks)

This sounds like something we could add here anyway?

paddyroddy commented 5 months ago

> This sounds like something we could add here anyway?

👍 #49

dstansby commented 2 weeks ago

I've been playing around with uv projects recently, and am finding them a really nice halfway house between a single Python script and a full-blown package. @sfmig when you have time, it would be good to hear whether uv projects are the kind of thing you were thinking of here. If so, I think we can close this as something we won't duplicate in this repo.

sfmig commented 2 weeks ago

thanks for pointing this out @dstansby!

I had a look and asked around a bit. It does seem like uv's functionality for creating Python projects could be useful. In particular, its distinction between applications, libraries and packages seems to cover more cases than the standard cookiecutter, which mainly targets people who want to make a fully-fledged Python package. I agree that for a scientist with a few Python scripts, setting up a uv application may be a softer entry to good software development practices than starting off with a cookiecutter.

However, uv doesn't support conda, which seems like a big downside for many science and data science projects.

I think if we were to recommend that people use uv to manage Python projects that are more data-sciency, we may be sending them down terrible rabbit holes, having to deal with uv and conda environments simultaneously (something I have no experience of). But you have used uv, so do let me know if this is not accurate.

paddyroddy commented 2 weeks ago

@sfmig https://github.com/prefix-dev/pixi