datasciencecampus / census21api

A Python wrapper for the England & Wales Census 2021 "Create a Custom Dataset" API
https://datasciencecampus.github.io/census21api/
MIT License

Structuring a Python package #18

Closed daffidwilde closed 1 year ago

daffidwilde commented 1 year ago

Giving this repository some more structure would make using this software much easier for other users.

I would recommend packaging up the software. The standard structure for Python packages is as follows:

my_repo/
├── src/
│   └── my_package/
│       ├── __init__.py
│       ├── first_module.py
│       ├── second_module.py
│       └── subpackage/
│           └── first_subpackage_module.py
├── tests/
│   ├── test_first_module.py
│   ├── test_second_module.py
│   └── subpackage/
│       └── test_first_subpackage_module.py
├── .gitignore
├── LICENSE
├── pyproject.toml
└── README.md

It's sometimes called a "src/package" structure, and the Python packaging documentation refers to it as the "src layout".

The idea is to put all of your modules (Python files, not notebooks) into the src/my_package directory, which becomes the home for your package. Your tests go in the tests directory, mirroring the structure of the package. That keeps everything separate and organised.

You can have other directories for other stuff. Documentation might go in docs, notebooks in nbs, etc.
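As a rough sketch of the layout described above, the skeleton could be created with a few lines of standard-library Python. The `scaffold` helper is hypothetical and only makes empty placeholder files; it is not part of census21api or of any packaging tool.

```python
import tempfile
from pathlib import Path


def scaffold(root: Path, package: str) -> list[Path]:
    """Create an empty src-layout skeleton under ``root``.

    Hypothetical helper for illustration: it makes the directories and
    empty placeholder files only, and returns the files it touched.
    """
    package_dir = root / "src" / package
    tests_dir = root / "tests"
    package_dir.mkdir(parents=True, exist_ok=True)
    tests_dir.mkdir(parents=True, exist_ok=True)

    files = [
        package_dir / "__init__.py",
        root / "pyproject.toml",
        root / "README.md",
    ]
    for path in files:
        path.touch(exist_ok=True)
    return files


# Try it in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    for created in scaffold(Path(tmp) / "my_repo", "my_package"):
        print(created.relative_to(tmp))
```

Modules, tests and the contents of pyproject.toml still have to be written by hand; the sketch only shows where each piece lives.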

The last important piece (it's all important really) is the pyproject.toml file, which describes the configuration of the package. As a minimum, you need:

[project]
name = "my_package"
version = "0.0.1"
dependencies = [
    "these",
    "are",
    "my",
    "requirements",
]

No need for a requirements.txt file!

With these changes, anyone can clone the repository and install it in the usual way: cd my_repo && python -m pip install . (the single trailing dot means "the current directory").
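To see why a separate requirements.txt becomes redundant, here is a small sketch that reads the same minimal configuration and pulls out the metadata an installer needs. It assumes Python 3.11 or later for `tomllib` (with the third-party `tomli` backport as a fallback on older interpreters); `read_project` is a hypothetical helper, and the dependency names are the placeholders from the example above, not real packages.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the tomli backport is installed

# The minimal configuration from the comment above, placeholders included.
PYPROJECT = """
[project]
name = "my_package"
version = "0.0.1"
dependencies = [
    "these",
    "are",
    "my",
    "requirements",
]
"""


def read_project(text: str) -> dict:
    """Return the [project] table of a pyproject.toml document."""
    return tomllib.loads(text)["project"]


project = read_project(PYPROJECT)
print(project["name"], project["version"])
print(project["dependencies"])
```

The installer reads the dependencies list from this table when the package is installed, which is the job a requirements.txt would otherwise be doing.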

daffidwilde commented 1 year ago

I can see from 30d228e that the structure has been implemented, which is great :)

A few small things:

  1. The package code should live in a directory within src. The name of that directory should match the value of name in pyproject.toml. I recommend something snappy and descriptive. The nomis wrapper is called ukcensusapi or something, for instance. Have a think about a cool name:
    • something descriptive (census21api)
    • something contractible (import census21api as ca)
    • acronyms (caw stands for Census 2021 API Wrapper)
    • portmanteaux (cenpai notice me)
    • whatever 😄
  2. The naming convention for modules in Python is to use snake_case and to be minimally descriptive. So, Interface.py should be interface.py and Census21_CACD_Wrapper.py could just be wrapper.py.
  3. There should be a file called __init__.py in the top level of the package, i.e. where Interface.py is. That file should include all your top-level imports like:
    
    """A Python wrapper for the England & Wales Census 2021 API."""

    from .interface import Interface
    from .wrapper import APIWrapper

    __all__ = ["APIWrapper", "Interface"]
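The three points above could be checked mechanically. What follows is a minimal sketch, not an established tool: `check_layout` and `SNAKE_CASE_MODULE` are hypothetical names, the regular expression is a simplified idea of "snake_case", and the expected package name is passed in by hand rather than read from pyproject.toml.

```python
import re
import tempfile
from pathlib import Path

# Simplified rule: lowercase letters, digits and underscores, ending in .py.
SNAKE_CASE_MODULE = re.compile(r"^[a-z_][a-z0-9_]*\.py$")


def check_layout(root: Path, name: str) -> list[str]:
    """Return a list of layout problems for the package ``name`` under ``root``."""
    package_dir = root / "src" / name
    if not package_dir.is_dir():
        # Point 1: the directory in src should match the project name.
        return [f"missing package directory: src/{name}"]

    problems = []
    if not (package_dir / "__init__.py").is_file():
        # Point 3: the top level of the package needs an __init__.py.
        problems.append("missing __init__.py")
    for module in sorted(package_dir.glob("*.py")):
        # Point 2: module file names should be snake_case.
        if not SNAKE_CASE_MODULE.match(module.name):
            problems.append(f"module name is not snake_case: {module.name}")
    return problems


# Recreate the module names mentioned above in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "src" / "census21api"
    package_dir.mkdir(parents=True)
    (package_dir / "Interface.py").touch()
    (package_dir / "Census21_CACD_Wrapper.py").touch()
    for problem in check_layout(Path(tmp), "census21api"):
        print(problem)
```

Once the modules are renamed to interface.py and wrapper.py and an __init__.py is added, the function returns an empty list.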