Repository for the REST API side of the DiRAC-SWIFT project
This project is developed in collaboration with the Centre for Advanced Research Computing, University College London.
Harry Moss (h.moss@ucl.ac.uk)
Peter Andrews-Briscoe (p.andrews-briscoe@ucl.ac.uk)
Centre for Advanced Research Computing, University College London (arc-collab@ucl.ac.uk)
Clone the repository and cd into the repository directory.
Create and activate a virtual environment:
python -m venv env
source ./env/bin/activate
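From the directory containing README.md, install the package along with its development and test dependencies:
pip install "./[dev,test]"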
Some settings variables are required by src/api/config.py and can be set via a .env file, as shown in the .env.example file. To set these when running the application, create a .env file in the top-level directory of this repository based on .env.example, providing your own values where required.
If a .env file is not found, settings variables can also be set directly from environment variables when running the application.
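As a simplified illustration of the lookup behaviour described above (not the project's actual config.py, which may use a settings library with its own precedence rules), the sketch below reads required settings from a .env file when one exists and otherwise falls back to the process environment. The setting names used with it are hypothetical; the real names are listed in .env.example.

```python
import os
from pathlib import Path


def read_dotenv(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes, e.g. KEY="value"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(required: list[str], env_file: Path = Path(".env")) -> dict[str, str]:
    """Read required settings from .env if it exists, else from os.environ."""
    source = read_dotenv(env_file) if env_file.is_file() else dict(os.environ)
    missing = [name for name in required if name not in source]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return {name: source[name] for name in required}
```

Failing fast on missing values mirrors the intent of "required" settings: the application should refuse to start rather than run with a partial configuration.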
After installing the package, run the API in development mode from the root directory (containing this README) with
python src/api/main.py
which will launch a uvicorn server on 127.0.0.1/localhost on the default port 8000.
Alternatively, in the same directory the app can be equivalently launched with
uvicorn api.main:app --reload
By default, the API will be served on localhost:8000, with OpenAPI documentation available at localhost:8000/docs.
When deploying the API for use in production, it's recommended to use Gunicorn to serve the FastAPI application and act as a process manager. Gunicorn can start one or more uvicorn worker processes, listening on the port indicated on startup. Request and response handling is taken care of by individual workers.
Gunicorn will restart failing workers, but care should be taken to deal with cases where the Gunicorn process itself is killed. It's important to note that Gunicorn does not provide load balancing capability, but relies on the operating system to perform that role.
Gunicorn documentation recommends (2 x $num_cores) + 1 workers, although depending on your deployment environment this may not be suitable.
As an example, to start this application under Gunicorn on a localhost port with your choice of workers:
gunicorn src.api.main:app --workers ${n_workers} --worker-class uvicorn.workers.UvicornWorker --bind localhost:${port}
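The same options can be kept in a Gunicorn configuration file, which is itself a Python module. The sketch below is a hypothetical gunicorn.conf.py that applies the (2 x $num_cores) + 1 rule; bind, workers and worker_class are standard Gunicorn settings, but the port is a placeholder. Note that multiprocessing.cpu_count() reports logical CPUs, which may differ from physical cores.

```python
# gunicorn.conf.py: a hypothetical configuration for serving this API
import multiprocessing

# Address and port to listen on; adjust for your deployment
bind = "localhost:8000"

# Gunicorn's suggested starting point: (2 x number of CPUs) + 1
workers = multiprocessing.cpu_count() * 2 + 1

# Run the FastAPI application inside uvicorn worker processes
worker_class = "uvicorn.workers.UvicornWorker"
```

The file can then be passed explicitly with gunicorn src.api.main:app -c gunicorn.conf.py, keeping the command line short and the worker count under version control.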
The API is heavily coupled with the SWIFTsimIO library and performs server-side manipulation of objects defined in that library. As well as being a dependency of this software, SWIFTsimIO was envisaged as a typical client of the API.
A typical workflow is outlined below from a user's perspective. API endpoints are protected and require a valid JSON Web Token (JWT) to access. Obtaining a token depends on successful authentication against the VirgoDB database, so users need a valid VirgoDB account before using the API.
Tokens are retrieved from the /tokens endpoint when a valid VirgoDB username/password combination is provided. Tokens can then be added to the header of subsequent requests to access the endpoints.
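A minimal client-side sketch of the token exchange follows. It assumes, without confirmation from this README, that /tokens accepts form-encoded "username" and "password" fields and returns JSON containing an "access_token" key (the convention of FastAPI's OAuth2 password flow); check the schema at /docs before relying on it. The session argument is anything with a requests-style post() method, such as requests.Session.

```python
from typing import Any, Protocol


class SupportsPost(Protocol):
    """Anything with a requests-style post() method, e.g. requests.Session."""

    def post(self, url: str, **kwargs: Any) -> Any: ...


def fetch_token(session: SupportsPost, base_url: str, username: str, password: str) -> str:
    """Exchange VirgoDB credentials for a JWT via the /tokens endpoint.

    ASSUMPTION: form fields "username"/"password" in, JSON with an
    "access_token" key out. Confirm against the API's /docs page.
    """
    response = session.post(
        f"{base_url.rstrip('/')}/tokens",
        data={"username": username, "password": password},
    )
    response.raise_for_status()  # surface authentication failures as exceptions
    return response.json()["access_token"]
```

For example, with requests installed: with requests.Session() as session: token = fetch_token(session, "http://localhost:8000", "my_user", "my_password").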
API endpoints are available for the retrieval of SWIFTUnits, SWIFTMetadata, and masked and unmasked particle datasets (as numpy arrays). See the API documentation at the /docs endpoint for a full description of available endpoints.
A typical user workflow is as follows.
The API can be accessed after JWT authentication. After you submit your username and password, a token is generated, which must be attached to the header of all subsequent requests. The token has a lifespan of one hour, set in SWIFTAuthenticator.generate_token, after which you'll need to generate a new token by signing in again.
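Because tokens expire after an hour, long-running clients need to re-authenticate periodically. The sketch below is one hypothetical way to do this: it caches a token and calls a user-supplied fetch function (any zero-argument callable that requests a new token from /tokens) only once the cached token is close to expiry. The 60-second safety margin is an arbitrary choice, not something the API specifies.

```python
import time
from typing import Callable


class TokenCache:
    """Re-authenticate only when the cached token is close to expiry.

    Assumes the one-hour token lifespan; the margin avoids sending a
    token that could expire while a request is in flight.
    """

    def __init__(
        self,
        fetch: Callable[[], str],
        lifetime_s: float = 3600.0,
        margin_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._max_age = lifetime_s - margin_s
        self._clock = clock
        self._token: str | None = None
        self._issued_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now - self._issued_at >= self._max_age:
            # Timestamp taken before the request, so the age estimate errs on the safe side
            self._token = self._fetch()
            self._issued_at = now
        return self._token
```

Injecting the clock keeps the class easy to test and avoids depending on the token's own expiry claim, which the client would otherwise need to decode.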
Tokens should be added to the request headers as (Python 3 example):
{"Authorization": f"Bearer {token}"}
Authentication aside, most of the useful routes use HTTP POST requests to retrieve objects of interest. Data should be sent as a dictionary, with the documentation detailing which fields are required in each case. The settings dictionary shown in the example docs can be omitted.
For example, the /swiftdata/masked_dataset endpoint requires a data_spec object, which should be provided as a dictionary:
payload = {
"data_spec": {
"alias": "string",
"filename": "string",
"field": "string",
"mask_array_json": "string",
"mask_data_type": "string",
"mask_size": 0,
"columns": 0
}
}
This should be included in requests as the json parameter, for example
requests.post(url, json=payload)
which sets the Content-Type header to application/json.
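To catch malformed requests before they reach the server, a client can check the data_spec fields locally. The sketch below is a hypothetical helper whose field names and types are copied from the example payload above (strings, plus integer mask_size and columns); the authoritative schema is the one published at /docs, and extra fields are passed through unchecked.

```python
# Field names and types copied from the example payload; /docs is authoritative
MASKED_DATASET_FIELDS: dict[str, type] = {
    "alias": str,
    "filename": str,
    "field": str,
    "mask_array_json": str,
    "mask_data_type": str,
    "mask_size": int,
    "columns": int,
}


def masked_dataset_payload(**data_spec: object) -> dict[str, dict[str, object]]:
    """Wrap data_spec fields as the JSON body for /swiftdata/masked_dataset."""
    missing = MASKED_DATASET_FIELDS.keys() - data_spec.keys()
    if missing:
        raise ValueError(f"missing data_spec fields: {sorted(missing)}")
    for name, expected in MASKED_DATASET_FIELDS.items():
        if not isinstance(data_spec[name], expected):
            raise TypeError(f"{name} should be of type {expected.__name__}")
    return {"data_spec": dict(data_spec)}
```

The returned dictionary can be passed directly as the json argument of a POST request.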
It's recommended to use a single Session (or similar) when making multiple API calls.
Sessions allow for the use of a single TCP connection when sending multiple requests and allow headers to persist between requests. In this case, attaching the token to the header does not need to be performed during every request (unless the token has expired).
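A minimal sketch of this pattern: the bearer token is attached to the session's headers once, and each POST then reuses both the headers and the underlying connection. The helper names are hypothetical; the session argument is expected to behave like requests.Session, whose headers attribute persists across requests.

```python
from typing import Any, MutableMapping, Protocol


class SessionLike(Protocol):
    """The subset of requests.Session used here."""

    headers: MutableMapping[str, str]

    def post(self, url: str, **kwargs: Any) -> Any: ...


def authorise_session(session: SessionLike, token: str) -> None:
    """Attach the bearer token once; it is then sent with every request."""
    session.headers.update({"Authorization": f"Bearer {token}"})


def post_json(session: SessionLike, base_url: str, route: str, payload: dict[str, Any]) -> Any:
    """POST a JSON body (json= also sets Content-Type: application/json)."""
    url = f"{base_url.rstrip('/')}/{route.lstrip('/')}"
    response = session.post(url, json=payload)
    response.raise_for_status()  # raise on 4xx/5xx, e.g. an expired token
    return response
```

With requests, a client would create one requests.Session(), call authorise_session(session, token) after signing in, and then make repeated post_json(session, "http://localhost:8000", "/swiftdata/masked_dataset", payload) calls, re-authorising only when the token expires.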
Tests can be run either via tox or directly via pytest from the top-level directory of the repository:
tox run
or
python -m pytest -ra . --cov=src/api
either of which will run all tests and generate a coverage report.
Automatic documentation is produced at the /docs endpoint when the API is started. These pages detail all available routes, provide the ability to call them interactively and give example input.
To contribute to the project as a developer, use the following as a guide. These are based on ARC Collaborations group practices and code review documentation.
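To make explicit some of the potentially implicit:
- Target Python versions >= 3.10
- Use descriptive branch names, e.g. feature-newgui or adding-scaffold
- Create branches from main and merge them back into main
- The main branch is for ready-to-deploy, release-quality code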
The Turing Way provides an overview of best practices. It is recommended reading and includes some possible workflows for code review, which is useful if you're unsure what to look for during a review.