UCL-ARC / dirac-swift-api

Repository for the REST API side of the DiRAC-SWIFT project
BSD 3-Clause "New" or "Revised" License
2 stars 1 forks source link

DiRAC-SWIFT API

pre-commit Actions Status Licence

Repository for the REST API side of the DiRAC-SWIFT project

This project is developed in collaboration with the Centre for Advanced Research Computing, University College London.

About

Project Team

Harry Moss (h.moss@ucl.ac.uk)

Peter Andrews-Briscoe (p.andrews-briscoe@ucl.ac.uk)

Research Software Engineering Contact

Centre for Advanced Research Computing, University College London (arc-collab@ucl.ac.uk)

Getting Started

Prerequisites

Installation

python -m venv env
source ./env/bin/activate
pip install "./[dev,test]"

Running the API

Defining required settings

Some settings variables are required by src/api/config.py, and can be set via a .env file as shown in the .env.example file. To set these when running the application, create a .env file in the top level directory of this repository based on .env.example, but providing your own values where required.

Settings variables can also be set directly from environment variables when running the application if a .env file is not found.

Running locally

After installing the package, from the root directory (containing this README), run in development mode with

python src/api/main.py

which will launch a uvicorn server on 127.0.0.1/localhost on the default port 8000.

Alternatively, in the same directory the app can be equivalently launched with

uvicorn api.main:app --reload

By default, the API will be served on localhost:8000, with OpenAPI documentation available at localhost:8000/docs

Deploying the API

When deploying the API for use in production, it's recommended to use Gunicorn to serve the FastAPI application and act as a process manager. Gunicorn can start one or more uvicorn worker processes, listening on the port indicated on startup. Request and response handling is taken care of by individual workers.

Gunicorn will restart failing workers, but care should be taken to deal with cases where the Gunicorn process itself is killed. It's important to note that Gunicorn does not provide load balancing capability, but relies on the operating system to perform that role.

Gunicorn documentation recommends (2 x $num_cores) + 1 workers, although depending on your deployment environment this may not be suitable.

As an example, to start this application under Gunicorn on a localhost port with your choice of workers:

gunicorn src.api.main:app --workers ${n_workers} --worker-class uvicorn.workers.UvicornWorker --bind localhost:${port}

Using the API

The API is heavily coupled with the SWIFTsimIO library and performs server-side manipulation of objects defined in the library. As well as being a dependency of this software, SWIFTsimIO was thought to be a typical client of the API.

A typical workflow is outlined below from a user's perspective. API endpoints are protected and require a valid JWT token to access. Token access depends on successful authentication with the VirgoDB database, which users will need a valid account for before using the API.

Tokens are retrieved from the /tokens endpoint when a valid VirgoDB username/password combination is provided. Tokens can then be added to the header of subsequent requests to successfully access the endpoints.

API endpoints are available for the retrieval of SWIFTUnits, SWIFTMetadata, masked and unmasked Particle Datasets (as numpy arrays). See the API documentation at the /docs endpoint for a full description of available endpoints.

A typical user workflow looks like

Getting authenticated

The API can be accessed after JWT authentication. After submitting your username and password, a token will be generated, which will be attached to a header in all your requests. The token will have a lifespan of an hour, set in SWIFTAuthenticator.generate_token, after which you'll need to generate a new token by signing in again.

Tokens should be added to the request headers as (Python 3 example):

{"Authorization": f"Bearer {token}"}

Sending requests

Authentication aside, most of the useful routes use HTTP POST requests to retrieve objects of interest. Data should be sent as a dictionary, with the documentation detailing which fields are required in each case. The settings dictionary shown in the example docs can be omitted.

For example, the /swiftdata/masked_dataset endpoint requires

Which should be provided as a dictionary:

payload = {
  "data_spec": {
    "alias": "string",
    "filename": "string",
    "field": "string",
    "mask_array_json": "string",
    "mask_data_type": "string",
    "mask_size": 0,
    "columns": 0
  }
}

This should be included in requests as the json parameter, for example

requests.post(url, json=payload)

which sets the Content-Type header to application/json.

Sessions

It's recommended to use a single Session (or similar) when making multiple API calls.

Sessions allow for the use of a single TCP connection when sending multiple requests and allow headers to persist between requests. In this case, attaching the token to the header does not need to be performed during every request (unless the token has expired).

For developers

Running Tests

Tests can be run either via tox or directly via pytest from the top level directory of the repository

tox run

or

python -m pytest -ra . --cov=src/api

either of which will run all tests and generate a coverage report.

API documentation

Automatic documentation is produced when starting the API on the /docs endpoint. These detail all available routes, provide the ability to interactively call them and give example input.

Contributing

To contribute to the project as a developer, use the following as a guide. These are based on ARC Collaborations group practices and code review documentation.

Python standards we follow

To make explicit some of the potentially implicit:

General GitHub workflow

The main branch is for ready-to-deploy release quality code

Reviewing code

The Turing Way provides an overview of best practices - it comes as recommended reading and includes some possible workflows for code review - great if you're unsure what you're typically looking for during a code review.

Project Roadmap