dolphin-acoustics-vip / database-management-system

A database manager webapp in development for the University of St Andrews Dolphin Acoustics Vertically Integrated Project.
2 stars 0 forks source link

⚠️ Warning: this project is still in a developmental stage. Some sections of the code and documentation may be incomplete.

Dolphin Acoustics VIP Database Management System

The Database Management System (DBMS) is a project that aims to streamline the data pipeline of the Dolphin Acoustics Vertically Integrated Project at the University of St Andrews.

The ensuing documentation details the installation and operation of the code.

Familiarity with the data pipeline and implementation strategy found in the Project's GitHub Wiki are required. General computer science competency, as well as more specific familiarity with the requirements are also prerequesite.

Requirements

This section lists software and hardware requirements for the operation of the Program.

The Program has been developed on and for, a Linux based system running Debian 12. Further development, testing and production are to be done on the same (or similar) type system.

Project description

The Program was developed to store data with its metadata in a homogenised system. The following are outlined a list of some features (a more comprehensive list is to be found on the GitHub Wiki):

Files and their metadata are stored in two separate locations: the Filespace and Database respectively. The Web App was then created to maintain data integrity between the two, and provide an interface for users.

alt text

High level data flow diagram of the DBMS

Installation

⚠️ Warning make sure you have read and understood the requirements before continuing.

The repository of which this document is a part includes all code pertaining to the Program. The following steps are required whether the Program is in development or production, and must be followed closely to ensure a seamless installation:

  1. Create a virtual environment
  2. Create and link the Database
  3. Create and link the Filespace
  4. Start the Web App

Creating a Virtual Development Environment

The Program was developed using an array of libraries installed using the package installer for Python (pip). It is recommended whether developing or in production to install all requirements in requirements.txt in a virtual environment.

Commands to create and run the virtual environment

python3 -m venv virtualenv
source virtualenv/bin/activate
pip3 install -r requirements.txt

Command to create the requirements text file

If you make changes to the library and want to generate a new requirements.txt file, open the virtual environment and run the following command:

pip3 freeze > requirements.txt

Creating and Linking the Database

Note: the following section assumes the existence of a local (for development and testing) or external (for production) MariaDB instance.

To create the database, open a MariaDB shell and run the DDL script create_database.sql. This will create all empty tables apart from a single administrator user which can be used to log into the Web App initially.

The database connection is made in database_handler.py, however configuration for the database is made in config.py. Here exist three different profiles. In main.py, the profile can be changed by changing the argument passed in the mainline:

if __name__ == '__main__':
    # For development configuration
    app = create_app('config.DevelopmentConfig')
    # For testing configuration
    app = create_app('config.TestingConfig')
    # For production configuration
    app = create_app('config.ProductionConfig')
    if app is None:
        logger.fatal('Exiting program...')
        exit(1)
    logger.info('Starting application')
    app.run()

Each configuration required different global environment variables to be set to ensure an operational database connection.

Configuring the Development Database

Set the following global environment variables:

Configuring the Testing Database

Set the following global environment variables:

Configuring the Production Database

Set the following global environment variables:

Creating and Linking the Filespace

The File Space is simply a designated path on the server system. To set this folder, insert the relative or absolute path into file_space_path.txt in the program root.

Note that by default the path in file_space_path.txt is relative, however should you enter an absolute path (for example one beginning with C:), this will automatically be recognised.

No folders or files need to be inserted in the Filespace. However the Filespace folder itself must exist in the system before operating the Web App.

Starting the Web App

To start the Web App on the server, activate the Virtual Environment and run main.py from the root directory. No parameters are required. If all steps prior were completed satisfactorily the Flask application will run successfully.

Note: if the Flask start-up message includes Debug mode: on, this likely means the program is being run in development or testing configuration.

Maintaining and Operating the Web App

The Web App brings together the Meta Base and the File Space into a single user interface. The Web App utilises the Flask library.

The following folders exist in the Web App's root directory (note that a module refers to a compartamentalised section of code pertaining to a specific functionality such as encounter, recording or selection):

The following files exist outside the aforementioned folders:

Templates and Static Files

Templates are layouts that arrange content on a webpage, usually written in HTML. Found in the templates folder, templates are structured into modular sub-categories for set functions.

The template templates/partials folder has a number of modular templates which are used multiple times in other HTML code.

Styling (or CSS) and Javascript files are stored in the static/css and static/js folders respectively. The files in these folders are loaded in templates by the header.html module which is shown on each page.

Routes

Routes are a server-side URL schema which describe interfaces through which a client can interact with a web app. Routes follow a Hypertext Transfer Protocol (HTTP) through which requests such as GET, POST and DELETE can be made.

Any request sent to the server that matches a defined URL schema is handed to the associated method defined in routes. All routes are brought together in main.py.

Routes are intuitively modularised by function (a general rule of thumb is that a particular section of the user interface will interact with routes from just a single module).

For example: viewing, adding, updating and removing recordings is done through routes in routes_recording.py.

Object Relational Mapping

When interacting with the Meta Base, an object relational mapping (ORM) is implemented. This allows all the database relations to be lazily and effortlessly loaded in a familiar object-oriented structure in Python.

The classes for each relation are written in models.py using the Flask-SQLAlchemy. This library offers seamless integration with the Flask library, that used to create the Web App. The structure of each model closely matches the Meta Base schema.

Version History

In the database creation script, you will notice a command used on all but the file and user tables: WITH VERSION HISTORY. This is a feature of MariaDB meaning a snapshot of the database can be viewed at any point in time since its creation.

This requires complex queries that cannot be completed in the SQLAlchemy ORM model. Therefore, database_handler.py contains methods used throughout the program to query the database in a snapshot state. This is called Archive Mode and is implemented through storing snapshot_date in the session cookies (more can be read on Archive Mode in the GitHub Wiki).

Session handling

As data must be synchronised between the File Space and Meta Base, atomicity is crucial. An atomic database transaction is one where either all required operations occur or none at all.

To implement atomicity in the Web App, all database operations are bundled into sessions. If an error is produced in interacting with the File Space, any metadata changes pertaining to the request are rolled back.

To prevent orphaned files from appearing in the database, all database changes are flushed (flush()) before any files are moved. This way, any issues appearing due to the flush are caught and handled (or passed to the user) before creating irreversible file changes.

Data Model

Overview

The data is stored in a MariaDB database using SQLAlchemy. Each table is represented by a class in Python, defined in models.py.

Unique IDs

Most tables use unique IDs as primary keys, generated using MariaDB's UUID() method. This creates a 36-character string in the format XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX. Unique constraints are also defined separately on specific attributes to ensure data integrity.

File Storage

⚠️ Warning: the File Space should rarely be manually modified by the Developer and never by the User. This is because changing file paths would invalidate the file references in the Meta Base.

File Space

The File Space shown in a diagramattic form. Note that the naming conventions are for demonstration purposes only and do not accurately reflect the implementation of the File Space

Files are stored in unique paths, with each file represented by a File class object. All data files stored by the user in the DBMS are placed in the File Space. The File Space should already be initialised above. The following techniques are used for file storage:

Creating a Unique Path

When a file needs to be saved, a new File object is created and referenced as a foreign key in its parent class. The parent class provides the path (directory and filename) to the File object. This ensures that each file is associated with a specific parent object and can be easily retrieved.

The directory and filename are then passed to the File object using insert_path_and_filename(). This method updates the File object with the generated path and filename, ensuring that the file is stored in the correct location.

Changing Paths

When metadata changes, the file location must be updated to reflect the new metadata. The Encounter, Recording, and Selection classes are responsible for updating their file locations using the update_call() method.

This ensures that files are always stored in a location that reflects the current metadata, and that files can be easily retrieved using the updated metadata.

Handling Duplicate Files

If a file already exists in the target path, the existing file is renamed using File.rename_loose_file(). This prefixes a unique ID to the existing file, allowing the file to be moved to the desired location.

Note: This should never happen, as file paths are chosen to be unique alongside the database metadata. However, in the unlikely event that a duplicate file is detected, this mechanism ensures that the file is renamed and can be stored in the desired location. An error is written to the log in this case, indicating that a duplicate file was detected and renamed. This allows for easy identification and resolution of any issues related to duplicate files.

Handling File Deletion

The delete() method prevents foreign key error references upon delete. Upon the deletion request of a master object (by calling delete() in the master), the delete() method in each child is added to the call stack, which ensures all slaves are removed from the database before a master.

The delete() method in Encounter, Recording, and Selection are responsible for removing all their own files before returning to the master.

⚠️ Warning: cascading delete is dangerous, and where it is implemented the user should always be warned before execution.

Testing

Testing is found in the tests folder. Python files exist there within the pytest framework. Tests can be run with python3 -m pytest tests from the root directory.

The test environment is automatically created, and is effectively an instance of the Flask application. This means all setup instructions MUST be followed before running tests.

The following tests are included in the tests folder. Some require files to exist in the tests/resources directory:

Test Contour Stats

test_contour_stats.py tests the calculation of contour statistics. The running of these tests requires a number of resources to exist in tests/resources/contour-stats:

An example of this setup is shown here: alt text

Test Encounter

tests/test_encounter.py tests the Encounter class in models.py.

Test Errors

tests/test_errors.py tests methods and classes in exception_handler.py.

Test Recording

tests/test_recording.py tests the Recording class in models.py.

Test Restricted Routes

tests/test_restricted_routes.py tests whether access is forbidden to the required routes for users without certain access permissions.

Test Utils

tests/test_utils.py tests a number of general utilities classes stored in utils.py.

Certain tests (testing the reading of CSV, TSV and XLSX files into Pandas) require the following files to exist in tests/resources/utils:

This should look like: alt text