Logiqx / sse-results

Speed Sailing Event Results
https://logiqx.github.io/sse-results/
GNU General Public License v3.0

sse-results

Copyright 2022 Michael George (AKA Logiqx).

This file is part of sse-results and is distributed under the terms of the GNU General Public License.

sse-results is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

sse-results is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with sse-results. If not, see https://www.gnu.org/licenses/.

Overview

This project was originally created to produce the daily and weekly results for Weymouth Speed Week 2022.

The generic features have since been separated out to create this project: Speed Sailing Event Results (sse-results).

Features

This project has become rather more elaborate than originally planned and now has the following features:

Note: Different course lengths can be reported separately and will not impact 500m records.

Fuzzy Name Matching

Competitors often register with slight variations in how their name is written. For example:

This project implements a bespoke "fuzzy matching" algorithm to spot name variations and highlight them during report generation.

Editing the relevant entrants to make the names consistent across all years ensures that competitor profiles are as accurate as possible.

The bespoke "fuzzy matching" algorithm uses a combination of a nickname lookup, Soundex and Levenshtein distance.

The algorithm itself will not be explained in this document but the code can be found in fuzzy.ipynb and name.ipynb.
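
For readers who want a feel for how a nickname lookup, Soundex and Levenshtein distance can work together, the following is a minimal, generic sketch. It is not the code from fuzzy.ipynb or name.ipynb; the nickname table, the function names, the `max_distance` threshold and the rule for combining the three checks are all assumptions made for illustration, and it assumes non-empty "First Last" style names.

```python
import re

# Hypothetical nickname table (assumption); a real one would be much larger.
NICKNAMES = {"dave": "david", "mike": "michael", "steve": "stephen", "bob": "robert"}

# Standard American Soundex letter groups.
SOUNDEX_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code: first letter plus three digits."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return ""
    code = word[0].upper()
    prev = SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters sharing a code
            prev = digit
    return (code + "000")[:4]


def levenshtein(a, b):
    """Return the edit distance (insertions, deletions, substitutions)."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1,                # deletion
                           row[j - 1] + 1,                 # insertion
                           prev_row[j - 1] + (ca != cb)))  # substitution
        prev_row = row
    return prev_row[-1]


def split_name(full_name):
    """Lower-case the name, expand a known nickname, return (first, last)."""
    parts = full_name.lower().split()
    return NICKNAMES.get(parts[0], parts[0]), parts[-1]


def is_fuzzy_match(name_a, name_b, max_distance=2):
    """Illustrative rule (assumption): first names must agree after nickname
    expansion or by Soundex; last names must agree by Soundex or be within
    max_distance edits of each other."""
    first_a, last_a = split_name(name_a)
    first_b, last_b = split_name(name_b)
    firsts_agree = first_a == first_b or soundex(first_a) == soundex(first_b)
    lasts_agree = (soundex(last_a) == soundex(last_b)
                   or levenshtein(last_a, last_b) <= max_distance)
    return firsts_agree and lasts_agree


if __name__ == "__main__":
    # Made-up names, used only to exercise the sketch.
    print(is_fuzzy_match("Dave Smith", "David Smyth"))
    print(is_fuzzy_match("Dave Smith", "Anna Jones"))
```

In a sketch like this the nickname lookup handles variations that are not phonetically similar, Soundex catches alternative spellings that sound alike, and the edit distance catches simple typos that change the Soundex code.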

There are three main reasons for implementing this "fuzzy matching" functionality in code:

  1. The initial process of getting competitor names consistent across all years is a lot less tedious and less prone to error.
  2. The automated testing which compares newly generated WSW results with past results can recognise names in the original results.
  3. All future competitors with "fuzzy matches" to names in previous years can be highlighted automatically by the reporting process.

Unit Testing

The project includes extensive unit testing within all of the Python modules:

Thorough unit testing ensures that the software can be trusted for all past results and will accurately generate results in the future.

Technical Docs

Full technical documentation is being maintained in a separate document within the GitHub repository.