Logiqx / wsw-results

Weymouth Speed Week Results
https://logiqx.github.io/wsw-results/
GNU General Public License v3.0

wsw-results

Copyright 2022 Michael George (AKA Logiqx).

This file is part of wsw-results and is distributed under the terms of the GNU General Public License.

wsw-results is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

wsw-results is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with wsw-results. If not, see https://www.gnu.org/licenses/.

Overview

This project was originally created to produce the daily and weekly results for Weymouth Speed Week 2022.

It supersedes the Excel macros (SSERPANT) that were used to generate the WSW results between 2010 and 2021.

Features

This project has become rather more elaborate than originally planned and now has the following features:

Note: The years 2000 and 2001 had 200m, 300m and 700m courses, which are reported separately and do not affect 500m records.

Fuzzy Name Matching

Competitors often register with slight variations in how their name is written. For example:

This project implements a bespoke "fuzzy matching" algorithm to spot name variations and highlight them during report generation.

Editing the relevant entrants to make the names consistent across all years ensures that competitor profiles are as accurate as possible.

The bespoke "fuzzy matching" algorithm uses a combination of a nickname lookup, Soundex and Levenshtein distance.

The algorithm itself is not explained in this document, but the code can be found in fuzzy.ipynb and name.ipynb.
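For readers unfamiliar with the three techniques named above, the following is a minimal sketch of how a nickname lookup, Soundex and Levenshtein distance could be combined. It is not the project's algorithm (see fuzzy.ipynb and name.ipynb for that). The nickname table, the function names, the `max_distance` threshold and the example names are all assumptions made for illustration.

```python
# Illustrative sketch only; NOT the code from fuzzy.ipynb / name.ipynb.
# Hypothetical nickname table (assumption): maps short forms to a canonical form.
NICKNAMES = {"mike": "michael", "dave": "david", "bob": "robert"}

# Standard American Soundex digit groups.
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code, or '' if no letters."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = _SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # 'h' and 'w' do not separate repeated codes
            prev = code
    return (result + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def canonical(name: str) -> list[str]:
    """Lower-case the name, split on whitespace and expand known nicknames."""
    return [NICKNAMES.get(part, part) for part in name.lower().split()]


def is_fuzzy_match(name1: str, name2: str, max_distance: int = 1) -> bool:
    """True if every word pair is equal, sounds alike, or is within max_distance edits."""
    a, b = canonical(name1), canonical(name2)
    if len(a) != len(b):
        return False
    return all(
        x == y or soundex(x) == soundex(y) or levenshtein(x, y) <= max_distance
        for x, y in zip(a, b)
    )


# Hypothetical names, chosen only to exercise each technique.
print(is_fuzzy_match("Mike George", "Michael George"))   # nickname lookup
print(is_fuzzy_match("Steven Jones", "Stephen Jones"))   # Soundex
print(is_fuzzy_match("Jon Smith", "Jane Doe"))           # surname differs
```

In this sketch the nickname lookup runs first so that short forms are compared in their canonical form, and Soundex and Levenshtein distance then act as alternative tests on each word. A looser threshold catches more variations but also raises more false matches, which is one reason a report that highlights candidates for manual review is a safer design than automatic merging.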

There are three main reasons for building this "fuzzy matching" functionality as actual code:

  1. The initial process of making competitor names consistent from 1998 to 2021 was far less tedious and less error-prone.
  2. The automated testing which compares newly generated results with past results can recognise the names in the original results.
  3. All future competitors with "fuzzy matches" to names in previous years can be highlighted automatically by the reporting process.

Unit Testing

The project includes extensive unit testing within all of the Python modules:

Thorough unit testing ensures that the software can be trusted for all past results and will accurately generate results in the future.

Technical Docs

Full technical documentation is being maintained in a separate document within the GitHub repository.