birneamstiel / Track-The-Inspector

Visualize the position of recent ticket inspector reports in Berlin.
https://mtin.de/inspector

Improve fuzzy matching by extending static data #2

Open birneamstiel opened 5 years ago

birneamstiel commented 5 years ago

Description:

Right now, data/lines.json contains full station names such as S+U Alexanderplatz Bhf (Berlin). Messages usually contain significantly shorter station names (e.g. alexanderplatz), which increases the Levenshtein distance and decreases matching accuracy.

Proposal:

A quick fix would be to add shortened station names to the lines.json file. Removing the S/U prefix and the (Berlin) suffix would decrease the Levenshtein distance significantly. For example:

"U9": [
        "S+U Rathaus Steglitz (Berlin) [U9]",
        "Rathaus Steglitz",
        "U Walther-Schreiber-Platz (Berlin)",
        "Walther-Schreiber-Platz",
        ...
    ]
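
To illustrate the effect, here is a minimal sketch of how the shortened entries could be generated and how much the distance drops. The helper names (`shorten_station_name`, `levenshtein`, `extend_lines`) are hypothetical and not part of the repository. The stripping rules are an assumption based only on the sample above: a leading `S+U`/`S`/`U`, a trailing `(Berlin)`, and a trailing line tag like `[U9]`.

```python
import re


def shorten_station_name(name: str) -> str:
    """Strip the S/U prefix, a trailing line tag like [U9], and the (Berlin) suffix.

    Hypothetical helper; the rules are assumed from the sample entries only.
    """
    name = re.sub(r"\s*\[[^\]]*\]\s*$", "", name)   # trailing "[U9]"
    name = re.sub(r"\s*\(Berlin\)\s*$", "", name)   # trailing "(Berlin)"
    name = re.sub(r"^(S\+U|S|U)\s+", "", name)      # leading "S+U ", "S " or "U "
    return name.strip()


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def extend_lines(lines: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return a copy of the lines mapping with a shortened name after each entry."""
    extended = {}
    for line, stations in lines.items():
        names = []
        for station in stations:
            names.append(station)
            short = shorten_station_name(station)
            if short != station:
                names.append(short)
        extended[line] = names
    return extended


if __name__ == "__main__":
    full = "S+U Alexanderplatz Bhf (Berlin)"
    query = "alexanderplatz"
    print(levenshtein(query, full.lower()))
    print(levenshtein(query, shorten_station_name(full).lower()))
```

With these rules the Bhf suffix is kept, since the proposal does not mention it; dropping it as well would presumably bring the distance for this example down further.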
derhuerst commented 4 years ago

I built vbb-short-station-name (which shortens common parts like (Berlin) and -Platz), tokenize-vbb-station-name (which expands and normalises these parts) and vbb-stations-autocomplete (which provides a fuzzy search over all VBB stations).

While the implementation is in JavaScript, we could