Trilarion / opensourcegames

Technical infos of open source games.
https://trilarion.github.io/opensourcegames/
Creative Commons Zero v1.0 Universal
684 stars 85 forks

Check for duplicate entries in the database #181

Closed Trilarion closed 4 years ago

Trilarion commented 5 years ago

With nearly 600 entries and growing, the question arises at which point we lose the overview and introduce duplicates (possibly under different names). My guess is that one could identify them by comparing home URLs or code repository URLs. Maintenance could include a check for this.
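To illustrate the idea of comparing URLs, here is a minimal sketch. The function names, the `urls` key and the normalization rules (ignoring scheme, a leading `www.`, a trailing slash, a `.git` suffix and letter case) are assumptions made for this example, not the project's actual code.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a comparable key (hypothetical normalization rules)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    if path.endswith(".git"):
        path = path[:-4]
    # Lowercasing the path is an assumption; some hosts treat paths as case-sensitive.
    return host + path.lower()


def share_url(entry_a: dict, entry_b: dict) -> bool:
    """Return True if two entries have at least one URL in common after normalization."""
    keys_a = {normalize_url(u) for u in entry_a["urls"]}
    keys_b = {normalize_url(u) for u in entry_b["urls"]}
    return not keys_a.isdisjoint(keys_b)


# Hypothetical entries with different names but the same repository.
first = {"name": "Some Game", "urls": ["https://www.example.org/dev/SomeGame.git"]}
second = {"name": "SomeGame (remake)", "urls": ["http://example.org/dev/somegame/"]}
```

With these two made-up entries, `share_url(first, second)` reports a shared URL even though the names differ.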

Trilarion commented 4 years ago

A check based on similarities of names and home URLs unfortunately failed and gave way too many false positives because many entries are quite similar (Angband and Zangband, 2048 and n2048, bombic and bombic2, brikx and froggix, castle of the winds and castle of the winds in elm, conquest and conquests, devilution and devilutionx, foobilliard and foobilliard++).
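A small sketch of why name similarity is unreliable here. It assumes a similarity check built on Python's `difflib.SequenceMatcher` with an arbitrary threshold of 0.85; the check that was actually tried may have worked differently.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_name_pairs(names, threshold=0.85):
    """Return all pairs of names whose similarity ratio reaches the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


# Names taken from the comment above; each pair refers to two distinct games.
names = ["Angband", "Zangband", "conquest", "conquests", "devilution", "devilutionx"]
flagged = similar_name_pairs(names)
```

All three distinct pairs end up in `flagged`, so each would need a manual review. Raising the threshold enough to exclude them would presumably also hide real duplicates with slightly different spellings.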

Instead, I will now only check for exact duplicates of canonical game names and for links that appear in multiple entries (which is okay in principle).

Also, checking this is an O(n²) operation right now, which is a bit slow. In principle, I don't really need to check for duplicates of canonical game names (after all, they are used as file names, and I would notice a file name clash immediately).
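The quadratic cost comes from comparing every entry with every other one. One possible alternative, sketched below, is to build an index from each link to the entries that contain it, which needs only a single pass over all links. The function name and the input format (a dict mapping entry names to lists of links) are assumptions for illustration, not the existing maintenance code.

```python
from collections import defaultdict


def links_in_multiple_entries(entries):
    """Map each link to the entries containing it, keeping only shared links."""
    index = defaultdict(set)
    for name, links in entries.items():
        for link in links:
            index[link].add(name)
    # Only links that occur in more than one entry are of interest.
    return {link: sorted(names) for link, names in index.items() if len(names) > 1}


# Hypothetical data: two entries share one repository link.
entries = {
    "game a": ["https://example.org/a", "https://example.org/shared"],
    "game b": ["https://example.org/b", "https://example.org/shared"],
    "game c": ["https://example.org/c"],
}
shared = links_in_multiple_entries(entries)
```

Because dictionary lookups take constant time on average, the running time grows roughly linearly with the total number of links rather than with the square of the number of entries.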

On second thought, I will just do the links check, and only when checking external links (it's too much work otherwise).

Trilarion commented 4 years ago

I made a TODO in the code about checking external links.