century-arcade / xd

a futureproof crossword corpus toolset
MIT License

The xd project includes a text format for crossword puzzles and a pipeline that downloads, parses, and analyzes puzzles, then produces the website and released data at xd.saul.pw.


Running the pipeline

  1. Check out the gxd repo (private; join #crosswords on the Discord to discuss getting access).

    make setup

  2. Download new puzzles from known sources, convert to .xd, shelve, and commit to gxd repo.

    make import

Raw puz/etc files are saved to a .zip in /tmp, and the converted .xd files are saved to the gxd directory.

  3. Analyze puzzles

    make analyze

Output in pub directory.

  4. Build website

    make website

Output in wwwroot directory.

  5. Generate gxd.sqlite database (400MB)

    make gxd.sqlite

  6. Find similar grids (takes ~12 hours)

    make gridmatches

Similarity scores are saved to the gridmatches table in gxd.sqlite.
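
Since the gridmatches step takes hours, it can be useful to confirm afterwards that the table exists and has rows. The sketch below is one possible way to do that with Python's standard sqlite3 module. The helper name describe_table is hypothetical and not part of xd, and the code deliberately assumes nothing about the table's columns; it only reads them back from the database.

```python
import sqlite3
from contextlib import closing


def describe_table(db_path, table="gridmatches"):
    """Return (column_names, row_count) for one table in a SQLite file.

    Hypothetical helper, not part of the xd toolset. The column layout of
    gridmatches is not assumed; it is read from the database itself.
    """
    # mode=ro opens the file read-only and fails if it does not exist,
    # instead of silently creating an empty database.
    with closing(sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)) as conn:
        # PRAGMA table_info yields one row per column; index 1 is the name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        if not columns:
            raise ValueError(f"no table named {table!r} in {db_path}")
        (row_count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return columns, row_count
```

Calling describe_table("gxd.sqlite") from the directory that holds the database returns the column names and the number of stored similarity rows; a ValueError suggests that make gridmatches has not finished or has not been run.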