The xd project includes a text format for crossword puzzles and a pipeline for downloading, parsing, and analyzing puzzles, and for producing the website and the released data at xd.saul.pw.
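The xd format itself is not described here. As a rough illustration only, the following Python sketch parses a puzzle assuming this layout: metadata headers as "Key: Value" lines, then the grid rows (# for blocks), then clue lines such as "A1. Clue ~ ANSWER", with the sections separated by two blank lines. That layout is an assumption for this example, so check the format documentation in the xd repo before relying on it. XdPuzzle and parse_xd are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumption: sections (headers, grid, clues, ...) are separated by two or more blank lines.
SECTION_BREAK = re.compile(r"\n[ \t]*\n[ \t]*\n+")
# Assumption: clue lines look like "A1. Clue text ~ ANSWER" (A = across, D = down).
CLUE_LINE = re.compile(r"^([AD]\d+)\.\s*(.*?)\s*~\s*(\S+)$")


@dataclass
class XdPuzzle:
    headers: Dict[str, str] = field(default_factory=dict)
    grid: List[str] = field(default_factory=list)  # one string per row; '#' marks a block
    clues: List[Tuple[str, str, str]] = field(default_factory=list)  # (position, clue, answer)


def parse_xd(text: str) -> XdPuzzle:
    """Split an .xd-style text into headers, grid rows and clues (hypothetical sketch)."""
    sections = SECTION_BREAK.split(text.strip())
    if len(sections) < 3:
        raise ValueError("expected at least header, grid and clue sections")
    puzzle = XdPuzzle()
    for line in sections[0].splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a "Key: Value" shape
            puzzle.headers[key.strip()] = value.strip()
    puzzle.grid = [row.strip() for row in sections[1].splitlines() if row.strip()]
    for line in sections[2].splitlines():
        match = CLUE_LINE.match(line.strip())
        if match:  # blank lines (e.g. between across and down clues) are skipped
            puzzle.clues.append(match.groups())
    return puzzle
```

Lines that do not fit the assumed shapes are silently skipped, which keeps the sketch short but would hide malformed input in real use.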
Check out the gxd repo (it is private; join #crosswords on the Discord to discuss getting access).
make setup
Download new puzzles from known sources, convert them to .xd, shelve them, and commit them to the gxd repo.
make import
Raw .puz (and other source-format) files are saved to a .zip in /tmp, and the .xd files are saved to the gxd directory.
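To spot-check what an import downloaded, a small Python sketch like the one below can find the most recently modified .zip in /tmp and count its files by extension. It assumes the newest .zip there is the one the import just wrote, which may not hold if other programs also write archives to /tmp; newest_zip and extension_counts are hypothetical helpers, not part of the pipeline.

```python
import zipfile
from collections import Counter
from pathlib import Path
from typing import Optional


def newest_zip(directory: str = "/tmp") -> Optional[Path]:
    """Return the most recently modified .zip in `directory`, or None if there is none."""
    # Assumption: the import step's archive is the newest .zip in the directory.
    zips = sorted(Path(directory).glob("*.zip"), key=lambda p: p.stat().st_mtime)
    return zips[-1] if zips else None


def extension_counts(zip_file) -> Counter:
    """Count the files in a .zip (a path or a binary file object) by lowercase extension."""
    with zipfile.ZipFile(zip_file) as zf:
        return Counter(
            Path(info.filename).suffix.lower()
            for info in zf.infolist()
            if not info.is_dir()  # skip directory entries
        )
```

For example, extension_counts(newest_zip()) would show how many .puz and other raw files the archive holds, provided newest_zip() did not return None.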
Analyze puzzles
make analyze
Output is written to the pub directory.
Build website
make website
Output is written to the wwwroot directory.
Generate the gxd.sqlite database (400 MB)
make gxd.sqlite
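Once gxd.sqlite exists, Python's built-in sqlite3 module can open it read-only and report how many rows each table holds, which is a quick way to confirm the build produced data. This is a minimal sketch: apart from gridmatches, the table names come from the file itself rather than from any documented schema, and open_readonly and table_row_counts are hypothetical helpers.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file read-only (raises instead of creating a new empty file)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def table_row_counts(conn: sqlite3.Connection) -> dict:
    """Return {table_name: row_count} for every table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Table names cannot be bound as query parameters, so quote them as identifiers.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```

For example, table_row_counts(open_readonly("gxd.sqlite")), assuming the file was written to the current directory; after make gridmatches, the gridmatches entry should be nonzero.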
Find similar grids (takes ~12 hours)
make gridmatches
Similarity scores are saved to the gridmatches table in gxd.sqlite.
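How the similarity score is computed is not described here. For intuition only, the sketch below shows one simple, hypothetical measure, not necessarily what make gridmatches uses: the fraction of cells holding the same character in two grids of the same shape, with each grid given as a list of row strings.

```python
def grid_similarity(a: list, b: list) -> float:
    """Fraction of cells (0.0 to 1.0) that hold the same character in both grids.

    Each grid is a list of row strings, e.g. ["CAT", "A#O", "BED"].
    Hypothetical measure for illustration; grids of different shapes score 0.0.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0
    total = sum(len(row) for row in a)
    if total == 0:
        return 0.0
    same = sum(ca == cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))
    return same / total
```

Comparing every pair of N grids with a measure like this takes N(N-1)/2 comparisons, which grows quadratically with the size of the archive.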