danmermel / cryptario

Cryptic crossword solver

The Adam Gilchrist problem #30

Open · glynnbird opened this issue 5 years ago

glynnbird commented 5 years ago

Do we have enough data in our anagrams database?

Irish lad at MCG stumped Aussie legend (4,9)

Solution: ADAM GILCHRIST
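The clue is a straight anagram: "stumped" is the indicator, "Irish lad at MCG" is the fodder (13 letters), and "Aussie legend" is the definition. So it presumably fails only because the name isn't in the word list. One common way to look up anagrams, sketched here in Python as an illustration and not necessarily how cryptario's engine works, is to key every dictionary entry by its sorted letters and then filter by the enumeration. The function names and the tiny word list are hypothetical.

```python
from collections import defaultdict

def letter_key(phrase):
    """Sorted letters of a phrase, ignoring case, spaces and punctuation."""
    return "".join(sorted(ch for ch in phrase.lower() if ch.isalpha()))

def enumeration(phrase):
    """Word lengths of a phrase, e.g. 'Adam Gilchrist' -> (4, 9)."""
    return tuple(len(word) for word in phrase.split())

def build_index(entries):
    """Group dictionary entries by letter key (hypothetical helper)."""
    index = defaultdict(list)
    for entry in entries:
        index[letter_key(entry)].append(entry)
    return index

def solve_anagram(fodder, lengths, index):
    """Entries that use exactly the fodder's letters and match the enumeration."""
    return [e for e in index.get(letter_key(fodder), []) if enumeration(e) == lengths]

# Hypothetical word list: the clue is only solvable once the name is in it.
index = build_index(["Adam Gilchrist", "Shane Warne", "wicket"])
print(solve_anagram("Irish lad at MCG", (4, 9), index))  # ['Adam Gilchrist']
```

With "Adam Gilchrist" removed from the list, the same lookup returns an empty list, which is the problem this issue describes.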

glynnbird commented 4 years ago

https://wiki.dbpedia.org/develop/datasets/dbpedia-dataset-2019-08-30-pre-release

^ Get a list of entities from this dataset and add them to the dictionary that powers our anagram engine.

Stats from this exercise:

15.2 million articles in the data dump; we boiled it down to 8.3 million.
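The thread doesn't record how 15.2 million entries were boiled down to 8.3 million. As a sketch of the general idea only: assuming the DBpedia labels file has one N-Triples `rdfs:label` statement per line, English labels can be pulled out with a regular expression and filtered. The letters-and-spaces filter and the case-insensitive de-duplication below are hypothetical choices, not the criteria actually used.

```python
import re

# Assumed line shape (one triple per line), e.g.
# <http://dbpedia.org/resource/Adam_Gilchrist> <http://www.w3.org/2000/01/rdf-schema#label> "Adam Gilchrist"@en .
LABEL_LINE = re.compile(r'^<[^>]+> <http://www\.w3\.org/2000/01/rdf-schema#label> "(.*)"@en \.$')
# Hypothetical filter: keep labels made only of ASCII letters separated by single spaces.
PLAIN_NAME = re.compile(r"^[A-Za-z]+( [A-Za-z]+)*$")

def extract_names(lines):
    """Return plain English labels, de-duplicated case-insensitively, in input order."""
    seen = set()
    names = []
    for line in lines:
        match = LABEL_LINE.match(line.strip())
        if not match:
            continue  # not an English rdfs:label triple
        label = match.group(1)
        if PLAIN_NAME.match(label) and label.lower() not in seen:
            seen.add(label.lower())
            names.append(label)
    return names

sample = [
    '<http://dbpedia.org/resource/Adam_Gilchrist> <http://www.w3.org/2000/01/rdf-schema#label> "Adam Gilchrist"@en .',
    '<http://dbpedia.org/resource/AC/DC> <http://www.w3.org/2000/01/rdf-schema#label> "AC/DC"@en .',
    '<http://dbpedia.org/resource/Taylor_Swift> <http://www.w3.org/2000/01/rdf-schema#label> "Taylor Swift"@en .',
]
print(extract_names(sample))  # ['Adam Gilchrist', 'Taylor Swift']
```

A filter this strict also drops names with accents or punctuation, so the trade-off between coverage and noise would need tuning against real clues.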

danmermel commented 4 years ago

We could try to figure out the most popular pages from the pageview dumps at https://dumps.wikimedia.org/other/pageviews/2020/2020-01/ and only use those.
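A sketch of the "most popular pages" idea, assuming the hourly files in that directory are gzip-compressed text with space-separated `domain_code page_title count_views total_response_size` lines (for example `en Taylor_Swift 56 0`). Restricting to the `en` and `en.m` domain codes (English Wikipedia, desktop and mobile) and the cut-off `n` are assumptions for illustration.

```python
import gzip
from collections import Counter

def count_views(lines, domains=("en", "en.m")):
    """Sum views per page title for the chosen domain codes (an assumption)."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        # Expected fields: domain_code page_title count_views total_response_size
        if len(parts) != 4 or parts[0] not in domains or not parts[2].isdigit():
            continue
        title = parts[1].replace("_", " ")
        totals[title] += int(parts[2])
    return totals

def top_pages(paths, n):
    """Aggregate several hourly dump files and return the n most-viewed titles."""
    totals = Counter()
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
            totals.update(count_views(f))
    return [title for title, _ in totals.most_common(n)]
```

For example, `count_views(["en Taylor_Swift 56 0", "en.m Taylor_Swift 10 0", "de Taylor_Swift 9 0"])` gives `Counter({'Taylor Swift': 66})`, since the German line is skipped.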

glynnbird commented 4 years ago
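CREATE TABLE stats (page VARCHAR(255) PRIMARY KEY, views INTEGER NOT NULL);

-- stats.views is the existing row; EXCLUDED.views is the row proposed for insertion
INSERT INTO stats (page, views) VALUES ('Taylor Swift', 56)
ON CONFLICT (page) DO UPDATE SET views = stats.views + EXCLUDED.views;
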
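To try this upsert without a PostgreSQL server, the same statement shape runs on Python's built-in `sqlite3` module, because SQLite (3.24 and later) also supports `INSERT ... ON CONFLICT ... DO UPDATE`. This is a local sketch, not the project's storage layer. Passing the count as a parameter and reading it back through `excluded.views` avoids writing the number twice; in SQLite, an unqualified `views` in the `SET` clause refers to the existing row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (page VARCHAR(255) PRIMARY KEY, views INTEGER NOT NULL)")

# Insert a new page, or add the incoming count to the existing total.
UPSERT = """
INSERT INTO stats (page, views) VALUES (?, ?)
ON CONFLICT (page) DO UPDATE SET views = views + excluded.views
"""

def add_views(conn, rows):
    """Accumulate (page, views) pairs in one transaction."""
    with conn:
        conn.executemany(UPSERT, rows)

add_views(conn, [("Taylor Swift", 56), ("Adam Gilchrist", 3)])
add_views(conn, [("Taylor Swift", 10)])
print(conn.execute("SELECT page, views FROM stats ORDER BY views DESC").fetchall())
# [('Taylor Swift', 66), ('Adam Gilchrist', 3)]
```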
danmermel commented 4 years ago

TO DO

- Put the pageviews code on windermere
- Run it for yesterday's date automatically
- Put it in a cron job that:
  - combines the output of the above with combined.txt and de-dupes it (sort -u)
  - commits the new file to git
  - deploys it (which then builds the anagram dictionary again)
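The merge-and-de-dupe step of the cron job could look roughly like this Python sketch, similar in effect to concatenating the two lists and piping them through `sort -u`. The function names and paths are hypothetical, and the git commit and deploy steps are left out. Python's `sorted` orders by code point, which matches `LC_ALL=C sort -u` rather than a locale-aware sort.

```python
from pathlib import Path

def merge_lines(existing, new_entries):
    """Union of two line lists, blank lines dropped, sorted and de-duplicated."""
    return sorted({line.strip() for line in [*existing, *new_entries] if line.strip()})

def merge_into(combined_path, new_entries, out_path):
    """Read the existing list (e.g. combined.txt), merge new entries, write the result."""
    existing = Path(combined_path).read_text(encoding="utf-8").splitlines()
    merged = merge_lines(existing, new_entries)
    Path(out_path).write_text("\n".join(merged) + "\n", encoding="utf-8")
    return len(merged)
```

Keeping the pure `merge_lines` separate from the file handling makes the de-dupe logic easy to test before wiring it into cron.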

We want to save each file with a predictable name so that we can later start another project: analysing the files and finding new entrants.
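With a predictable, date-based file name, finding new entrants becomes a set difference between consecutive days. The `pageviews-YYYY-MM-DD.txt` naming scheme and the one-title-per-line layout are hypothetical, chosen only to illustrate the idea.

```python
from datetime import date, timedelta
from pathlib import Path

def daily_path(directory, day):
    """Hypothetical naming scheme: one file per day, e.g. pageviews-2020-01-14.txt."""
    return Path(directory) / f"pageviews-{day.isoformat()}.txt"

def read_titles(path):
    """Assumes one page title per line; blank lines are ignored."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}

def new_entrants(directory, day):
    """Titles in `day`'s file that did not appear in the previous day's file."""
    today = read_titles(daily_path(directory, day))
    yesterday = read_titles(daily_path(directory, day - timedelta(days=1)))
    return sorted(today - yesterday)
```

For example, `new_entrants("pageviews", date(2020, 1, 14))` would compare `pageviews-2020-01-14.txt` against `pageviews-2020-01-13.txt` in a hypothetical `pageviews` directory.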

We need to write all this stuff up!