cjb / codex-blackboard

Meteor app for coordinating solving for our MIT Mystery Hunt team
GNU Affero General Public License v3.0

Automatically screen-scrape new puzzles #288

Open cscott opened 6 years ago

cscott commented 6 years ago

The most stressful time for a callin operator is when new puzzles are released. Lots of data entry to do, and if you're not quick enough you get sniped by someone who is probably well-intentioned but might inadvertently misspell a puzzle or put it in the wrong round, etc. We've discussed putting a "lock out" on the blackboard that would only let a certain person add puzzles, but this is contrary to our permissive team spirit, and complicates getting help when/where help is actually appreciated.

Since the 'puzzle log' page has been consistent now for about three hunts, it's probably time to add configurable screen-scraping functionality to our blackboard. Probably some combination of a URL, a CSS selector for the element, and a selector for the puzzle title (the content of the element or a `data-` attribute name).

In the puzzle log it's `tr > td` { contents are "UNLOCKED" } and then the puzzle name is the `a` contents in the previous cell (and the `href` in the previous cell should be the puzzle link).
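A minimal sketch of that puzzle-log rule, using only Python's standard `html.parser`. The class and function names (`PuzzleLogParser`, `scrape_puzzle_log`) are made up for illustration, and the exact table markup is an assumption based on the description above, not a copy of the real page.

```python
from html.parser import HTMLParser


class PuzzleLogParser(HTMLParser):
    """Collect (title, href) from the cell just before each "UNLOCKED" cell."""

    def __init__(self):
        super().__init__()
        self.puzzles = []        # (title, href) pairs, in page order
        self._in_cell = False
        self._in_link = False
        self._cell_text = []
        self._cell_link = None   # [href, title] of the first link in this cell
        self._prev_link = None   # link found in the previous cell of this row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._prev_link = None
        elif tag == "td":
            self._in_cell = True
            self._cell_text = []
            self._cell_link = None
        elif tag == "a" and self._in_cell and self._cell_link is None:
            self._cell_link = [dict(attrs).get("href"), ""]
            self._in_link = True

    def handle_data(self, data):
        if self._in_cell:
            self._cell_text.append(data)
            if self._in_link:
                self._cell_link[1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "td" and self._in_cell:
            text = "".join(self._cell_text).strip()
            if text == "UNLOCKED" and self._prev_link is not None:
                href, title = self._prev_link
                self.puzzles.append((title.strip(), href))
            # The cell that just closed becomes "the previous cell".
            self._prev_link = self._cell_link
            self._in_cell = False


def scrape_puzzle_log(html):
    """Return the (title, href) pairs for every unlock row in the log HTML."""
    parser = PuzzleLogParser()
    parser.feed(html)
    parser.close()
    return parser.puzzles
```

Rows whose status cell says anything other than "UNLOCKED" (for example a solve entry) are ignored, so a puzzle that appears in several log rows is still reported once per unlock.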

In the "List of Puzzles" page it's `div.puzzle-list-item > a` for the link and the title. You'd probably want to disambiguate already-existing puzzles by the `a href`, just in case titles were duplicated?
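The "List of Puzzles" variant could look roughly like the sketch below, which keys puzzles by `href` so that two puzzles sharing a title stay distinct. `PuzzleListParser`, `new_puzzles`, and the `known_hrefs` argument are hypothetical names. The markup is assumed from the selector above, and the direct-child check is a simplified stand-in for a real CSS selector engine.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PuzzleListParser(HTMLParser):
    """Collect href -> title for <a> elements directly inside div.puzzle-list-item."""

    def __init__(self):
        super().__init__()
        self.by_href = {}
        self._stack = []     # (tag, is_puzzle_item) for each open element
        self._href = None
        self._title = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        parent_is_item = bool(self._stack) and self._stack[-1][1]
        if tag == "a" and parent_is_item and attrs.get("href"):
            self._href = attrs["href"]
            self._title = []
        classes = (attrs.get("class") or "").split()
        self._stack.append((tag, tag == "div" and "puzzle-list-item" in classes))

    def handle_data(self, data):
        if self._href is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Keep the first title seen for each href.
            self.by_href.setdefault(self._href, "".join(self._title).strip())
            self._href = None
        # Pop back to the matching open tag, tolerating unclosed elements.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break


def new_puzzles(html, known_hrefs):
    """Return {href: title} for listed puzzles whose href is not already known."""
    parser = PuzzleListParser()
    parser.feed(html)
    parser.close()
    return {h: t for h, t in parser.by_href.items() if h not in known_hrefs}
```

Here `known_hrefs` would be whatever set of puzzle links the blackboard already has; only the unseen links come back, regardless of whether their titles collide with existing ones.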

It's easiest if we don't try to scrape the round association, since hunt structure varies quite a bit and we might want to organize puzzles in a way which the constructing team is hesitant to reveal because it would spoil the meta. #221, #277, and #287 would be helpful for letting the screen-scraper concentrate on the puzzle chat and spreadsheet creation, which is most crucial, and letting us backfill the less-urgent questions about organization of puzzles into rounds and identifying metas.

cscott commented 6 years ago

We'd probably want to look at ETags or other means to ensure that we can poll this page at high frequency w/o overloading HQ's server.
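One way the ETag idea might look is a conditional GET that sends `If-None-Match` (and `If-Modified-Since` when available) and treats `304 Not Modified` as "nothing to do". This is a sketch using `urllib` from the standard library. `ConditionalFetcher` is a hypothetical name, and whether HQ's server actually emits `ETag` or `Last-Modified` headers is an assumption that would need checking.

```python
import urllib.error
import urllib.request


class ConditionalFetcher:
    """Re-fetch a URL only when the server reports that it changed."""

    def __init__(self, url, opener=urllib.request.urlopen):
        self.url = url
        self._open = opener          # injectable so the sketch can run offline
        self.etag = None
        self.last_modified = None

    def fetch(self):
        """Return the new body as bytes, or None if the page is unchanged."""
        request = urllib.request.Request(self.url)
        if self.etag:
            request.add_header("If-None-Match", self.etag)
        if self.last_modified:
            request.add_header("If-Modified-Since", self.last_modified)
        try:
            with self._open(request, timeout=10) as response:
                # Remember the validators for the next poll.
                self.etag = response.headers.get("ETag")
                self.last_modified = response.headers.get("Last-Modified")
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code == 304:      # Not Modified: nothing new to parse
                return None
            raise
```

If the server sends neither header, every poll simply downloads the full page, so this only reduces load when HQ supports conditional requests.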

Torgen commented 6 years ago

Can we ask Setec if they'd be willing to provide a per-team RSS/Atom feed using basic auth? If they agree, it really makes our work simple.

Torgen commented 6 years ago

If we want to be clever, we'd scrape frequently (every second) right after solving a puzzle, since that's the time we'd be most likely to unlock something, then decay to once per minute gradually or when the puzzle list changes.
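A rough sketch of that schedule: drop to a fast interval when a puzzle is solved, back off geometrically toward the slow baseline, and return to the baseline at once when the puzzle list changes. `PollSchedule` and its method names are hypothetical, the 1.5x backoff factor is an arbitrary assumption, and "return to baseline when the list changes" is one reading of the comment above.

```python
class PollSchedule:
    """Poll quickly after a solve, then back off geometrically to a slow baseline."""

    def __init__(self, fast=1.0, slow=60.0, backoff=1.5):
        self.fast = fast          # seconds between polls right after a solve
        self.slow = slow          # baseline seconds between polls
        self.backoff = backoff    # growth factor applied after each poll
        self.interval = slow

    def puzzle_solved(self):
        """A solve makes an unlock likely, so start polling fast."""
        self.interval = self.fast

    def puzzle_list_changed(self):
        """The unlock we were waiting for arrived; return to the baseline."""
        self.interval = self.slow

    def next_delay(self):
        """Return how long to wait before the next poll, then decay."""
        delay = self.interval
        self.interval = min(self.slow, self.interval * self.backoff)
        return delay
```

A polling loop would call `next_delay()` before each sleep, `puzzle_solved()` from whatever hook records a correct answer, and `puzzle_list_changed()` whenever a scrape returns new puzzles.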