TechAndCheck / tech-and-check-alerts

Daily tip sheet for fact checkers
MIT License

Twitter list scrape is unreliable on staging #361

slifty closed this issue 4 years ago

slifty commented 4 years ago

We merged #229 and deployed to staging a few weeks ago. It took a few rounds of kicking to get it to run, mostly because of protections on staging that require queues to be enabled manually.

This week we finally got that sorted, but now we actually have a bug!

It looks like something is causing the spreadsheet scraper to trigger thousands of times on an empty spreadsheet ID.

From the logs, it looks like the first sync works just fine, but then there is a SLEW of undefined triggers.

{"message":"Syncing twitter accounts for \"national\" from Google doc: 1gLkx2LK3yhS-glpsktWYNqBc9H2zsd7C6TvmgWjL5Kg","level":"info","service":"tech-and-check-alerts","timestamp":"2020-05-15 14:07:02"}
{"message":"Syncing twitter accounts for \"undefined\" from Google doc: ","level":"info","service":"tech-and-check-alerts","timestamp":"2020-05-15 15:00:01"}
{"message":"Syncing twitter accounts for \"undefined\" from Google doc: ","level":"info","service":"tech-and-check-alerts","timestamp":"2020-05-15 15:00:01"}
{"message":"Syncing twitter accounts for \"undefined\" from Google doc: ","level":"info","service":"tech-and-check-alerts","timestamp":"2020-05-15 15:00:01"}
{"message":"Syncing twitter accounts for \"undefined\" from Google doc: ","level":"info","service":"tech-and-check-alerts","timestamp":"2020-05-15 15:00:02"}
{"message":"Syncing twitter accounts for \"undefined\" from Google doc: ","level":"info","service":"tech-and-check-alerts","timestamp":"2020-05-15 15:00:02"}
{"message":"Syncing twitter accounts for \"undefined\" from Google doc: ","level":"info","service":"tech-and-check-alerts","timestamp":"2020-05-15 15:00:02"}

This calls for two things:

  1. We should have a check in our spreadsheet scraper that refuses to scrape when the Google doc ID is empty. This will keep us from DDoSing Google (hopefully they didn't notice or care... but we did hit them something like 30k times).

  2. We need to figure out what's happening.
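A minimal sketch of the guard described in item 1, assuming the scraper receives the Google doc ID as a string. The names `isValidSpreadsheetId` and `guardedScrape` are hypothetical and do not come from the tech-and-check-alerts codebase; `scrapeFn` stands in for whatever function actually performs the request.

```javascript
// Hypothetical helper: true only for a non-empty, non-whitespace string ID.
const isValidSpreadsheetId = (spreadsheetId) => (
  typeof spreadsheetId === 'string' && spreadsheetId.trim().length > 0
)

// Hypothetical wrapper: refuses to call the scrape function (and so never
// reaches Google) when the ID is empty or missing.
const guardedScrape = (spreadsheetId, scrapeFn) => {
  if (!isValidSpreadsheetId(spreadsheetId)) {
    throw new Error('Refusing to scrape: spreadsheet ID is empty or missing')
  }
  return scrapeFn(spreadsheetId)
}
```

Throwing rather than silently returning makes a misrouted job show up as a failure in the logs instead of as thousands of quiet no-ops, though either choice would stop the requests.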

As @reefdog mentioned, it's possible there is a queue retry that we didn't know was enabled; it's also possible an import is accidentally causing an execute somehow.

Will post updates when I have them, and the PR to resolve should come shortly after.

slifty commented 4 years ago

Well that was fast.

    TWITTER_ACCOUNT_LIST: 'twitterAccountStatementScraper',
    TWITTER_ACCOUNT_STATEMENT: 'twitterAccountStatementScraper',

d'oh!

We failed to change the actual value of the list scraper queue name, which meant it was being triggered every time the account scraper was triggered!
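One way to catch this kind of copy-paste slip early is a uniqueness check over the queue name map, for instance in a unit test. The sketch below is an illustration only: `findDuplicateValues` and the `QUEUE_NAMES` constant name are hypothetical, and the two entries are copied from the snippet above rather than from the full file.

```javascript
// Hypothetical helper: lists every value that is shared by more than one key.
const findDuplicateValues = (obj) => {
  const keysByValue = new Map()
  Object.entries(obj).forEach(([key, value]) => {
    keysByValue.set(value, [...(keysByValue.get(value) || []), key])
  })
  return [...keysByValue.entries()]
    .filter(([, keys]) => keys.length > 1)
    .map(([value, keys]) => ({ value, keys }))
}

// The two entries as they appeared in the buggy version (constant name assumed).
const QUEUE_NAMES = {
  TWITTER_ACCOUNT_LIST: 'twitterAccountStatementScraper',
  TWITTER_ACCOUNT_STATEMENT: 'twitterAccountStatementScraper',
}

// Reports one duplicate, shared by both keys.
const duplicates = findDuplicateValues(QUEUE_NAMES)
```

A test asserting that `duplicates` is empty would have failed on this map, flagging the shared queue name before it reached staging.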