digitalmethodsinitiative / dmi-tcat

Digital Methods Initiative - Twitter Capture and Analysis Toolset
Apache License 2.0

Upgrade stops #349

Closed: marciofoz closed this issue 5 years ago

marciofoz commented 5 years ago

Hi!

Usually, after the upgrade script has been running for an hour, these errors appear:

$php upgrade.php
...
...
2019-02-05 19:24:52 Twitter REST failure with code
2019-02-05 19:24:52 The error may not be permanent; we will sleep and retry the request.
2019-02-05 19:24:59 Reconnecting to API.
2019-02-05 19:24:59 current key 0 ratefree 860
2019-02-05 19:25:10 Twitter REST failure with code
2019-02-05 19:25:10 The error may not be permanent; we will sleep and retry the request.
2019-02-05 19:25:17 Reconnecting to API.
2019-02-05 19:25:17 current key 0 ratefree 859
2019-02-05 19:25:27 Twitter REST failure with code
2019-02-05 19:25:27 The error may not be permanent; we will sleep and retry the request.
2019-02-05 19:25:34 Reconnecting to API.
2019-02-05 19:25:34 current key 0 ratefree 858
2019-02-05 19:25:44 Twitter REST failure with code
2019-02-05 19:25:44 The error may not be permanent; we will sleep and retry the request.
2019-02-05 19:25:51 Reconnecting to API.
2019-02-05 19:25:51 current key 0 ratefree 857
2019-02-05 19:26:01 Permanent error when querying the Twitter API. Please investigate the error output. Now stopping.
2019-02-05 19:26:01 Enabling keys for bin abc123
2019-02-05 19:26:19 Bin TESTETI may need updating in relation to extended entities. Our evidence is tweet id 1088481091139166209 with missing hashtag 'ลูคัส'
2019-02-05 19:26:19 Starting work on TESTETI
2019-02-05 19:26:19 Disabling keys for bin TESTETI
2019-02-05 19:26:19 Preparing list of candidate tweets
2019-02-05 19:26:19 performing lookup for 3000 tweets
2019-02-05 19:26:30 Warning: API key 0 got response code 0
2019-02-05 19:29:40 Warning: API key 0 got response code 0
2019-02-05 19:32:51 Warning: API key 0 got response code 0
2019-02-05 19:36:01 Warning: API key 0 got response code 0
2019-02-05 19:39:11 Warning: API key 0 got response code 0
2019-02-05 19:42:21 Warning: API key 0 got response code 0
2019-02-05 19:45:32 Warning: API key 0 got response code 0

This happens with large tables, and then I need to abort the script and run it again. I believe that, this way, the upgrade process will never complete properly. Thanks
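The log shows two distinct patterns: a short REST retry loop (four "Twitter REST failure" lines about 17 seconds apart, then a permanent error) and, on the next bin, a run of "response code 0" warnings roughly three minutes apart with nothing else logged in between. As a rough aid for telling a stalled run from a slow one, a saved log could be scanned for those patterns. This is a hypothetical helper, not part of TCAT: the name summarize_log is made up, it assumes the log is saved one entry per line with the timestamp format shown above, and it only matches the message texts quoted in this issue.

```python
import re
from datetime import datetime

# Message patterns copied from the upgrade.php log quoted in this issue (assumption:
# other TCAT versions may word these messages differently).
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$")
RETRY_MSG = "Twitter REST failure with code"
PERMANENT_MSG = "Permanent error when querying the Twitter API"
CODE_ZERO_RE = re.compile(r"Warning: API key (\d+) got response code 0$")


def summarize_log(lines):
    """Hypothetical helper: count REST retries, permanent errors and the trailing
    run of 'response code 0' warnings in an upgrade.php log."""
    retries = 0
    permanent_errors = 0
    zero_streak = []  # timestamps of the current run of consecutive code-0 warnings
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip lines without a timestamp, e.g. the command itself
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        msg = m.group(2)
        if msg.startswith(RETRY_MSG):
            retries += 1
        if msg.startswith(PERMANENT_MSG):
            permanent_errors += 1
        if CODE_ZERO_RE.search(msg):
            zero_streak.append(ts)
        else:
            zero_streak = []  # any other message ends the current streak
    # Seconds between consecutive code-0 warnings in the trailing streak.
    gaps = [(b - a).total_seconds() for a, b in zip(zero_streak, zero_streak[1:])]
    return {
        "rest_retries": retries,
        "permanent_errors": permanent_errors,
        "trailing_code0_warnings": len(zero_streak),
        "mean_gap_seconds": sum(gaps) / len(gaps) if gaps else None,
    }
```

Run against the log quoted above, it should report four REST retries, one permanent error and seven trailing code-0 warnings averaging a little over 190 seconds apart. Any iterable of lines works, so an open log file can be passed directly.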

dentoir commented 5 years ago

Hi @marciofoz

This is a recurring problem for large tables. It is caused both by the database storage technology (UPDATE statements take too long on huge bins) and by this upgrade step's use of the REST API. I've had to disable this particular upgrade step for those bins. I'd advise running the following MySQL queries inside a screen session on your server to optimize the bin again. This will block capture for the duration of the queries. I'm not sure how big your bin is, but if it holds more than 20 million tweets, these statements will also take a long time. Hopefully, switching to a new database engine in the future will solve these issues.

ALTER TABLE abc123_tweets ENABLE KEYS;
OPTIMIZE TABLE abc123_tweets;
ALTER TABLE abc123_hashtags ENABLE KEYS;
OPTIMIZE TABLE abc123_hashtags;
ALTER TABLE abc123_mentions ENABLE KEYS;
OPTIMIZE TABLE abc123_mentions;
ALTER TABLE abc123_urls ENABLE KEYS;
OPTIMIZE TABLE abc123_urls;
ALTER TABLE abc123_media ENABLE KEYS;
OPTIMIZE TABLE abc123_media;
ALTER TABLE abc123_places ENABLE KEYS;
OPTIMIZE TABLE abc123_places;
ALTER TABLE abc123_withheld ENABLE KEYS;
OPTIMIZE TABLE abc123_withheld;
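The fourteen statements above follow one pattern: each of the bin's seven tables (tweets, hashtags, mentions, urls, media, places, withheld) gets ENABLE KEYS followed by OPTIMIZE TABLE. For a bin with a different name, a small generator avoids typos. This is a hypothetical sketch, not a TCAT script: the function names are made up, the table list is copied from the statements above, and the name check is a conservative assumption so the bin name can be inserted into SQL unquoted. The count query is one way to check a bin against the 20-million-tweet figure mentioned above.

```python
import re

# Table suffixes copied from the statements listed above.
BIN_TABLE_SUFFIXES = ["tweets", "hashtags", "mentions", "urls", "media", "places", "withheld"]

# Assumption: bin names consist only of letters, digits and underscores,
# so they can be pasted into SQL without quoting.
BIN_NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def _check_bin_name(bin_name):
    if not BIN_NAME_RE.fullmatch(bin_name):
        raise ValueError(f"unexpected bin name: {bin_name!r}")


def maintenance_sql(bin_name):
    """Hypothetical helper: return the ENABLE KEYS / OPTIMIZE TABLE statements for one bin."""
    _check_bin_name(bin_name)
    statements = []
    for suffix in BIN_TABLE_SUFFIXES:
        table = f"{bin_name}_{suffix}"
        statements.append(f"ALTER TABLE {table} ENABLE KEYS;")
        statements.append(f"OPTIMIZE TABLE {table};")
    return statements


def row_count_sql(bin_name):
    """Hypothetical helper: query counting the tweets in a bin, to gauge how long the statements may take."""
    _check_bin_name(bin_name)
    return f"SELECT COUNT(*) FROM {bin_name}_tweets;"
```

For example, printing "\n".join(maintenance_sql("abc123")) reproduces the statements above in the same order. Saved to a file, they could be fed to the mysql command-line client from inside the screen session, e.g. mysql your_database < fix.sql, where the database and file names are placeholders.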