digitalmethodsinitiative / dmi-tcat

Digital Methods Initiative - Twitter Capture and Analysis Toolset
Apache License 2.0

Importing json file #430

Open SimVid opened 3 years ago

SimVid commented 3 years ago

Hi folks, I'm having an issue importing a JSON file. I've added the name of the bin and the name of the JSON file to the import-jsondump.php script:

```php
// specify the name of the bin here
$bin_name = 'leadershippaper';
// specify dir with the user timelines (json)
$dir = 'sample';
// set type of dump ('import follow' or 'import track')
$type = 'import track';
// if 'import track', specify keywords for which data was captured
$queries = array();
```

The sample.json file is in the 'import' directory, and when I run the script this is what I get:

```
root@server:/var/www/dmi-tcat/import# php import-jsondump.php
[debug] querybin_id = 15

Number of tweets: 0
Unique tweets: 0
Unique users: 0
Processed 0 tweets!

Total number of timelines: 0
Valid timelines: 0
Invalid timelines: 0
Populated timelines: 0
Empty timelines: 0
```

Help will be much appreciated.
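The output reports "Total number of timelines: 0", which is consistent with the importer finding no input files at all. Below is a minimal sketch for checking that, assuming (this is not confirmed anywhere in the thread) that import-jsondump.php treats `$dir` as a directory, resolves it relative to the working directory, and reads every `*.json` file inside it. Under that assumption, `$dir = 'sample'` would point at `import/sample/`, not at `import/sample.json`. The helper name `find_json_files` is hypothetical.

```python
import sys
from pathlib import Path


def find_json_files(dir_setting: str, cwd: str = ".") -> list:
    """List the *.json files inside the directory that a $dir-style setting names.

    Assumption (hypothetical, not taken from TCAT's code): the importer
    resolves $dir relative to the working directory and reads every *.json
    file found there.
    """
    target = Path(cwd) / dir_setting
    if not target.is_dir():
        # e.g. $dir = 'sample' while the data sits in import/sample.json
        return []
    return sorted(p.name for p in target.glob("*.json"))


if __name__ == "__main__":
    # Usage, run from /var/www/dmi-tcat/import:  python find_json.py sample
    setting = sys.argv[1] if len(sys.argv) > 1 else "sample"
    found = find_json_files(setting)
    print(found if found else f"no *.json files under {Path(setting).resolve()}")
```

If this finds nothing for `sample`, pointing `$dir` at the directory that actually contains the file would be the first thing to try.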

ErikBorra commented 3 years ago

How did you retrieve that json and what format does it have? Could you provide a snippet of the json (starting at the top)?

SimVid commented 3 years ago

Thanks, Erik! I didn't retrieve it myself, but the documentation that came with the data says that twarc was used to hydrate the collected tweet IDs.

Below is a snippet from the json file:

```json
[{"created_at": "Fri Jan 24 16:24:23 +0000 2014", "id": 426752350032650240, "id_str": "426752350032650240", "full_text": "Thanks again @jcostik! The power of #opensource @askmanny @MrMikeLawson #wearenotwaiting #t1d", "truncated": false, "display_text_range": [0, 93], "entities": {"hashtags": [{"text": "opensource", "indices": [36, 47]}, {"text": "wearenotwaiting", "indices": [72, 88]}, {"text": "t1d", "indices": [89, 93]}], "symbols": [], "user_mentions": [{"screen_name": "jcostik", "name": "John Costik", "id": 71270503, "id_str": "71270503", "indices": [13, 21]}, {"screen_name": "askmanny", "name": "Manny Hernandez", "id": 5986922, "id_str": "5986922", "indices": [48, 57]}, {"screen_name": "MrMikeLawson", "name": "Mike Lawson", "id": 15053068, "id_str": "15053068", "indices": [58, 71]}], "urls": []}, "source": "<a href=\"https://tapbots.com/software/tweetbot/mac\" rel=\"nofollow\">Tweetbot for Mac</a>", "in_reply_to_status_id": 426741704226766848, "in_reply_to_status_id_str":
```
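The snippet begins with `[`, so this file is a single JSON array of tweet objects. twarc's `hydrate` command writes line-delimited JSON instead (one tweet object per line), so the file appears to have been repackaged after hydration. Assuming, without confirmation from the TCAT source in this thread, that import-jsondump.php decodes one tweet per line, the minimal sketch below detects the layout and converts an array into line-delimited JSON. The function names and file names are hypothetical.

```python
import json
import sys


def detect_layout(text: str) -> str:
    """Classify a dump as 'array' ([...]), 'lines' (one object per line) or 'empty'."""
    stripped = text.lstrip("\ufeff \t\r\n")  # ignore a UTF-8 BOM and leading whitespace
    if not stripped:
        return "empty"
    return "array" if stripped[0] == "[" else "lines"


def array_to_jsonl(text: str) -> str:
    """Turn a JSON array of tweet objects into line-delimited JSON."""
    tweets = json.loads(text.lstrip("\ufeff"))
    # One JSON object per line; ensure_ascii=False keeps emoji and accents as-is.
    # Python ints are arbitrary precision, so 64-bit tweet ids survive unchanged.
    return "".join(json.dumps(t, ensure_ascii=False) + "\n" for t in tweets)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python to_jsonl.py sample.json converted/sample.json
    # The whole file is read into memory, so very large dumps need enough RAM.
    with open(sys.argv[1], encoding="utf-8-sig") as fh:
        raw = fh.read()
    layout = detect_layout(raw)
    if layout == "array":
        with open(sys.argv[2], "w", encoding="utf-8") as out:
            out.write(array_to_jsonl(raw))
    print(f"{sys.argv[1]}: {layout}")
```

Keeping the `.json` extension on the converted file is deliberate, in case the importer only picks up `*.json` files.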

SimVid commented 3 years ago

Hi again,

I kept trying, and this is where I've got to: when I run the import on the file covering all 4 years, it generates the following errors:

```
root@server:/var/www/dmi-tcat/import# php import-jsondump.php
processing ../import/attempt1.json
.....................................................................................................................................2
processing ../import/trying.json
PHP Warning: Invalid argument supplied for foreach() in /var/www/dmi-tcat/capture/common/functions.php on line 1771
PHP Warning: array_key_exists() expects parameter 2 to be array, null given in /var/www/dmi-tcat/capture/common/functions.php on line 1782
PHP Warning: array_key_exists() expects parameter 2 to be array, null given in /var/www/dmi-tcat/capture/common/functions.php on line 1785
PHP Warning: array_key_exists() expects parameter 2 to be array, null given in /var/www/dmi-tcat/capture/common/functions.php on line 1786
PHP Fatal error: Uncaught PDOException: SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near ')' at line 1 in /var/www/dmi-tcat/capture/common/functions.php:1920
Stack trace:
#0 /var/www/dmi-tcat/capture/common/functions.php(1920): PDOStatement->execute()
#1 /var/www/dmi-tcat/import/import-jsondump.php(69): Tweet->isInBin('attempt1')
#2 /var/www/dmi-tcat/import/import-jsondump.php(45): process_json_file_timeline('../import/tryin...', Object(PDO))
#3 {main}
  thrown in /var/www/dmi-tcat/capture/common/functions.php on line 1920
```
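The warnings say that values the code at functions.php lines 1771 to 1786 expects to be arrays are null. The SQL error near ')' may mean a query was built with an empty value. One possible cause, not confirmed in this thread, is that some lines of trying.json do not decode into a complete tweet object. The minimal sketch below reports such lines in a line-delimited dump. `REQUIRED_KEYS` is a hypothetical minimum based on the standard tweet layout visible in the snippet above, not a list taken from TCAT. A file whose whole array sits on one line shows up as a single "not an object: list" entry.

```python
import json
import sys

# Hypothetical minimum set of fields; TCAT's real requirements may differ.
REQUIRED_KEYS = ("id_str", "user", "entities")


def find_bad_lines(text: str) -> list:
    """Return (line_number, reason) pairs for lines that are not usable tweet objects."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped rather than reported
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(obj, dict):
            problems.append((lineno, f"not an object: {type(obj).__name__}"))
            continue
        missing = [k for k in REQUIRED_KEYS if obj.get(k) is None]
        if missing:
            problems.append((lineno, "missing " + ", ".join(missing)))
    return problems


if __name__ == "__main__":
    # Usage: python check_lines.py trying.json
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8-sig") as fh:
            for lineno, reason in find_bad_lines(fh.read())[:20]:  # first 20 only
                print(f"{path}:{lineno}: {reason}")
```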

I then tried importing a smaller file covering 3 months. It ran without errors, and this is what I got:

```
root@server:/var/www/dmi-tcat/import# php import-jsondump.php
The query bin 'attempt1' already exists. Are you sure you want to add tweets to 'attempt1'? (yes/no) yes
processing ../import/attempt1.json
.....................................................................................................................................1
[debug] querybin_id = 24
[debug] UPDATE tcat_query_bins_periods SET starttime = :starttime, endtime = :endtime WHERE querybin_id = :querybin_id (24, 2014-01-16 19:29:38, 2018-11-07 22:12:58)

Number of tweets: 0
Unique tweets: 0
Unique users: 0
Processed 0 tweets!

Total number of timelines: 1
Valid timelines: 0
Invalid timelines: 0
Populated timelines: 0
Empty timelines: 0
```

Any help will be appreciated...
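In that last run, the debug line sets the bin period to 2014-01-16 19:29:38 through 2018-11-07 22:12:58. That is much wider than the 3 months the smaller file is said to cover, although the bin already existed, so the period may also reflect earlier imports. The minimal sketch below checks the actual date range of a line-delimited dump from each tweet's `created_at` field, whose format is visible in the snippet above ("Fri Jan 24 16:24:23 +0000 2014"). The function name `created_at_range` is hypothetical, and malformed lines are simply skipped.

```python
import json
import sys
from datetime import datetime

# created_at layout seen in the dump, e.g. "Fri Jan 24 16:24:23 +0000 2014"
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def created_at_range(text: str):
    """Return (earliest, latest) created_at datetimes in a line-delimited dump, or None."""
    stamps = []
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed lines are ignored here
        if isinstance(tweet, dict) and tweet.get("created_at"):
            stamps.append(datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT))
    if not stamps:
        return None
    return min(stamps), max(stamps)


if __name__ == "__main__":
    # Usage: python date_range.py attempt1.json
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8-sig") as fh:
            print(path, created_at_range(fh.read()))
```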