behavioral-ds / evently

evently: simulation, fitting of Hawkes processes
https://www.behavioral-ds.science/evently
MIT License

Error processing json (v2 api) missing value where TRUE/FALSE needed #14

Open monanasery opened 2 years ago

monanasery commented 2 years ago

@qykong, @andrei-rizoiu: I installed the evently library and it works with the sample data you provided on GitHub. However, when I use evently with my own Twitter JSON (v2 API), it gives me the following error:

Error in melt_snowflake(id) : is.character(snowflake_id) || bit64::is.integer64(snowflake_id) is not TRUE
In addition: There were 28 warnings (use warnings() to see them)

and here is one of the warnings (other warnings are the same)

Warning messages:
1: In value[3L] : Error processing json: Error in if (!is.null(json_tweet$data$referenced_tweets) && json_tweet$data$referenced_tweets$type == : missing value where TRUE/FALSE needed

I don't know why the retweet ids are null. I checked my json file and searched for retweeted. I see the path (json_tweet$data$referenced_tweets$type) is correct. Can you please help me with this?

Below is a subset of the data (only part of the first line of the json file I used):

{"data": [{"referenced_tweets": [{"type": "retweeted", "id": "1253739069273710594"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}, {"start": 18, "end": 24, "username": "AC360", "id": "227837742"}], "annotations": [{"start": 25, "end": 39, "probability": 0.7096, "type": "Person", "normalized_text": "President Trump"}], "urls": [{"start": 98, "end": 121, "url": "", "expanded_url": "", "display_url": "", "images": [{"url": "", "width": 144, "height": 144}, {"url": "", "width": 144, "height": 144}], "status": 200, "title": "Ultraviolet Irradiation of Blood: \u201cThe Cure That Time Forgot\u201d?", "description": "Ultraviolet blood irradiation (UBI) was extensively used in the 1940s and 1950s to treat many diseases including septicemia, pneumonia, tuberculosis, arthritis, asthma and even poliomyelitis. The early studies were carried out by several physicians in ...", "unwound_url": ""}]}, "public_metrics": {"retweet_count": 3, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253834847258370048", "context_annotations": [{"domain": {"id": "3", "name": "TV Shows", "description": "Television shows from around the world"}, "entity": {"id": "10000271509", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. 
Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "4", "name": "TV Episodes", "description": "Television show episodes"}, "entity": {"id": "1249271407508242432", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "4", "name": "TV Episodes", "description": "Television show episodes"}, "entity": {"id": "1249277031881138178", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "4", "name": "TV Episodes", "description": "Television show episodes"}, "entity": {"id": "1250891078401552385", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. 
Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "10", "name": "Person", "description": "Named people in the world like Nelson Mandela"}, "entity": {"id": "799022225751871488", "name": "Donald Trump", "description": "45th US President, Donald Trump"}}, {"domain": {"id": "29", "name": "Events [Entity Service]", "description": "Entity Service related Events domain"}, "entity": {"id": "1249271407508242432", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "29", "name": "Events [Entity Service]", "description": "Entity Service related Events domain"}, "entity": {"id": "1249277031881138178", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "29", "name": "Events [Entity Service]", "description": "Entity Service related Events domain"}, "entity": {"id": "1250891078401552385", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. 
And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "35", "name": "Politician", "description": "Politicians in the world, like Joe Biden"}, "entity": {"id": "799022225751871488", "name": "Donald Trump", "description": "45th US President, Donald Trump"}}], "created_at": "2020-04-24T23:54:57.000Z", "author_id": "1890848160", "text": "RT @warriors_mom: @AC360 President Trump was referring to this well-documented medical treatment: ", "source": "Twitter for iPhone", "conversation_id": "1253834847258370048"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253452455540666371"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}], "annotations": [{"start": 24, "end": 27, "probability": 0.691, "type": "Place", "normalized_text": "U.S."}]}, "public_metrics": {"retweet_count": 5, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253828982413410307", "context_annotations": [{"domain": {"id": "123", "name": "Ongoing News Story", "description": "Ongoing News Stories like 'Brexit'"}, "entity": {"id": "1220701888179359745", "name": "COVID-19"}}], "created_at": "2020-04-24T23:31:39.000Z", "author_id": "863857568", "text": "RT @warriors_mom: Major U.S. 
credit-card issuers begin lowering customer spending limits as coronavirus pandemic shutdowns leave millions j\u2026", "source": "Twitter for iPhone", "conversation_id": "1253828982413410307"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253815956662620163"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}, {"start": 18, "end": 32, "username": "RealMattCouch", "id": "601535938"}], "annotations": [{"start": 33, "end": 41, "probability": 0.8682, "type": "Person", "normalized_text": "Seth Rich"}]}, "public_metrics": {"retweet_count": 2, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253816055161651202", "created_at": "2020-04-24T22:40:16.000Z", "author_id": "1065308069645754368", "text": "RT @warriors_mom: @RealMattCouch Seth Rich", "source": "Twitter for Android", "conversation_id": "1253816055161651202"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253811776103333890"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}], "annotations": [{"start": 63, "end": 67, "probability": 0.9967, "type": "Person", "normalized_text": "Trump"}, {"start": 69, "end": 74, "probability": 0.9523, "type": "Place", "normalized_text": "Russia"}, {"start": 87, "end": 95, "probability": 0.8678, "type": "Organization", "normalized_text": "Alfa Bank"}]}, "public_metrics": {"retweet_count": 1, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253812582806216704", "context_annotations": [{"domain": {"id": "10", "name": "Person", "description": "Named people in the world like Nelson Mandela"}, "entity": {"id": "799022225751871488", "name": "Donald Trump", "description": "45th US President, Donald Trump"}}, {"domain": {"id": "35", "name": "Politician", "description": "Politicians in the world, like Joe Biden"}, 
"entity": {"id": "799022225751871488", "name": "Donald Trump", "description": "45th US President, Donald Trump"}}, {"domain": {"id": "30", "name": "Entities [Entity Service]", "description": "Entity Service top level domain, every item that is in Entity Service should be in this domain"}, "entity": {"id": "848920371311001600", "name": "Technology", "description": "Technology and computing"}}, {"domain": {"id": "30", "name": "Entities [Entity Service]", "description": "Entity Service top level domain, every item that is in Entity Service should be in this domain"}, "entity": {"id": "898650876658634752", "name": "Cybersecurity", "description": "Cybersecurity"}}], "created_at": "2020-04-24T22:26:29.000Z", "author_id": "987931361963950080", "text": "RT @warriors_mom: Top cyber security team finds no evidence of Trump-Russia chatter on Alfa Bank server: A cyber security report debunks th\u2026", "source": "Twitter for Android", "conversation_id": "1253812582806216704"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253461793168674821"}], "attachments": {"media_keys": ["3_1253461775980339201", "3_1253461780254392326", "3_1253461784981377024", "3_1253461788408102912"]}, "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}], "hashtags": [{"start": 23, "end": 32, "tag": "FakeNews"}], "urls": [{"start": 56, "end": 79, "url": "", "expanded_url": "", "display_url": "pic.twitter.com/po6BRVf2pu", "media_key": "3_1253461775980339201"}, {"start": 56, "end": 79, "url": "", "expanded_url": "", "display_url": "pic.twitter.com/po6BRVf2pu", "media_key": "3_1253461780254392326"}, {"start": 56, "end": 79, "url": "", "expanded_url": "", "display_url": "pic.twitter.com/po6BRVf2pu", "media_key": "3_1253461784981377024"}, {"start": 56, "end": 79, "url": "", "expanded_url": "", "display_url": "pic.twitter.com/po6BRVf2pu", "media_key": "3_1253461788408102912"}]}, "public_metrics": {"retweet_count": 6, "reply_count": 0, "like_count": 0, 
"quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253787731517476866", "created_at": "2020-04-24T20:47:44.000Z", "author_id": "461486301", "text": "RT @warriors_mom: Dear #FakeNews Media... seriously? \ud83d\ude44\ud83e\udd23 ", "source": "Twitter Web App", "conversation_id": "1253787731517476866"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253348491805577216"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}], "annotations": [{"start": 18, "end": 23, "probability": 0.868, "type": "Organization", "normalized_text": "Amazon"}]}, "public_metrics": {"retweet_count": 2, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253787648180789253", "context_annotations": [{"domain": {"id": "45", "name": "Brand Vertical", "description": "Top level entities that describe a Brands industry"}, "entity": {"id": "781974596706635776", "name": "Retail"}}, {"domain": {"id": "46", "name": "Brand Category", "description": "Categories within Brand Verticals that narrow down the scope of Brands"}, "entity": {"id": "783335558466506752", "name": "Online"}}, {"domain": {"id": "47", "name": "Brand", "description": "Brands and Companies"}, "entity": {"id": "10026792024", "name": "Amazon"}}], "created_at": "2020-04-24T20:47:24.000Z", "author_id": "3003997593", "text": "RT @warriors_mom: Amazon Scooped Up Data From Its Own Sellers to Launch Competing Products: Contrary to assertions to Congress, employees o\u2026", "source": "Twitter for iPhone", "conversation_id": "1253787648180789253"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253716749817729025"}], ....}

qykong commented 2 years ago

Hi Mona, thanks for your interest in our tool. I had a look at the data you provided and found that it is in a different format from the data I got with the V2 API. For example, each line in your jsonl file contains multiple tweets, all inside the 'data' field. Admittedly, I built this feature last year and the API might have been updated since then. Which tool did you use to collect these tweets? If it's a commonly used one, I'll try to fix evently for this issue.
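To make the format difference concrete, here is a minimal Python sketch that classifies a line of a jsonl file by the type of its top-level 'data' field. It assumes, based only on the path in the warning above (json_tweet$data$referenced_tweets), that the parser expects 'data' to hold a single tweet object; the function name `data_shape` and its return labels are hypothetical and not part of evently.

```python
import json


def data_shape(line: str) -> str:
    """Classify one JSONL line by the type of its top-level "data" field.

    Returns "single" when "data" is one tweet object, "batch" when it is
    a list of tweets (as in the sample posted in this issue), and
    "missing" when there is no usable "data" field.
    """
    record = json.loads(line)
    data = record.get("data") if isinstance(record, dict) else None
    if isinstance(data, dict):
        return "single"
    if isinstance(data, list):
        return "batch"
    return "missing"
```

Running this over the first few lines of a file should show quickly whether it uses the batched layout seen in the sample above.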

As a temporary workaround, you might need to change the tweet.R file to account for these differences, i.e. loop through the tweets in each line and map the corresponding fields to the ones required for building cascades.
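An alternative to editing tweet.R is to preprocess the file so that each line holds exactly one tweet. The Python sketch below does this under two assumptions that are not confirmed in this thread: that evently expects one tweet object per line under 'data', and that any other top-level keys on the line can simply be copied onto every emitted record. The function names `split_batched_line` and `flatten_jsonl` are hypothetical helpers, not part of evently.

```python
import json
from typing import Iterator


def split_batched_line(line: str) -> Iterator[str]:
    """Yield one JSON string per tweet found in a single JSONL line.

    If "data" is a list of tweets, each tweet is emitted as its own
    record with "data" set to that tweet. Other top-level keys are
    copied unchanged (an assumption about what downstream code needs).
    Lines whose "data" is not a list are passed through re-serialized.
    """
    record = json.loads(line)
    data = record.get("data")
    if not isinstance(data, list):
        yield json.dumps(record)
        return
    shared = {key: value for key, value in record.items() if key != "data"}
    for tweet in data:
        yield json.dumps({"data": tweet, **shared})


def flatten_jsonl(src_path: str, dst_path: str) -> None:
    """Rewrite src_path into dst_path with one tweet per output line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                for out in split_batched_line(line):
                    dst.write(out + "\n")
```

For example, `flatten_jsonl("tweets.jsonl", "tweets_flat.jsonl")` (both file names are placeholders) would write the flattened copy, which can then be tried with evently in place of the original file. Whether this also resolves the melt_snowflake error is not something this sketch can guarantee.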