What tweets does tweets.jsonl contain? I think it will only work for v2 format tweets, in case that's the issue. Otherwise, the file has to contain full API responses (with expansions).
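One rough way to answer that question is to look at the keys of each record in the file. The sketch below is only a heuristic and is not part of twarc or twarc-network: the names classify_record and summarize_file are hypothetical, and the key-based rules are assumptions about the general shape of v1.1 tweets, v2 API responses, and individual v2 tweets.

```python
import json
from collections import Counter


def classify_record(record):
    """Guess the format of one parsed JSON record (heuristic, assumption-based)."""
    if "data" in record:
        # v2 API responses wrap tweets in "data"; expansions live under "includes".
        if "includes" in record:
            return "v2 response"
        return "v2 response without expansions"
    if "author_id" in record:
        # A single v2 tweet object, for example after flattening a response.
        return "v2 tweet"
    if "id_str" in record and "user" in record:
        # v1.1 tweets carry "id_str" and an embedded "user" object.
        return "v1.1 tweet"
    return "unknown"


def summarize_file(path):
    """Count the guessed formats over every non-empty line of a JSONL file."""
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                counts[classify_record(json.loads(line))] += 1
    return counts
```

Calling summarize_file("tweets.jsonl") would then give a quick count of each guessed format, which may help narrow down whether the input format is the problem.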
I just downloaded those tweets with:
twarc2 search 'cifuentes' --start-time 2018-03-26 --end-time 2018-03-27 --archive > tweets.jsonl
Example of one tweet:
{"entities": {"annotations": [{"start": 32, "end": 40, "probability": 0.9615, "type": "Person", "normalized_text": "Cifuentes"}, {"start": 178, "end": 186, "probability": 0.7933, "type": "Place", "normalized_text": "Catalunya"}], "mentions": [{"start": 0, "end": 12, "username": "gallifantes", "id": "250801453", "profile_image_url": "https://pbs.twimg.com/profile_images/1336687652003864577/Vcu_2Jr2_normal.jpg", "verified": false, "name": "Cris", "protected": false, "location": "Barcelona", "description": "Procrastinadora nivel experto. \n\nAra \u00e9s dem\u00e0. No escalfa el foc d'ahir\nni el foc d'avui i haurem de fer foc nou. Mart\u00ed i Pol", "public_metrics": {"followers_count": 97705, "following_count": 1813, "tweet_count": 109902, "listed_count": 486}, "url": "", "pinned_tweet_id": "1184685312301260800", "created_at": "2011-02-11T20:53:57.000Z"}, {"start": 13, "end": 18, "username": "KRLS", "id": "11611502", "profile_image_url": "https://pbs.twimg.com/profile_images/1399368370894606336/zt5X-4S7_normal.jpg", "verified": true, "name": "Carles Puigdemont", "protected": false, "location": "Brussels, Belgium", "description": "130th President of Catalonia | President of @ConsellxRep | MEP @JuntsEU | Telegram: https://t.co/1m4VpOqgQ4 | #Mexplico \ud83d\udc49 https://t.co/2kmHCUEkaU", "entities": {"url": {"urls": [{"start": 0, "end": 23, "url": "https://t.co/n0WOMEAuN0", "expanded_url": "https://www.juntsxeuropa.cat", "display_url": "juntsxeuropa.cat"}]}, "description": {"urls": [{"start": 84, "end": 107, "url": "https://t.co/1m4VpOqgQ4", "expanded_url": "https://t.me/carlespuigdemont", "display_url": "t.me/carlespuigdemo\u2026"}, {"start": 122, "end": 145, "url": "https://t.co/2kmHCUEkaU", "expanded_url": "https://ja.cat/buDGt", "display_url": "ja.cat/buDGt"}], "hashtags": [{"start": 110, "end": 119, "tag": "Mexplico"}], "mentions": [{"start": 44, "end": 56, "username": "ConsellxRep"}, {"start": 63, "end": 71, "username": "JuntsEU"}]}}, "public_metrics": 
{"followers_count": 792475, "following_count": 5294, "tweet_count": 22788, "listed_count": 3161}, "url": "https://t.co/n0WOMEAuN0", "pinned_tweet_id": "1216761222999302145", "created_at": "2007-12-28T21:48:59.000Z"}]}, "reply_settings": "everyone", "text": "@gallifantes @KRLS El m\u00e1ster de Cifuentes es muy importante, la corrupci\u00f3n en la Universidad para los que la sufrimos es una causa fundamental. Y yo precisamente no me olvido de Catalunya", "id": "978252563328962561", "possibly_sensitive": false, "public_metrics": {"retweet_count": 5, "reply_count": 4, "like_count": 43, "quote_count": 0}, "created_at": "2018-03-26T12:49:20.000Z", "author_id": "2781359551", "referenced_tweets": [{"type": "replied_to", "id": "978251296443633665", "entities": {"annotations": [{"start": 95, "end": 103, "probability": 0.9091, "type": "Person", "normalized_text": "Cifuentes"}, {"start": 106, "end": 112, "probability": 0.5336, "type": "Person", "normalized_text": "Ol rait"}], "mentions": [{"start": 52, "end": 57, "username": "KRLS", "id": "11611502", "profile_image_url": "https://pbs.twimg.com/profile_images/1399368370894606336/zt5X-4S7_normal.jpg", "verified": true, "name": "Carles Puigdemont", "protected": false, "location": "Brussels, Belgium", "description": "130th President of Catalonia | President of @ConsellxRep | MEP @JuntsEU | Telegram: https://t.co/1m4VpOqgQ4 | #Mexplico \ud83d\udc49 https://t.co/2kmHCUEkaU", "entities": {"url": {"urls": [{"start": 0, "end": 23, "url": "https://t.co/n0WOMEAuN0", "expanded_url": "https://www.juntsxeuropa.cat", "display_url": "juntsxeuropa.cat"}]}, "description": {"urls": [{"start": 84, "end": 107, "url": "https://t.co/1m4VpOqgQ4", "expanded_url": "https://t.me/carlespuigdemont", "display_url": "t.me/carlespuigdemo\u2026"}, {"start": 122, "end": 145, "url": "https://t.co/2kmHCUEkaU", "expanded_url": "https://ja.cat/buDGt", "display_url": "ja.cat/buDGt"}], "hashtags": [{"start": 110, "end": 119, "tag": "Mexplico"}], "mentions": 
[{"start": 44, "end": 56, "username": "ConsellxRep"}, {"start": 63, "end": 71, "username": "JuntsEU"}]}}, "public_metrics": {"followers_count": 792475, "following_count": 5294, "tweet_count": 22788, "listed_count": 3161}, "url": "https://t.co/n0WOMEAuN0", "pinned_tweet_id": "1216761222999302145", "created_at": "2007-12-28T21:48:59.000Z"}]}, "reply_settings": "everyone", "text": "La izquierda alemana convocando manifestaciones por @KRLS y la de aqu\u00ed hablando del m\u00e1ster de Cifuentes. Ol rait.", "possibly_sensitive": false, "public_metrics": {"retweet_count": 3799, "reply_count": 171, "like_count": 6816, "quote_count": 45}, "created_at": "2018-03-26T12:44:18.000Z", "author_id": "250801453", "context_annotations": [{"domain": {"id": "10", "name": "Person", "description": "Named people in the world like Nelson Mandela"}, "entity": {"id": "923913865015959552", "name": "Carles Puigdemont", "description": "Carles Puigdemont"}}, {"domain": {"id": "35", "name": "Politician", "description": "Politicians in the world, like Joe Biden"}, "entity": {"id": "923913865015959552", "name": "Carles Puigdemont", "description": "Carles Puigdemont"}}], "conversation_id": "978251296443633665", "source": "Twitter for Android", "lang": "es", "author": {"profile_image_url": "https://pbs.twimg.com/profile_images/1336687652003864577/Vcu_2Jr2_normal.jpg", "verified": false, "name": "Cris", "protected": false, "location": "Barcelona", "id": "250801453", "description": "Procrastinadora nivel experto. \n\nAra \u00e9s dem\u00e0. No escalfa el foc d'ahir\nni el foc d'avui i haurem de fer foc nou. 
Mart\u00ed i Pol", "public_metrics": {"followers_count": 97705, "following_count": 1813, "tweet_count": 109902, "listed_count": 486}, "url": "", "pinned_tweet_id": "1184685312301260800", "username": "gallifantes", "created_at": "2011-02-11T20:53:57.000Z"}}], "context_annotations": [{"domain": {"id": "10", "name": "Person", "description": "Named people in the world like Nelson Mandela"}, "entity": {"id": "923913865015959552", "name": "Carles Puigdemont", "description": "Carles Puigdemont"}}, {"domain": {"id": "35", "name": "Politician", "description": "Politicians in the world, like Joe Biden"}, "entity": {"id": "923913865015959552", "name": "Carles Puigdemont", "description": "Carles Puigdemont"}}], "conversation_id": "978251296443633665", "source": "Twitter for Android", "lang": "es", "in_reply_to_user_id": "250801453", "author": {"profile_image_url": "https://pbs.twimg.com/profile_images/1382070865664348166/0ME-T9l2_normal.jpg", "verified": false, "name": "Kondratio Federovich Rileev \ud83d\udc9b", "protected": false, "location": "Madrid", "id": "2781359551", "description": "#DERECHOSHUMANOS\n\nhttps://t.co/IclT1yxG22", "entities": {"url": {"urls": [{"start": 0, "end": 23, "url": "https://t.co/AOIufksbs1", "expanded_url": "http://www.ruizjimenez.es", "display_url": "ruizjimenez.es"}]}, "description": {"urls": [{"start": 18, "end": 41, "url": "https://t.co/IclT1yxG22", "expanded_url": "http://ruizjimenez.es", "display_url": "ruizjimenez.es"}], "hashtags": [{"start": 0, "end": 16, "tag": "DERECHOSHUMANOS"}]}}, "public_metrics": {"followers_count": 1900, "following_count": 1820, "tweet_count": 81013, "listed_count": 23}, "url": "https://t.co/AOIufksbs1", "pinned_tweet_id": "1382073577785147392", "username": "Marta51970", "created_at": "2014-08-31T00:19:32.000Z"}, "in_reply_to_user": {"profile_image_url": "https://pbs.twimg.com/profile_images/1336687652003864577/Vcu_2Jr2_normal.jpg", "verified": false, "name": "Cris", "protected": false, "location": "Barcelona", 
"id": "250801453", "description": "Procrastinadora nivel experto. \n\nAra \u00e9s dem\u00e0. No escalfa el foc d'ahir\nni el foc d'avui i haurem de fer foc nou. Mart\u00ed i Pol", "public_metrics": {"followers_count": 97705, "following_count": 1813, "tweet_count": 109902, "listed_count": 486}, "url": "", "pinned_tweet_id": "1184685312301260800", "username": "gallifantes", "created_at": "2011-02-11T20:53:57.000Z"}, "__twarc": {"url": "https://api.twitter.com/2/tweets/search/all?expansions=author_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id%2Centities.mentions.username%2Cattachments.poll_ids%2Cattachments.media_keys%2Cgeo.place_id&user.fields=created_at%2Cdescription%2Centities%2Cid%2Clocation%2Cname%2Cpinned_tweet_id%2Cprofile_image_url%2Cprotected%2Cpublic_metrics%2Curl%2Cusername%2Cverified%2Cwithheld&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Ctext%2Cpossibly_sensitive%2Creferenced_tweets%2Creply_settings%2Csource%2Cwithheld&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&place.fields=contained_within%2Ccountry%2Ccountry_code%2Cfull_name%2Cgeo%2Cid%2Cname%2Cplace_type&query=cifuentes&max_results=500&start_time=2018-03-26T00%3A00%3A00%2B00%3A00&end_time=2018-03-27T00%3A00%3A00%2B00%3A00&next_token=1jzu9lk96gu5npw44tgnlbhkvk4tayxxkdyjo10m39fh", "version": "2.3.1", "retrieved_at": "2021-06-27T20:52:54+00:00"}}
Thanks for reporting this. I forgot that I need to check for tweets that have been referenced but are no longer retrievable because they've been deleted.
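The kind of check described here might look like the following sketch. It is not the actual twarc-network code: the function name reference_edges is hypothetical, and it assumes that a referenced tweet that could not be retrieved appears with only its type and id, without the expanded author object that the example tweet above carries.

```python
def reference_edges(tweet):
    """Return (source_user, target_user, type) tuples for retrievable references."""
    edges = []
    source = tweet.get("author", {}).get("username")
    if source is None:
        # Without an expanded author there is no node to draw the edge from.
        return edges
    for ref in tweet.get("referenced_tweets", []):
        target = ref.get("author", {}).get("username")
        if target is None:
            # Assumption: a deleted or unavailable tweet has no "author" object.
            continue
        edges.append((source, target, ref["type"]))
    return edges
```

With this shape, a reply to a deleted tweet simply contributes no edge instead of raising a KeyError.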
@numeroteca give v0.0.2 a try:
pip install --upgrade twarc-network
Be careful though: that search yielded 79,671 tweets for me, which renders as quite a hairball (at least in the HTML/D3 view). You may be able to play around with --min-subgraph-size and --max-subgraph-size to make it a bit more viewable. Otherwise manipulation in Gephi or another network visualization tool will probably be needed.
Make sure you get v0.0.4. I noticed the index.html template for the HTML/D3 visualization wasn't getting bundled before, but it should be now.
With v0.0.4 it worked well. Thanks for the advice, I am using Gephi. The creation of the .gexf seems to be working perfectly.
Awesome! I'd be interested to see what you come up with in Gephi if you are willing to share here or over in https://app.element.io/#/room/#docnow:matrix.org
@edsu Hello, thank you for this cool package. I am getting the same error when I run:
twarc2 network tweets.jsonl --nodes tweets network.html
Traceback (most recent call last):
  File "/home/ariabi/.conda/envs/search-tweets/bin/twarc2", line 8, in <module>
Thanks for the report @ArijRB. Assuming you are running the latest version, I will need to check that --nodes tweets has the same check that was implemented above.
Yes, I am using the latest version. Thank you for your quick response.
Hmm, I'm having trouble reproducing this @ArijRB. Can you confirm with pip show twarc-network that you are running v0.0.4?
$ pip show twarc-network
Name: twarc-network
Version: 0.0.4
Summary: Generate network visualizations for Twitter data
Home-page: https://github.com/docnow/twarc-network
Author: Ed Summers
Author-email: ehs@pobox.com
License: UNKNOWN
Location: /home/ed/.local/lib/python3.8/site-packages
Requires: pydot, twarc, networkx
Required-by:
If you are running v0.0.4 can you maybe share your tweets dataset with me at ehs@pobox.com so I can test with it?
Hey, I got:
pip show twarc-network
Name: twarc-network
Version: 0.0.5
Summary: Generate network visualizations for Twitter data
Home-page: https://github.com/docnow/twarc-network
Author: Ed Summers
Author-email: ehs@pobox.com
License: UNKNOWN
Location: /home/ariabi/.conda/envs/search-tweets/lib/python3.8/site-packages
Requires: networkx, twarc, pydot
Required-by:
It works with twarc2 network tweets.jsonl network.html
You had a more recent version than the one I was testing with! I updated my environment to use v0.0.5 as well and am able to run
twarc2 network tweets.jsonl --nodes tweets network.html
using a file of 50,000 tweets I just collected. There must be something in your data file that is causing the problem. Would you be able to share it with me at ehs@pobox.com? I won't share it publicly and will delete it after I've finished testing.
Hey, I sent you the data. Thank you for your help.
I'm confused, the file you sent seemed to work fine with v0.0.5:
twarc2 network _Tornada_flatten.json --nodes tweets network.html
My mistake, it's --nodes hashtags that gave me the error with that file:
twarc2 network Tornada_flatten.json --nodes hashtags network.html
Traceback (most recent call last):
  File "/home/ariabi/.conda/envs/search-tweets/bin/twarc2", line 8, in <module>
    sys.exit(twarc2())
  File "/home/ariabi/.local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/ariabi/.local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/ariabi/.local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ariabi/.local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ariabi/.local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/ariabi/.conda/envs/search-tweets/lib/python3.8/site-packages/twarc_network/__init__.py", line 46, in network
    g = get_graph(infile, nodes, digraph=True)
  File "/home/ariabi/.conda/envs/search-tweets/lib/python3.8/site-packages/twarc_network/__init__.py", line 142, in get_graph
    hashtags = map(lambda h: h["tag"], t["entities"].get("hashtags", []))
KeyError: 'entities'
Aha, yes! I can replicate the error now, thanks!
I just released v0.0.6, which should guard against tweets lacking an entities stanza. I guess that key is only available in the tweet dictionary when there are actual entities, and the code was expecting it to always be there.
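For reference, the traceback above points at a line that indexes t["entities"] directly. A guard of the kind described could look like the sketch below. It is an assumption about the shape of such a fix, not the code released in v0.0.6, and the function name hashtags_of is hypothetical.

```python
def hashtags_of(tweet):
    """Return the hashtag texts of a tweet, or an empty list if it has none."""
    # Fall back to an empty dict when the tweet has no "entities" key at all,
    # and to an empty list when "entities" exists but holds no hashtags.
    entities = tweet.get("entities", {})
    return [h["tag"] for h in entities.get("hashtags", [])]
```

Using dict.get with a default at both levels means a tweet with no entities, or with entities but no hashtags, yields an empty list rather than a KeyError.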
Cool, thank you.
When I run
twarc2 network tweets.jsonl test_network.html
I get an error. If I run
twarc2 network tweets.jsonl test_network.html --nodes hashtags > network.html
I get a similar error. I am not sure whether it has something to do with my configuration or whether there is a problem in the script.