DocNow / twarc-network

Generate network visualizations from Twitter data.
MIT License

KeyError: 'author' When creating a network #1

Closed numeroteca closed 3 years ago

numeroteca commented 3 years ago

When I run twarc2 network tweets.jsonl test_network.html I get this error:

Traceback (most recent call last):
  File "/home/numeroteca/.pyenv/versions/3.8.1/bin/twarc2", line 11, in <module>
    load_entry_point('twarc==2.3.1', 'console_scripts', 'twarc2')()
  File "/home/numeroteca/.pyenv/versions/3.8.1/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/numeroteca/.pyenv/versions/3.8.1/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/numeroteca/.pyenv/versions/3.8.1/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/numeroteca/.pyenv/versions/3.8.1/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/numeroteca/.pyenv/versions/3.8.1/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/numeroteca/.pyenv/versions/3.8.1/lib/python3.8/site-packages/twarc_network/__init__.py", line 32, in network
    g = get_graph(infile, nodes, digraph=False)
  File "/home/numeroteca/.pyenv/versions/3.8.1/lib/python3.8/site-packages/twarc_network/__init__.py", line 92, in get_graph
    to_user = ref['author']['username']
KeyError: 'author'

If I run twarc2 network tweets.jsonl test_network.html --nodes hashtags > network.html I get a similar error:

Traceback (most recent call last):
  File "/home/numeroteca/.pyenv/versions/3.8.1/bin/twarc2", line 11, in <module>
    load_entry_point('twarc==2.3.1', 'console_scripts', 'twarc2')()
  File "/home/numeroteca/.pyenv/versions/3.8.1/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/numeroteca/.pyenv/versions/3.8.1/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/numeroteca/.pyenv/versions/3.8.1/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/numeroteca/.pyenv/versions/3.8.1/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/numeroteca/.pyenv/versions/3.8.1/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/numeroteca/.pyenv/versions/3.8.1/lib/python3.8/site-packages/twarc_network/__init__.py", line 34, in network
    g = get_graph(infile, nodes, digraph=True)
  File "/home/numeroteca/.pyenv/versions/3.8.1/lib/python3.8/site-packages/twarc_network/__init__.py", line 104, in get_graph
    hashtags = map(lambda h: h['tag'], t['entities'].get('hashtags', []))
KeyError: 'entities'

I am not sure if it has something to do with my configuration or there is a problem in the script.

igorbrigadir commented 3 years ago

What tweets does tweets.jsonl contain? I think it will only work for v2 format tweets, in case that's the issue. Otherwise, the file has to contain full API responses (with expansions).
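One rough way to answer that question is to look at the top-level keys of each JSON line. The sketch below is a heuristic only: the function name classify_record is made up for illustration, and the key checks assume that v2 API responses carry "data" (plus "includes" when expansions were requested), that flattened v2 tweets carry "author_id" and an expanded "author" object as in the sample further down, and that v1.1 tweets carry "user" and "id_str".

```python
import json


def classify_record(record):
    """Guess the shape of one parsed JSON line (heuristic, not exhaustive)."""
    if "data" in record:
        # Full v2 API response; expansions live under "includes"
        return "v2 response" if "includes" in record else "v2 response without expansions"
    if "author_id" in record:
        # Single v2 tweet; a flattened one has the author object inlined
        return "v2 flattened tweet" if "author" in record else "v2 tweet without expansions"
    if "user" in record and "id_str" in record:
        return "v1.1 tweet"
    return "unknown"


# Hypothetical sample lines standing in for the contents of a .jsonl file
sample_lines = [
    '{"data": [{"id": "1"}], "includes": {"users": []}}',
    '{"id": "2", "author_id": "9", "author": {"username": "someone"}}',
    '{"id": "3", "author_id": "9"}',
    '{"id_str": "4", "user": {"screen_name": "someone"}}',
]
kinds = [classify_record(json.loads(line)) for line in sample_lines]
print(kinds)
```

To check a real file, the same function could be applied to each line read from tweets.jsonl.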

numeroteca commented 3 years ago

I just downloaded those tweets with twarc2 search 'cifuentes' --start-time 2018-03-26 --end-time 2018-03-27 --archive > tweets.jsonl.

Example of one tweet: {"entities": {"annotations": [{"start": 32, "end": 40, "probability": 0.9615, "type": "Person", "normalized_text": "Cifuentes"}, {"start": 178, "end": 186, "probability": 0.7933, "type": "Place", "normalized_text": "Catalunya"}], "mentions": [{"start": 0, "end": 12, "username": "gallifantes", "id": "250801453", "profile_image_url": "https://pbs.twimg.com/profile_images/1336687652003864577/Vcu_2Jr2_normal.jpg", "verified": false, "name": "Cris", "protected": false, "location": "Barcelona", "description": "Procrastinadora nivel experto. \n\nAra \u00e9s dem\u00e0. No escalfa el foc d'ahir\nni el foc d'avui i haurem de fer foc nou. Mart\u00ed i Pol", "public_metrics": {"followers_count": 97705, "following_count": 1813, "tweet_count": 109902, "listed_count": 486}, "url": "", "pinned_tweet_id": "1184685312301260800", "created_at": "2011-02-11T20:53:57.000Z"}, {"start": 13, "end": 18, "username": "KRLS", "id": "11611502", "profile_image_url": "https://pbs.twimg.com/profile_images/1399368370894606336/zt5X-4S7_normal.jpg", "verified": true, "name": "Carles Puigdemont", "protected": false, "location": "Brussels, Belgium", "description": "130th President of Catalonia | President of @ConsellxRep | MEP @JuntsEU | Telegram: https://t.co/1m4VpOqgQ4 | #Mexplico \ud83d\udc49 https://t.co/2kmHCUEkaU", "entities": {"url": {"urls": [{"start": 0, "end": 23, "url": "https://t.co/n0WOMEAuN0", "expanded_url": "https://www.juntsxeuropa.cat", "display_url": "juntsxeuropa.cat"}]}, "description": {"urls": [{"start": 84, "end": 107, "url": "https://t.co/1m4VpOqgQ4", "expanded_url": "https://t.me/carlespuigdemont", "display_url": "t.me/carlespuigdemo\u2026"}, {"start": 122, "end": 145, "url": "https://t.co/2kmHCUEkaU", "expanded_url": "https://ja.cat/buDGt", "display_url": "ja.cat/buDGt"}], "hashtags": [{"start": 110, "end": 119, "tag": "Mexplico"}], "mentions": [{"start": 44, "end": 56, "username": "ConsellxRep"}, {"start": 63, "end": 71, "username": "JuntsEU"}]}}, 
"public_metrics": {"followers_count": 792475, "following_count": 5294, "tweet_count": 22788, "listed_count": 3161}, "url": "https://t.co/n0WOMEAuN0", "pinned_tweet_id": "1216761222999302145", "created_at": "2007-12-28T21:48:59.000Z"}]}, "reply_settings": "everyone", "text": "@gallifantes @KRLS El m\u00e1ster de Cifuentes es muy importante, la corrupci\u00f3n en la Universidad para los que la sufrimos es una causa fundamental. Y yo precisamente no me olvido de Catalunya", "id": "978252563328962561", "possibly_sensitive": false, "public_metrics": {"retweet_count": 5, "reply_count": 4, "like_count": 43, "quote_count": 0}, "created_at": "2018-03-26T12:49:20.000Z", "author_id": "2781359551", "referenced_tweets": [{"type": "replied_to", "id": "978251296443633665", "entities": {"annotations": [{"start": 95, "end": 103, "probability": 0.9091, "type": "Person", "normalized_text": "Cifuentes"}, {"start": 106, "end": 112, "probability": 0.5336, "type": "Person", "normalized_text": "Ol rait"}], "mentions": [{"start": 52, "end": 57, "username": "KRLS", "id": "11611502", "profile_image_url": "https://pbs.twimg.com/profile_images/1399368370894606336/zt5X-4S7_normal.jpg", "verified": true, "name": "Carles Puigdemont", "protected": false, "location": "Brussels, Belgium", "description": "130th President of Catalonia | President of @ConsellxRep | MEP @JuntsEU | Telegram: https://t.co/1m4VpOqgQ4 | #Mexplico \ud83d\udc49 https://t.co/2kmHCUEkaU", "entities": {"url": {"urls": [{"start": 0, "end": 23, "url": "https://t.co/n0WOMEAuN0", "expanded_url": "https://www.juntsxeuropa.cat", "display_url": "juntsxeuropa.cat"}]}, "description": {"urls": [{"start": 84, "end": 107, "url": "https://t.co/1m4VpOqgQ4", "expanded_url": "https://t.me/carlespuigdemont", "display_url": "t.me/carlespuigdemo\u2026"}, {"start": 122, "end": 145, "url": "https://t.co/2kmHCUEkaU", "expanded_url": "https://ja.cat/buDGt", "display_url": "ja.cat/buDGt"}], "hashtags": [{"start": 110, "end": 119, "tag": "Mexplico"}], 
"mentions": [{"start": 44, "end": 56, "username": "ConsellxRep"}, {"start": 63, "end": 71, "username": "JuntsEU"}]}}, "public_metrics": {"followers_count": 792475, "following_count": 5294, "tweet_count": 22788, "listed_count": 3161}, "url": "https://t.co/n0WOMEAuN0", "pinned_tweet_id": "1216761222999302145", "created_at": "2007-12-28T21:48:59.000Z"}]}, "reply_settings": "everyone", "text": "La izquierda alemana convocando manifestaciones por @KRLS y la de aqu\u00ed hablando del m\u00e1ster de Cifuentes. Ol rait.", "possibly_sensitive": false, "public_metrics": {"retweet_count": 3799, "reply_count": 171, "like_count": 6816, "quote_count": 45}, "created_at": "2018-03-26T12:44:18.000Z", "author_id": "250801453", "context_annotations": [{"domain": {"id": "10", "name": "Person", "description": "Named people in the world like Nelson Mandela"}, "entity": {"id": "923913865015959552", "name": "Carles Puigdemont", "description": "Carles Puigdemont"}}, {"domain": {"id": "35", "name": "Politician", "description": "Politicians in the world, like Joe Biden"}, "entity": {"id": "923913865015959552", "name": "Carles Puigdemont", "description": "Carles Puigdemont"}}], "conversation_id": "978251296443633665", "source": "Twitter for Android", "lang": "es", "author": {"profile_image_url": "https://pbs.twimg.com/profile_images/1336687652003864577/Vcu_2Jr2_normal.jpg", "verified": false, "name": "Cris", "protected": false, "location": "Barcelona", "id": "250801453", "description": "Procrastinadora nivel experto. \n\nAra \u00e9s dem\u00e0. No escalfa el foc d'ahir\nni el foc d'avui i haurem de fer foc nou. 
Mart\u00ed i Pol", "public_metrics": {"followers_count": 97705, "following_count": 1813, "tweet_count": 109902, "listed_count": 486}, "url": "", "pinned_tweet_id": "1184685312301260800", "username": "gallifantes", "created_at": "2011-02-11T20:53:57.000Z"}}], "context_annotations": [{"domain": {"id": "10", "name": "Person", "description": "Named people in the world like Nelson Mandela"}, "entity": {"id": "923913865015959552", "name": "Carles Puigdemont", "description": "Carles Puigdemont"}}, {"domain": {"id": "35", "name": "Politician", "description": "Politicians in the world, like Joe Biden"}, "entity": {"id": "923913865015959552", "name": "Carles Puigdemont", "description": "Carles Puigdemont"}}], "conversation_id": "978251296443633665", "source": "Twitter for Android", "lang": "es", "in_reply_to_user_id": "250801453", "author": {"profile_image_url": "https://pbs.twimg.com/profile_images/1382070865664348166/0ME-T9l2_normal.jpg", "verified": false, "name": "Kondratio Federovich Rileev \ud83d\udc9b", "protected": false, "location": "Madrid", "id": "2781359551", "description": "#DERECHOSHUMANOS\n\nhttps://t.co/IclT1yxG22", "entities": {"url": {"urls": [{"start": 0, "end": 23, "url": "https://t.co/AOIufksbs1", "expanded_url": "http://www.ruizjimenez.es", "display_url": "ruizjimenez.es"}]}, "description": {"urls": [{"start": 18, "end": 41, "url": "https://t.co/IclT1yxG22", "expanded_url": "http://ruizjimenez.es", "display_url": "ruizjimenez.es"}], "hashtags": [{"start": 0, "end": 16, "tag": "DERECHOSHUMANOS"}]}}, "public_metrics": {"followers_count": 1900, "following_count": 1820, "tweet_count": 81013, "listed_count": 23}, "url": "https://t.co/AOIufksbs1", "pinned_tweet_id": "1382073577785147392", "username": "Marta51970", "created_at": "2014-08-31T00:19:32.000Z"}, "in_reply_to_user": {"profile_image_url": "https://pbs.twimg.com/profile_images/1336687652003864577/Vcu_2Jr2_normal.jpg", "verified": false, "name": "Cris", "protected": false, "location": "Barcelona", 
"id": "250801453", "description": "Procrastinadora nivel experto. \n\nAra \u00e9s dem\u00e0. No escalfa el foc d'ahir\nni el foc d'avui i haurem de fer foc nou. Mart\u00ed i Pol", "public_metrics": {"followers_count": 97705, "following_count": 1813, "tweet_count": 109902, "listed_count": 486}, "url": "", "pinned_tweet_id": "1184685312301260800", "username": "gallifantes", "created_at": "2011-02-11T20:53:57.000Z"}, "__twarc": {"url": "https://api.twitter.com/2/tweets/search/all?expansions=author_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id%2Centities.mentions.username%2Cattachments.poll_ids%2Cattachments.media_keys%2Cgeo.place_id&user.fields=created_at%2Cdescription%2Centities%2Cid%2Clocation%2Cname%2Cpinned_tweet_id%2Cprofile_image_url%2Cprotected%2Cpublic_metrics%2Curl%2Cusername%2Cverified%2Cwithheld&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Ctext%2Cpossibly_sensitive%2Creferenced_tweets%2Creply_settings%2Csource%2Cwithheld&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&place.fields=contained_within%2Ccountry%2Ccountry_code%2Cfull_name%2Cgeo%2Cid%2Cname%2Cplace_type&query=cifuentes&max_results=500&start_time=2018-03-26T00%3A00%3A00%2B00%3A00&end_time=2018-03-27T00%3A00%3A00%2B00%3A00&next_token=1jzu9lk96gu5npw44tgnlbhkvk4tayxxkdyjo10m39fh", "version": "2.3.1", "retrieved_at": "2021-06-27T20:52:54+00:00"}}

edsu commented 3 years ago

Thanks for reporting this. I forgot that I need to check for tweets that have been referenced but are no longer retrievable because they've been deleted.
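A minimal sketch of the kind of check described here, not the actual twarc-network code. It assumes, as the sample tweet above suggests, that each entry in referenced_tweets carries an "author" object only when the referenced tweet could still be retrieved. The function name referenced_edges is hypothetical.

```python
def referenced_edges(tweet):
    """Yield (from_user, to_user, type) for each referenced tweet that still has an author."""
    from_user = tweet["author"]["username"]
    for ref in tweet.get("referenced_tweets", []):
        author = ref.get("author")
        if author is None:
            # The referenced tweet was deleted or is otherwise unavailable: skip it
            continue
        yield from_user, author["username"], ref["type"]


# Hypothetical, heavily trimmed tweet: one retrievable reference, one deleted
tweet = {
    "author": {"username": "Marta51970"},
    "referenced_tweets": [
        {"type": "replied_to", "id": "1", "author": {"username": "gallifantes"}},
        {"type": "quoted", "id": "2"},  # no "author" key
    ],
}
edges = list(referenced_edges(tweet))
print(edges)
```

Using ref.get("author") instead of ref["author"] is what avoids the KeyError shown in the first traceback.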

edsu commented 3 years ago

@numeroteca give v0.0.2 a try: pip install --upgrade twarc-network. Be careful though: that search yielded 79,671 tweets for me, which renders as quite a hairball (at least in the HTML/D3 view). You may be able to play around with --min-subgraph-size and --max-subgraph-size to make it a bit more viewable. Otherwise manipulation in Gephi or another network visualization tool will probably be needed.
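To illustrate why size limits can tame a hairball, here is a standard-library sketch that keeps only edges belonging to connected components within a size range. It is an assumption that --min-subgraph-size and --max-subgraph-size behave along these lines; the function names and the node-count interpretation of "size" are hypothetical and not taken from twarc-network.

```python
from collections import defaultdict, deque


def components(edges):
    """Yield each connected component of an undirected edge list as a set of nodes."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first walk from the start node
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        yield component


def filter_by_size(edges, min_size=None, max_size=None):
    """Keep edges whose component has between min_size and max_size nodes."""
    keep = set()
    for component in components(edges):
        if min_size is not None and len(component) < min_size:
            continue
        if max_size is not None and len(component) > max_size:
            continue
        keep |= component
    return [(a, b) for a, b in edges if a in keep and b in keep]


edges = [("a", "b"), ("b", "c"), ("d", "e")]
kept = filter_by_size(edges, min_size=3)
print(kept)
```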

edsu commented 3 years ago

Make sure you get v0.0.4. I noticed the index.html template for the HTML/D3 visualization wasn't getting bundled before, but it should be now.

numeroteca commented 3 years ago

With v0.0.4 it worked well. Thanks for the advice, I am using Gephi. The creation of the .gexf seems to be working perfectly.

edsu commented 3 years ago

Awesome! I'd be interested to see what you come up with in Gephi if you are willing to share here or over in https://app.element.io/#/room/#docnow:matrix.org

ArijRB commented 3 years ago

@edsu Hello, thank you for this cool package. I am getting the same error when I use twarc2 network tweets.jsonl --nodes tweets network.html

Traceback (most recent call last):
  File "/home/ariabi/.conda/envs/search-tweets/bin/twarc2", line 8, in <module>
    sys.exit(twarc2())
  File "/home/ariabi/.local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/ariabi/.local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/ariabi/.local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ariabi/.local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ariabi/.local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/ariabi/.conda/envs/search-tweets/lib/python3.8/site-packages/twarc_network/__init__.py", line 46, in network
    g = get_graph(infile, nodes, digraph=True)
  File "/home/ariabi/.conda/envs/search-tweets/lib/python3.8/site-packages/twarc_network/__init__.py", line 142, in get_graph
    hashtags = map(lambda h: h["tag"], t["entities"].get("hashtags", []))
KeyError: 'entities'

edsu commented 3 years ago

Thanks for the report @ArijRB ... assuming you are running the latest version I will need to check that --nodes tweets has the same check that was implemented above.

ArijRB commented 3 years ago

Yes, I am using the latest version. Thank you for your quick response.

edsu commented 3 years ago

Hmm I'm having trouble reproducing this @ArijRB. Can you confirm with pip show twarc-network that you are running v0.0.4?

$ pip show twarc-network
Name: twarc-network
Version: 0.0.4
Summary: Generate network visualizations for Twitter data
Home-page: https://github.com/docnow/twarc-network
Author: Ed Summers
Author-email: ehs@pobox.com
License: UNKNOWN
Location: /home/ed/.local/lib/python3.8/site-packages
Requires: pydot, twarc, networkx
Required-by:

If you are running v0.0.4 can you maybe share your tweets dataset with me at ehs@pobox.com so I can test with it?

ArijRB commented 3 years ago

Hey, I got

$ pip show twarc-network
Name: twarc-network
Version: 0.0.5
Summary: Generate network visualizations for Twitter data
Home-page: https://github.com/docnow/twarc-network
Author: Ed Summers
Author-email: ehs@pobox.com
License: UNKNOWN
Location: /home/ariabi/.conda/envs/search-tweets/lib/python3.8/site-packages
Requires: networkx, twarc, pydot
Required-by:

It works with twarc2 network tweets.jsonl network.html

edsu commented 3 years ago

You had a more recent version than the one I was testing with! I updated my environment to use v0.0.5 as well and am able to run twarc2 network tweets.jsonl --nodes tweets network.html using a file of 50,000 tweets I just collected. There must be something in your data file that is causing the problem. Would you be able to share it with me at ehs@pobox.com? I won't share it publicly and will delete it after I've finished testing.

ArijRB commented 3 years ago

Hey, I sent you the data. Thank you for your help.

edsu commented 3 years ago

I'm confused, the file you sent seemed to work fine with v0.0.5:

twarc2 network _Tornada_flatten.json --nodes tweets network.html

ArijRB commented 3 years ago

My mistake, it's --nodes hashtags that gave me the error with that file:

twarc2 network Tornada_flatten.json --nodes hashtags network.html
Traceback (most recent call last):
  File "/home/ariabi/.conda/envs/search-tweets/bin/twarc2", line 8, in <module>
    sys.exit(twarc2())
  File "/home/ariabi/.local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/ariabi/.local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/ariabi/.local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ariabi/.local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ariabi/.local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/ariabi/.conda/envs/search-tweets/lib/python3.8/site-packages/twarc_network/__init__.py", line 46, in network
    g = get_graph(infile, nodes, digraph=True)
  File "/home/ariabi/.conda/envs/search-tweets/lib/python3.8/site-packages/twarc_network/__init__.py", line 142, in get_graph
    hashtags = map(lambda h: h["tag"], t["entities"].get("hashtags", []))
KeyError: 'entities'

edsu commented 3 years ago

Ahah yes! I can replicate the error now thanks!

edsu commented 3 years ago

I just released v0.0.6 which should guard against tweets lacking an entities stanza. I guess that key is only available in the tweet dictionary when there are actual entities and the code was expecting it to always be there.
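A minimal sketch of such a guard, assuming only what the traceback shows: hashtags live under t["entities"]["hashtags"] and each has a "tag" field. The helper name hashtags_of is hypothetical and this is not necessarily how v0.0.6 implements the fix.

```python
def hashtags_of(tweet):
    """Return the hashtag strings of a tweet, or an empty list if it has no entities."""
    entities = tweet.get("entities", {})  # tolerate tweets without an entities stanza
    return [h["tag"] for h in entities.get("hashtags", [])]


with_tags = {"entities": {"hashtags": [{"start": 0, "end": 9, "tag": "Mexplico"}]}}
mentions_only = {"entities": {"mentions": [{"username": "KRLS"}]}}
no_entities = {"text": "plain tweet with no entities at all"}

results = [hashtags_of(t) for t in (with_tags, mentions_only, no_entities)]
print(results)
```

The only change from the failing line is replacing t["entities"] with t.get("entities", {}), so a tweet without that key yields an empty hashtag list rather than raising KeyError.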

ArijRB commented 3 years ago

Cool, thank you.