Closed zhaomin1995 closed 1 year ago
What version of twarc do you have? It works in the latest one,
pip install --upgrade twarc
It worked for me:
twarc2 user name AmazonNews
Gives:
[
{
"data": [
{
"entities": {
"url": {
"urls": [
{
"start": 0,
"end": 23,
"url": "https://t.co/xOFqcYFp9O",
"expanded_url": "http://www.amazon.com/about",
"display_url": "amazon.com/about"
}
]
},
"description": {
"mentions": [
{
"start": 36,
"end": 43,
"username": "Amazon"
}
]
}
},
"profile_image_url": "https://pbs.twimg.com/profile_images/1250516096907571200/adjxWadZ_normal.jpg",
"description": "The official account for news about @Amazon.",
"name": "Amazon News",
"username": "amazonnews",
"id": "818902172347678720",
"url": "https://t.co/xOFqcYFp9O",
"protected": false,
"location": "United States",
"public_metrics": {
"followers_count": 235481,
"following_count": 887,
"tweet_count": 10068,
"listed_count": 955
},
"verified": true,
"created_at": "2017-01-10T19:27:47.000Z",
"verified_type": "business"
}
],
"__twarc": {
"url": "https://api.twitter.com/2/users/by?tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Ctext%2Cpossibly_sensitive%2Creferenced_tweets%2Creply_settings%2Csource%2Cwithheld%2Cedit_controls%2Cedit_history_tweet_ids&user.fields=created_at%2Cdescription%2Centities%2Cid%2Clocation%2Cname%2Cpinned_tweet_id%2Cprofile_image_url%2Cprotected%2Cpublic_metrics%2Curl%2Cusername%2Cverified%2Cverified_type%2Cwithheld&expansions=pinned_tweet_id&usernames=AmazonNews",
"version": "2.13.0",
"retrieved_at": "2023-03-29T16:53:52+00:00"
}
}
]
"verified_type": "business"
is there
@igorbrigadir it looks like @zhaomin1995 was asking about tweet data? I don't see verified_type in the user profile when doing a twarc2 tweet 1640368077735878657. I guess this information is only available when fetching information for a specific user?
It should still be there in the user object:
{
"data": [
{
"possibly_sensitive": false,
"edit_controls": {
"edits_remaining": 5,
"is_edit_eligible": true,
"editable_until": "2023-03-27T15:30:00.000Z"
},
"lang": "en",
"entities": {
"hashtags": [
{
"start": 37,
"end": 56,
"tag": "WomensHistoryMonth"
}
],
"annotations": [
{
"start": 70,
"end": 75,
"probability": 0.837,
"type": "Organization",
"normalized_text": "Amazon"
},
{
"start": 97,
"end": 100,
"probability": 0.9353,
"type": "Place",
"normalized_text": "Ohio"
}
],
"urls": [
{
"start": 253,
"end": 276,
"url": "https://t.co/yKArczarMp",
"expanded_url": "https://nbc24.com/news/local/women-take-over-amazon-fulfillment-center-in-rossford-for-day-of-empowerment",
"display_url": "nbc24.com/news/local/wom…",
"images": [
{
"url": "https://pbs.twimg.com/news_img/1640368124494069760/zcw6SzpK?format=jpg&name=orig",
"width": 986,
"height": 555
},
{
"url": "https://pbs.twimg.com/news_img/1640368124494069760/zcw6SzpK?format=jpg&name=150x150",
"width": 150,
"height": 150
}
],
"status": 200,
"title": "Women take over Amazon fulfillment center in Rossford for day of empowerment",
"description": "The day highlighted representation and inclusivity, inspiring employees to break the glass ceiling.",
"unwound_url": "https://nbc24.com/news/local/women-take-over-amazon-fulfillment-center-in-rossford-for-day-of-empowerment"
}
]
},
"created_at": "2023-03-27T15:00:00.000Z",
"public_metrics": {
"retweet_count": 1,
"reply_count": 10,
"like_count": 11,
"quote_count": 2,
"impression_count": 3162
},
"context_annotations": [
{
"domain": {
"id": "109",
"name": "Reoccurring Trends",
"description": "Twitter generated Trends that occur on a daily, weekly, monthly, or yearly basis such as Monday motivation."
},
"entity": {
"id": "969292478615400448",
"name": "Women's History Month",
"description": "Women's History Month"
}
},
{
"domain": {
"id": "159",
"name": "States",
"description": "States, provinces, or prefectures, like California or Fukushima Prefecture"
},
"entity": {
"id": "1010253454822895616",
"name": "Ohio",
"description": "Ohio"
}
}
],
"id": "1640368077735878657",
"author_id": "818902172347678720",
"text": "Talk about girl power. 👏 In honor of #WomensHistoryMonth, women at an Amazon facility in Toledo, Ohio, ran all the operations for a day, sporting colorful tutus and t-shirts. \n\nWhat a great way to highlight inclusivity, representation, and empowerment! https://t.co/yKArczarMp",
"conversation_id": "1640368077735878657",
"edit_history_tweet_ids": [
"1640368077735878657"
],
"reply_settings": "everyone"
}
],
"includes": {
"users": [
{
"entities": {
"url": {
"urls": [
{
"start": 0,
"end": 23,
"url": "https://t.co/xOFqcYFp9O",
"expanded_url": "http://www.amazon.com/about",
"display_url": "amazon.com/about"
}
]
},
"description": {
"mentions": [
{
"start": 36,
"end": 43,
"username": "Amazon"
}
]
}
},
"profile_image_url": "https://pbs.twimg.com/profile_images/1250516096907571200/adjxWadZ_normal.jpg",
"description": "The official account for news about @Amazon.",
"name": "Amazon News",
"username": "amazonnews",
"id": "818902172347678720",
"url": "https://t.co/xOFqcYFp9O",
"protected": false,
"location": "United States",
"public_metrics": {
"followers_count": 235505,
"following_count": 887,
"tweet_count": 10068,
"listed_count": 956
},
"verified": true,
"created_at": "2017-01-10T19:27:47.000Z",
"verified_type": "business"
}
],
"tweets": [
{
"possibly_sensitive": false,
"edit_controls": {
"edits_remaining": 5,
"is_edit_eligible": true,
"editable_until": "2023-03-27T15:30:00.000Z"
},
"lang": "en",
"entities": {
"hashtags": [
{
"start": 37,
"end": 56,
"tag": "WomensHistoryMonth"
}
],
"annotations": [
{
"start": 70,
"end": 75,
"probability": 0.837,
"type": "Organization",
"normalized_text": "Amazon"
},
{
"start": 97,
"end": 100,
"probability": 0.9353,
"type": "Place",
"normalized_text": "Ohio"
}
],
"urls": [
{
"start": 253,
"end": 276,
"url": "https://t.co/yKArczarMp",
"expanded_url": "https://nbc24.com/news/local/women-take-over-amazon-fulfillment-center-in-rossford-for-day-of-empowerment",
"display_url": "nbc24.com/news/local/wom…",
"images": [
{
"url": "https://pbs.twimg.com/news_img/1640368124494069760/zcw6SzpK?format=jpg&name=orig",
"width": 986,
"height": 555
},
{
"url": "https://pbs.twimg.com/news_img/1640368124494069760/zcw6SzpK?format=jpg&name=150x150",
"width": 150,
"height": 150
}
],
"status": 200,
"title": "Women take over Amazon fulfillment center in Rossford for day of empowerment",
"description": "The day highlighted representation and inclusivity, inspiring employees to break the glass ceiling.",
"unwound_url": "https://nbc24.com/news/local/women-take-over-amazon-fulfillment-center-in-rossford-for-day-of-empowerment"
}
]
},
"created_at": "2023-03-27T15:00:00.000Z",
"public_metrics": {
"retweet_count": 1,
"reply_count": 10,
"like_count": 11,
"quote_count": 2,
"impression_count": 3162
},
"context_annotations": [
{
"domain": {
"id": "109",
"name": "Reoccurring Trends",
"description": "Twitter generated Trends that occur on a daily, weekly, monthly, or yearly basis such as Monday motivation."
},
"entity": {
"id": "969292478615400448",
"name": "Women's History Month",
"description": "Women's History Month"
}
},
{
"domain": {
"id": "159",
"name": "States",
"description": "States, provinces, or prefectures, like California or Fukushima Prefecture"
},
"entity": {
"id": "1010253454822895616",
"name": "Ohio",
"description": "Ohio"
}
}
],
"id": "1640368077735878657",
"author_id": "818902172347678720",
"text": "Talk about girl power. 👏 In honor of #WomensHistoryMonth, women at an Amazon facility in Toledo, Ohio, ran all the operations for a day, sporting colorful tutus and t-shirts. \n\nWhat a great way to highlight inclusivity, representation, and empowerment! https://t.co/yKArczarMp",
"conversation_id": "1640368077735878657",
"edit_history_tweet_ids": [
"1640368077735878657"
],
"reply_settings": "everyone"
}
]
},
"__twarc": {
"url": "https://api.twitter.com/2/tweets?expansions=author_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id%2Centities.mentions.username%2Cattachments.poll_ids%2Cattachments.media_keys%2Cgeo.place_id%2Cedit_history_tweet_ids&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Ctext%2Cpossibly_sensitive%2Creferenced_tweets%2Creply_settings%2Csource%2Cwithheld%2Cedit_controls%2Cedit_history_tweet_ids&user.fields=created_at%2Cdescription%2Centities%2Cid%2Clocation%2Cname%2Cpinned_tweet_id%2Cprofile_image_url%2Cprotected%2Cpublic_metrics%2Curl%2Cusername%2Cverified%2Cverified_type%2Cwithheld&media.fields=alt_text%2Cduration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cvariants%2Cpublic_metrics&poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&place.fields=contained_within%2Ccountry%2Ccountry_code%2Cfull_name%2Cgeo%2Cid%2Cname%2Cplace_type&ids=1640368077735878657",
"version": "2.13.0",
"retrieved_at": "2023-03-29T23:55:49+00:00"
}
}
I get "verified_type": "business" there too.
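To make the point about the user object concrete, here is a minimal sketch of looking up a tweet's author in includes.users by author_id. The helper name author_of is made up for illustration, and the sample page is a heavily trimmed copy of the response above; this is not how twarc does it internally.

```python
def author_of(page, tweet):
    """Return the user object for a tweet's author, or None if it was not included."""
    users = page.get("includes", {}).get("users", [])
    users_by_id = {user["id"]: user for user in users}
    return users_by_id.get(tweet["author_id"])


# Trimmed copy of the response shown above (most fields omitted).
page = {
    "data": [
        {"id": "1640368077735878657", "author_id": "818902172347678720"}
    ],
    "includes": {
        "users": [
            {
                "id": "818902172347678720",
                "username": "amazonnews",
                "verified": True,
                "verified_type": "business",
            }
        ]
    },
}

author = author_of(page, page["data"][0])
print(author["username"], author.get("verified_type"))  # prints: amazonnews business
```

Using .get("verified_type") rather than indexing means a response without the field gives None instead of raising a KeyError.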
I used the code below to fetch the tweet metadata which does not include "verified_type". Am I using twarc correctly?
import datetime
from twarc.client2 import Twarc2
from twarc.expansions import ensure_flattened

t = Twarc2(bearer_token=bearer_token)
query = """(#covid OR #covid19) place_country:US has:images is:verified lang:en"""
start_time = datetime.datetime(2021, 1, 1, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 2, 1, 0, 0, 0, 0, datetime.timezone.utc)
search_results = t.search_all(query=query, start_time=start_time, end_time=end_time, max_results=10, sort_order='recency')
for page in search_results:
    tweets = ensure_flattened(page)
    break
This code also retrieves the right fields:
from twarc.client2 import Twarc2
from twarc.expansions import ensure_flattened
import datetime

t = Twarc2(bearer_token="...")
query = """(#covid OR #covid19) place_country:US has:images is:verified lang:en"""
start_time = datetime.datetime(2021, 1, 1, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 2, 1, 0, 0, 0, 0, datetime.timezone.utc)
search_results = t.search_all(query=query, start_time=start_time, end_time=end_time, max_results=10, sort_order='recency')
for page in search_results:
    tweets = ensure_flattened(page)
    # debug:
    for tweet in tweets:
        tweet_id = tweet['id']
        username = tweet['author']['username']
        verified = tweet['author']['verified']
        verified_type = tweet['author']['verified_type']
        print(f"Tweet: {tweet_id} Author: {username} Verified: {verified} Verified Type: {verified_type}")
    break
Outputs:
Tweet: 1356028811876433924 Author: aartisarwal Verified: True Verified Type: blue
Tweet: 1356024850549252097 Author: aartisarwal Verified: True Verified Type: blue
Tweet: 1355973048764162054 Author: Santos4Congress Verified: True Verified Type: none
Tweet: 1355942707450040325 Author: BrigidaMack Verified: True Verified Type: none
Tweet: 1355901187694997504 Author: JohnCooper4Nash Verified: True Verified Type: none
Tweet: 1355890603129757704 Author: David_RMartinez Verified: True Verified Type: none
Tweet: 1355880450103930881 Author: DrNicoleCross Verified: True Verified Type: none
Tweet: 1355708980014637061 Author: StanleyRoberts Verified: True Verified Type: blue
Tweet: 1355689206291574785 Author: stephanielily Verified: True Verified Type: none
Tweet: 1355681305611042818 Author: QuirkSilvaCA Verified: False Verified Type: none
Note that users like QuirkSilvaCA, which show Verified: False Verified Type: none, are an edge case that comes up sometimes: this user WAS verified before (@verified follows them), but their profile changed handle or something and lost the badge, or they got Blue and hid the badge.
is:verified in the query matches on the legacy verified field only, not on Twitter Blue as far as I can tell, with some edge cases like the one above where accounts formerly had a badge but lost it.
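As a rough illustration of how the two fields combine, here is a small sketch that labels an author object. The function name and the label strings are made up, and the "likely legacy" reading is only an interpretation of the comment above, not something the API states.

```python
def describe_author(author):
    """Label an author dict using its 'verified' and 'verified_type' fields."""
    verified = author.get("verified")
    verified_type = author.get("verified_type")
    if verified_type is None:
        # The field was not returned at all (for example, an older twarc release).
        return "verified_type missing from response"
    if not verified:
        return "no badge"
    if verified_type == "none":
        # Assumption: verified without a Blue/business type reads as a legacy badge.
        return "verified, not Blue (likely legacy)"
    return f"verified ({verified_type})"


# Values taken from the output above.
print(describe_author({"verified": True, "verified_type": "blue"}))
print(describe_author({"verified": True, "verified_type": "none"}))
print(describe_author({"verified": False, "verified_type": "none"}))
```

The first branch keeps "field not returned" separate from the string "none", which is the distinction this thread turns on.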
Hi,
Thanks for your help. I copied and ran your code, but it says there is no verified_type in the author field. I am using version 2.13.0 of twarc. I am not sure if I am the only one who cannot get verified_type in the tweet metadata.
But it says there is no verified_type in the author field
Note that Verified Type: none is not "missing"; it just means the account is not Twitter Blue verified.
But is there an error in Python? What's the full stack trace? And is the version of twarc in the environment definitely the same as the command line? It would help to paste in the exact Python error you get.
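One way to check which twarc the notebook kernel actually sees, as opposed to the one on the command line, is to ask the running interpreter directly. This is a sketch using only the standard library (Python 3.8+); installed_version is a made-up helper name.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package in this interpreter, or None."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Run this in the same notebook or script that imports twarc.
print("interpreter:", sys.executable)
print("twarc:", installed_version("twarc"))
```

If the interpreter path differs from the one your shell's pip belongs to, the two environments can have different twarc versions installed.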
Hi, thanks for your response. I attached the full code and stack trace below.
What's the output of
!pip list
In a cell?
Package Version
-------------------- ------------
aiohttp 3.8.1
aiosignal 1.2.0
argon2-cffi 20.1.0
async-generator 1.10
async-timeout 4.0.2
attrs 21.2.0
backcall 0.2.0
beautifulsoup4 4.10.0
bleach 3.3.1
blis 0.7.5
Brotli 1.0.9
cachetools 5.0.0
catalogue 2.0.6
certifi 2021.10.8
cffi 1.14.6
charset-normalizer 2.0.9
click 8.0.3
click-config-file 0.6.0
click-plugins 1.1.1
clip 1.0
colorama 0.4.4
configobj 5.0.8
conllu 4.4.1
craft-text-detector 0.4.2
cycler 0.11.0
cymem 2.0.6
datasets 1.18.3
debugpy 1.4.1
decorator 5.0.9
defusedxml 0.7.1
dill 0.3.4
efficientnet-pytorch 0.7.1
emoji 2.2.0
en-core-web-sm 3.2.0
entrypoints 0.3
filelock 3.4.2
firebase 3.0.1
fonttools 4.29.1
frozenlist 1.3.0
fsspec 2022.1.0
ftfy 6.1.1
gdown 4.3.0
huggingface-hub 0.4.0
humanize 4.4.0
idna 3.3
instaloader 4.9.5
ipykernel 6.0.3
ipython 7.25.0
ipython-genutils 0.2.0
ipywidgets 7.6.5
jedi 0.18.0
Jinja2 3.0.1
joblib 1.0.1
jsonschema 3.2.0
jupyter 1.0.0
jupyter-client 6.1.12
jupyter-console 6.4.2
jupyter-core 4.7.1
jupyter-http-over-ws 0.0.8
jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.0.2
kiwisolver 1.3.2
krippendorff 0.5.1
langcodes 3.3.0
MarkupSafe 2.0.1
matplotlib 3.5.1
matplotlib-inline 0.1.2
mistune 0.8.4
multidict 6.0.2
multiprocess 0.70.12.2
murmurhash 1.0.6
mutagen 1.45.1
nbclient 0.5.3
nbconvert 6.1.0
nbformat 5.1.3
nest-asyncio 1.5.1
notebook 6.4.0
numpy 1.21.1
nvidia-ml-py 11.450.51
nvitop 0.5.2.2
oauthlib 3.2.2
opencv-python 4.5.4.60
packaging 21.0
pandas 1.3.1
pandocfilters 1.4.3
parso 0.8.2
pathy 0.6.1
patsy 0.5.1
pickleshare 0.7.5
Pillow 9.0.1
pip 23.0.1
preshed 3.0.6
prometheus-client 0.11.0
prompt-toolkit 3.0.19
psutil 5.9.0
pyarrow 7.0.0
pycparser 2.20
pycryptodomex 3.14.1
pydantic 1.8.2
Pygments 2.9.0
pyparsing 2.4.7
pyrsistent 0.18.0
PySocks 1.7.1
python-dateutil 2.8.2
python-swiftclient 4.2.0
pytz 2021.1
pywin32 301
pywinpty 1.1.3
PyYAML 6.0
pyzmq 22.1.0
qtconsole 5.3.0
QtPy 2.0.1
regex 2022.1.18
requests 2.26.0
requests-oauthlib 1.3.1
sacremoses 0.0.47
scikit-learn 0.24.2
scipy 1.7.1
sec-api 1.0.15
Send2Trash 1.7.1
sentencepiece 0.1.96
seqeval 1.2.2
setuptools 56.0.0
six 1.16.0
smart-open 5.2.1
soupsieve 2.3.1
spacy 3.2.1
spacy-legacy 3.0.8
spacy-loggers 1.0.1
srsly 2.4.2
statsmodels 0.12.2
temporal-taggers 0.0.1
termcolor 1.1.0
terminado 0.10.1
testpath 0.5.0
thinc 8.0.13
threadpoolctl 2.2.0
tokenizers 0.11.4
torch 1.10.2+cu113
torchaudio 0.10.2+cu113
torchvision 0.11.3+cu113
tornado 6.1
tqdm 4.62.3
traitlets 5.0.5
transformers 4.16.2
twarc 2.13.0
typer 0.4.0
typing_extensions 4.0.1
urllib3 1.26.7
wasabi 0.9.0
wcwidth 0.2.5
webencodings 0.5.1
websockets 10.2
wheel 0.37.1
widgetsnbextension 3.5.2
wikipedia 1.4.0
windows-curses 2.3.0
xxhash 2.0.2
yarl 1.7.2
yt-dlp 2022.3.8.2
Unfortunately i can't reproduce this at all.
What does this give you?
from twarc.client2 import Twarc2
from twarc.expansions import ensure_flattened
import datetime
import json

t = Twarc2(bearer_token="...")
query = """(#covid OR #covid19) place_country:US has:images is:verified lang:en"""
start_time = datetime.datetime(2021, 1, 1, 0, 0, 0, 0, datetime.timezone.utc)
end_time = datetime.datetime(2021, 2, 1, 0, 0, 0, 0, datetime.timezone.utc)
search_results = t.search_all(query=query, start_time=start_time, end_time=end_time, max_results=10, sort_order='recency')
for page in search_results:
    print(json.dumps(page))
    tweets = ensure_flattened(page)
    # debug:
    for tweet in tweets:
        tweet_id = tweet['id']
        username = tweet['author']['username']
        verified = tweet['author']['verified']
        verified_type = tweet['author']['verified_type']
        print(f"Tweet: {tweet_id} Author: {username} Verified: {verified} Verified Type: {verified_type}")
    break
(this just outputs the full page of the response as json)
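A quick thing to look for in that JSON is whether verified_type was requested at all. Assuming the page carries the same "__twarc" block with a "url" as the outputs earlier in this thread, a sketch like the following pulls out the user.fields that were sent to the API. The helper name is made up, and the sample URL is a shortened version of the one shown above.

```python
from urllib.parse import parse_qs, urlparse


def requested_user_fields(page):
    """Return the user.fields listed in the request URL recorded under '__twarc'."""
    url = page.get("__twarc", {}).get("url", "")
    query = parse_qs(urlparse(url).query)  # parse_qs also decodes %2C into commas
    fields = query.get("user.fields", [""])[0]
    return [field for field in fields.split(",") if field]


# Shortened version of the request URL from the output above.
sample_page = {
    "__twarc": {
        "url": "https://api.twitter.com/2/users/by?user.fields=created_at%2Cverified%2Cverified_type&usernames=AmazonNews"
    }
}

fields = requested_user_fields(sample_page)
print("verified_type" in fields)  # prints: True
```

If verified_type is not in that list, the installed twarc never asked for the field, so it cannot appear in the author object.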
I saved the output in the json file. Link is here.
@igorbrigadir - looks like there hasn't been a release since the field was added, I think that's the issue.
AH! You're right! My mistake - it was my messed up environments after all.
The latest 2.13.0 release indeed does not have that commit https://github.com/DocNow/twarc/blame/6d9b68f227ebadfe6e4b8d70bb951ea7cf78ba31/twarc/expansions.py#L43
@zhaomin1995 To fix this, the quickest way is to install directly from the latest main:
pip install --force https://github.com/DocNow/twarc/archive/main.zip
That should fix it! But we should release a new fix version on PyPI - I don't think I have access to do that, so @edsu might have to release.
@igorbrigadir you are listed as a maintainer of twarc on pypi, so you should have permission to do this?
Thanks for your response! I will wait for the latest release for this issue 😊
Should be all done now! v2.14.0 is the latest version and has the extra fields for v1.1 and v2
pip install --upgrade twarc
to update
Thank you! I just upgraded twarc. And yes, I can see verified_type now.
The newly accessible field user_verified_type cannot be found in the tweet metadata that I fetched via twarc2.