echen102 / COVID-19-TweetIDs

The repository contains an ongoing collection of tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020.

Does the provided hydrate.py script still work? #43

Closed jermp closed 1 month ago

jermp commented 1 month ago

Hello, as per the title, I wonder if the provided script still works now that Twitter has become "X" (and its API has changed as well, maybe?).

I configured twarc correctly with my consumer_key, consumer_secret, access_token and access_token_secret, but I get the following error:

  File "/Users/fulgor/Desktop/tweets-codiv/hydrate.py", line 82, in <module>
    main()
  File "/Users/fulgor/Desktop/tweets-codiv/hydrate.py", line 43, in main
    hydrate(path)
  File "/Users/fulgor/Desktop/tweets-codiv/hydrate.py", line 75, in hydrate
    for tweet in twarc.hydrate(id_file.open()):
  File "/Users/fulgor/Desktop/tweets-codiv/env/lib/python3.12/site-packages/twarc/client.py", line 641, in hydrate
    resp = self.post(
           ^^^^^^^^^^
  File "/Users/fulgor/Desktop/tweets-codiv/env/lib/python3.12/site-packages/twarc/decorators.py", line 88, in new_f
    resp.raise_for_status()
  File "/Users/fulgor/Desktop/tweets-codiv/env/lib/python3.12/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.twitter.com/1.1/statuses/lookup.json

Any ideas?

Thank you!

emilioferrara commented 1 month ago

The script will work only if you have access to the API, which these days requires a paid subscription.

Emilio

jermp commented 1 month ago

Hi @emilioferrara and thank you for your response. Unfortunately, even after paying, the error persists. I'm using the four secret keys indicated in my previous question. Do you have any clue?

-Giulio

ElenaMattei commented 1 month ago

Dear @emilioferrara, @jermp and I have also tried to follow the suggestions on Twitter Developer and used Postman, but it did not work. We forked the Twitter API v2 environment and typed in all our keys, but it still returns "401 unauthorized" when we request a single ID. Any ideas? Your support would be highly appreciated, thanks!

Elena

ElenaMattei commented 1 month ago

@emilioferrara good news: it finally works with Postman. Bad news: it only provides the first 23 words of the tweet, like a quotation. The rest is in [...] brackets. It seems we cannot get the full raw text; we can extract just an excerpt. Does this sound new to you? thanks!

emilioferrara commented 1 month ago

Glad to hear that at least it’s somewhat working. You get the full tweet object right? With all the metadata. Not just the tweet quotation, I hope!

Emilio

ElenaMattei commented 1 month ago

Hi @emilioferrara, thank you for your reply. Unfortunately we only get an excerpt, not the full text. Most of the tweets are simply inaccessible (Postman does not report any errors); a few are visible, but only the first sentence or 23 words. I tried about 40 IDs chosen at random from your folders, ranging from 2020-2021 (the period we are interested in) to 2023. I also reproduced the same result in my terminal using the code generated by X's official tool (https://developer.twitter.com/apitools/api?endpoint=%2F2%2Ftweets&method=get), even when I add additional fields like note_tweet and text (I found the hint here: https://stackoverflow.com/questions/77476296/how-do-i-retrieve-the-full-extended-tweet-with-the-twitter-api-v2-in-postman-p). Most of the fields and extensions (metadata) are in any case inaccessible, even when the text excerpt is provided.
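For anyone trying to reproduce this, here is a minimal sketch of the v2 lookup request described in the comment above, using only the Python standard library. The helper names `build_lookup_url` and `full_text` are hypothetical. The assumption, taken from the linked Stack Overflow hint, is that long posts carry their complete text in `note_tweet.text` when `note_tweet` is requested through `tweet.fields`, with `text` as the fallback. As reported above, this did not recover the full text in this case, so treat it as a starting point rather than a fix. The sketch only builds the URL and parses a response; it does not send anything.

```python
from urllib.parse import urlencode

# v2 lookup endpoint (the traceback in this thread hit the older v1.1 endpoint).
API_URL = "https://api.twitter.com/2/tweets"
# Per-request ID limit documented for this endpoint.
MAX_IDS_PER_REQUEST = 100


def build_lookup_url(tweet_ids, fields=("note_tweet", "created_at", "lang")):
    """Return the GET URL for looking up a batch of tweet IDs (hypothetical helper)."""
    ids = [str(i) for i in tweet_ids]
    if not 1 <= len(ids) <= MAX_IDS_PER_REQUEST:
        raise ValueError(f"expected 1 to {MAX_IDS_PER_REQUEST} IDs, got {len(ids)}")
    query = urlencode({"ids": ",".join(ids), "tweet.fields": ",".join(fields)})
    return f"{API_URL}?{query}"


def full_text(tweet):
    """Prefer note_tweet.text when present, otherwise fall back to text."""
    note = tweet.get("note_tweet") or {}
    return note.get("text") or tweet.get("text", "")
```

The URL would be sent with an `Authorization: Bearer <token>` header. Each item in the response's `data` list can then be passed to `full_text`.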

ElenaMattei commented 1 month ago

P.S. @emilioferrara, we would also like to point out that the code you provide unfortunately does not work, even with the paid subscription. Downloading short excerpts works only with Postman and X Developer's guidelines. So we believe there is an issue with your source code (and the IDs, as most of them do not work at all). We have no idea why X lets us download only a few words, or whether that depends on the date, the ID, the new policies... The documentation X provides (https://developer.x.com/en/docs/twitter-api/tweets/lookup/quick-start; https://developer.x.com/en/docs/twitter-api/data-dictionary/object-model/tweet; https://developer.x.com/en/docs/twitter-api/fields) does not report any issue of this kind.

emilioferrara commented 1 month ago

Elena, this script is not meant to work on the current infrastructure. It worked perfectly before X/Twitter moved to paid subscriptions. Hence, it's normal that it will yield errors. We don't have a paid subscription, so we don't plan to update the code for the current infrastructure. However, if you manage to solve the issues and have a working version, feel free to contribute it with a pull request and we will gladly incorporate it into this repo!

Thanks.

Emilio

echen102 commented 1 month ago

To follow up on this thread, these scripts were, as Emilio mentioned, developed to run using the API prior to the changes to Twitter's/X's policies in March of 2023. Because we do not have a paid subscription, we have not updated this repository or its scripts to reflect any changes that might have been made to the API or API pipeline itself. This will be the case while the restrictions to the API remain in place.

When tweets/posts are deleted, or their posting accounts are deleted, removed or made private by the owner, developers can no longer access the data. The more time passes between collection and hydration, the larger the percentage of tweets that are no longer accessible. During development, some of our community members rehydrated our dataset and saw a 6% deletion rate (see the other notes section). Since the first tweets are from 2020, I'm not surprised that a number of these tweets are no longer retrievable.
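As a rough illustration of how such a rate can be measured, the sketch below compares the IDs that were requested with the IDs that came back from hydration. The function name `inaccessible_fraction` is hypothetical and is not part of this repository's scripts; it assumes you have already collected the returned IDs by whatever means you have access to.

```python
def inaccessible_fraction(requested_ids, returned_ids):
    """Fraction of requested IDs that hydration did not return (hypothetical helper)."""
    requested = {str(i) for i in requested_ids}
    if not requested:
        raise ValueError("requested_ids must not be empty")
    # IDs are compared as strings so that int and str inputs can be mixed.
    missing = requested - {str(i) for i in returned_ids}
    return len(missing) / len(requested)
```

For example, if 100 IDs are requested and 94 are returned, the function gives 0.06, which matches the 6% figure mentioned above. Note that it cannot tell deleted posts apart from private or suspended accounts.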

To be certain, I did do a random check on a couple of the IDs and did verify that these IDs are still correct and do correspond to the originally captured posts. You can do the same by manually changing a Twitter/X URL to use the ID of interest, and it will redirect you to the correct handle and post. Again, you'll find that a number of these tweets/posts have been removed in the years since they were initially posted, with this percentage growing the longer it has been since initial collection. This behavior is expected and seen in all collected Twitter/X datasets.
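The manual check described above can be scripted as a small helper that turns an ID into a URL to open in a browser. This is only a sketch: the function name `status_url` is hypothetical, and the `i/web/status/<id>` path is an assumption about a URL form that redirects to the owning handle, not something confirmed in this thread.

```python
def status_url(tweet_id):
    """Build a browser URL for a tweet ID (hypothetical helper; URL form is an assumption)."""
    tid = str(tweet_id).strip()
    # Tweet IDs are plain decimal integers; reject anything else early.
    if not (tid.isascii() and tid.isdigit()):
        raise ValueError(f"not a numeric tweet ID: {tweet_id!r}")
    return f"https://twitter.com/i/web/status/{tid}"
```

Calling `status_url` on each line of an ID file yields links that can be spot-checked by hand. A removed post will show an error page instead of redirecting.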

Hope this helps! And as Emilio mentioned, we welcome any contributions to this repository, especially to adapt the scripts to any changes that have been made to the infrastructure!

Closing this issue for now.

jermp commented 1 month ago

Thank you @echen102 and @emilioferrara for your answers.

"Because we do not have a paid subscription, we have not updated this repository or its scripts to reflect any changes that might have been made to the API or API pipeline itself."

I think it would be useful for users to know this. Consider adding this statement in bold to the README in this repository. You closed this GitHub issue, but the issue still persists, so end-users should be warned.

Best, -Giulio