claromes / waybacktweets

Retrieves archived tweets in HTML, CSV, and JSON formats
https://claromes.github.io/waybacktweets/
GNU General Public License v3.0
80 stars, 22 forks

Download and archive all tweets #20

Open digitalarchivo opened 4 months ago

digitalarchivo commented 4 months ago

I'd like to be able to archive all of a Twitter user's tweets that are on the Wayback Machine.

Take the user degenspartan, for example: he has 55,000+ tweets on the Wayback Machine.

If I could enter his username and then extract the URL and image of each tweet, for all 55k+ tweets, that would be ideal.
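
As a rough illustration of the "enter a username, get every archived tweet URL" step, here is a minimal sketch that builds a query for the Wayback Machine's public CDX endpoint. The function names `build_cdx_query` and `archived_url` are hypothetical, and this is not necessarily how Wayback Tweets does it internally. It only constructs URLs; fetching and paging through 55k+ results would still need to be added.

```python
from typing import Optional
from urllib.parse import urlencode

# Public Wayback Machine CDX endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(username: str, limit: Optional[int] = None) -> str:
    """Build a CDX query URL listing archived tweets for one user.

    Hypothetical helper: the trailing "*" asks for every capture whose
    URL starts with twitter.com/<username>/status/.
    """
    params = {
        "url": f"twitter.com/{username}/status/*",
        "output": "json",
        # Fields to return for each capture.
        "fl": "timestamp,original,mimetype,statuscode",
        # Keep one capture per distinct URL.
        "collapse": "urlkey",
    }
    if limit is not None:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def archived_url(timestamp: str, original: str) -> str:
    """Turn a CDX (timestamp, original URL) pair into a Wayback URL."""
    return f"https://web.archive.org/web/{timestamp}/{original}"


if __name__ == "__main__":
    print(build_cdx_query("degenspartan", limit=10))
```

Each row returned by that query carries a timestamp and the original tweet URL, which `archived_url` combines into a link to the archived copy.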

What I want to do is, first, have an archive of a Twitter user's tweets and, second, present them on my GitHub Pages .md site.

Please let me know what would be possible.

Thank you so much.

claromes commented 4 months ago

(1) When the MIME type is JSON or when the tweet hasn't been deleted (but is archived on Wayback). (2) In the CSV, there would be the image file name, and the image would also be downloaded. (3) This is the most complicated part, as the application on the Streamlit cloud doesn't have many resources. This option would only be available for local execution.

I would have to adjust the interface for Wayback Tweets, write the code for downloading, and the web interface for viewing the listing, and write documentation. Given the time I have, I would take 1 to 2 months to finish everything.

I think this would be a great upgrade for the tool.

What do you think?

digitalarchivo commented 4 months ago

I'm new to this, but here are some thoughts I came up with after reading your reply, which sounded great.

- Retrieve all URLs of a Twitter user's tweets.
- Upload all tweet URLs to the Wayback Machine for archiving.
- For each tweet:
  - Retrieve the Wayback Machine URL.
  - Extract the tweet text.
  - Download any images associated with the tweet.
  - Take a screenshot of the tweet.
  - Generate a JSON file listing the tweet details.
- Sort all tweets by date.
- For each tweet URL:
  - Extract the tweet ID.
- Utilize the GitHub project "DocNow/tweet-viewer":
  - Use the tweet IDs to fetch the tweets from Twitter's API.
  - Display all tweets in chronological order.
  - Present them on a single page with the appearance of Twitter.
  - Example: https://tweet-viewer.vercel.app/
- Take a user's tweets and upload them to the Wayback Machine.
- Retrieve all URLs for that user on the Wayback Machine, including deleted tweets.
- Create an offline archive with the information retrieved from the Wayback Machine.
- Create an online archive version for each particular user.
- Ensure the online archive version has a similar appearance to that of Twitter.

What do you think?

- https://github.com/DocNow/tweet-viewer
- https://github.com/yusuf-yldrm/Tweet-Viewer
- https://tweet-viewer.vercel.app/
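
A few of the steps listed above (extract the tweet ID, sort by date, generate a JSON listing) can be sketched without touching the network. This is only an illustration under assumptions: the input is taken to be `(timestamp, original_url)` pairs in the 14-digit format the Wayback Machine uses, the helper names are made up, and the sort uses the capture date because the tweet's own date is not available from the URL alone.

```python
import json
import re
from datetime import datetime

# Matches the numeric ID in .../status/<id> or .../statuses/<id>.
TWEET_ID_RE = re.compile(r"/status(?:es)?/(\d+)")


def extract_tweet_id(url):
    """Return the tweet ID found in a tweet URL, or None if absent."""
    match = TWEET_ID_RE.search(url)
    return match.group(1) if match else None


def build_listing(rows):
    """Build a date-sorted listing from (timestamp, original_url) pairs.

    Hypothetical helper. Rows whose URL has no tweet ID are skipped.
    """
    entries = []
    for timestamp, original in rows:
        tweet_id = extract_tweet_id(original)
        if tweet_id is None:
            continue
        captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
        entries.append(
            {
                "tweet_id": tweet_id,
                "archived_at": captured.isoformat(),
                "original_url": original,
                "wayback_url": f"https://web.archive.org/web/{timestamp}/{original}",
            }
        )
    # ISO 8601 strings sort chronologically, oldest capture first.
    entries.sort(key=lambda entry: entry["archived_at"])
    return entries


if __name__ == "__main__":
    sample = [
        ("20240102030405", "https://twitter.com/example/status/222"),
        ("20230102030405", "https://twitter.com/example/status/111"),
    ]
    print(json.dumps(build_listing(sample), indent=2))
```

The resulting JSON could feed either an offline archive or a static page; downloading images, taking screenshots, and rendering a Twitter-like view are separate pieces not covered here.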

claromes commented 3 months ago

After the private conversation, I wrote on my blog about the upcoming updates: https://claromes.com/blog/wayback-tweets-is-moving-to-the-command-line.html

digitalarchivo commented 3 months ago

> After the private conversation, I wrote on my blog about the upcoming updates: https://claromes.com/blog/wayback-tweets-is-moving-to-the-command-line.html

That sounds like a very exciting blog post, to be honest. I'll repost it on my Twitter and on Reddit.

digitalarchivo commented 3 months ago

https://x.com/jtig37/status/1794048221569786350

claromes commented 2 months ago

@digitalarchivo New version released with the inclusion of the command-line tool and saved tweets in various formats. New functionalities will be gradually included. Thank you for the suggestion regarding the project scope.

jtig37 commented 2 months ago

@claromes Wow, this is really solid. I want to congratulate you on the nice work.