Altimis / Scweet

A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...
MIT License

Some questions #5

Open DoctorDream opened 3 years ago

DoctorDream commented 3 years ago

Thank you so much for this repository! I have spent nearly two weeks researching how to crawl tweets together with their replies, but none of the repositories I tried, such as TWINT, worked. Do you know TWINT? I'm a developer from China. Even after setting up a proxy, TWINT keeps reporting errors:

WARNING:root:Error retrieving https://twitter.com/: ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='twitter.com', port=443): Read timed out. (read timeout=10)",),), retrying

I saw you mention that Twitter has blocked all crawlers during this period. Is TWINT failing for that reason, or is it because I am in China and have set up the proxy incorrectly?

Altimis commented 3 years ago

Hi @DoctorDream, thank you for your feedback. In fact, I'm not sure that TWINT works these days; at least it didn't work for me, and that's why I worked on Scweet. What I am sure about is that all API-based scrapers stopped working because Twitter changed its API to version 2. Did Scweet meet your requirements? What should I add to improve it?

DoctorDream commented 3 years ago

@Altimis Thank you very much for your reply. Your program basically meets my needs, but I ran into a small problem while using it. I use a Twitter crawler to collect conversations for academic research, and Twitter's timeline-based structure has caused me some difficulty. When I crawl tweets, two consecutive tweets may be replies to different tweets, which makes it impossible to assemble them into a dialogue. Is there a way to crawl replies starting from the main tweet, the way you would browse a thread on the web? Thank you very much for your enthusiasm!
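The regrouping problem described above can be illustrated with a minimal sketch. It assumes each scraped tweet is a dict with hypothetical `tweet_id`, `in_reply_to_id`, and `text` fields; these names are illustrative only and are not Scweet output columns. Given such records in timeline order, it rebuilds one dialogue per reply chain:

```python
from collections import defaultdict


def build_dialogues(tweets):
    """Turn a flat, timeline-ordered list of tweets into reply chains.

    Each tweet is a dict with hypothetical keys (assumptions, not Scweet's
    schema): "tweet_id", "in_reply_to_id" (None for a main tweet), "text".
    Returns a list of dialogues, each a list of texts from root to leaf.
    """
    by_id = {t["tweet_id"]: t for t in tweets}
    children = defaultdict(list)
    roots = []
    for t in tweets:
        parent = t.get("in_reply_to_id")
        if parent in by_id:
            children[parent].append(t)
        else:
            # Main tweet, or a reply whose parent was never scraped.
            roots.append(t)

    dialogues = []

    def walk(node, path):
        path = path + [node["text"]]
        replies = children.get(node["tweet_id"], [])
        if not replies:
            dialogues.append(path)  # reached the end of one chain
        for reply in replies:
            walk(reply, path)

    for root in roots:
        walk(root, [])
    return dialogues


# Timeline order interleaves two conversations, as described above.
sample = [
    {"tweet_id": 1, "in_reply_to_id": None, "text": "A"},
    {"tweet_id": 2, "in_reply_to_id": None, "text": "B"},
    {"tweet_id": 3, "in_reply_to_id": 1, "text": "re A"},
    {"tweet_id": 4, "in_reply_to_id": 2, "text": "re B"},
    {"tweet_id": 5, "in_reply_to_id": 3, "text": "re re A"},
]
print(build_dialogues(sample))
```

This only works if the scraper records which tweet each reply answers; without that link, timeline order alone cannot separate interleaved conversations.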

Altimis commented 3 years ago

@DoctorDream If I understood correctly, you want to scrape the replies to every tweet, is that right? For a tweet like this one: [image], you want to click on the comments and gather all the replies (1.7k replies). If so, it may be a real challenge for Scweet. First, you may be required to sign in to view the replies to a given tweet. Second, the process may take too long, since the script needs to open the replies (click) and scroll to scrape all of them.

DoctorDream commented 3 years ago

@Altimis Yes, that's what I mean. For a given tweet, I don't need to collect all the replies. I only need the most-liked ones, because those replies tend to attract more attention. I expect to spend weeks collecting data, so the time it takes won't matter much to me. Would it be feasible for you to implement this feature? Thank you very much!
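As a rough sketch of the filtering step described above, assuming the replies to one tweet have already been scraped into dicts with a hypothetical `likes` count (the field name, `k`, and `min_likes` are illustrative assumptions, not part of Scweet):

```python
import heapq


def top_replies(replies, k=5, min_likes=0):
    """Return up to k replies with the most likes, highest first.

    "likes" is a hypothetical field name; replies without it count as 0.
    """
    eligible = [r for r in replies if r.get("likes", 0) >= min_likes]
    return heapq.nlargest(k, eligible, key=lambda r: r.get("likes", 0))


replies = [
    {"text": "a", "likes": 3},
    {"text": "b", "likes": 120},
    {"text": "c", "likes": 45},
]
print([r["text"] for r in top_replies(replies, k=2)])
```

Keeping only the top few replies also limits how far the scraper has to scroll, if the page lists popular replies first; that ordering is an assumption about the site, not something this sketch checks.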

Altimis commented 3 years ago

@DoctorDream I think it is possible. I'll work on that.

Altimis commented 3 years ago

@DoctorDream I have a question for you. Do you already have the tweet_id of each tweet whose replies you want to scrape, or do you want to crawl all tweets and get their replies?

DoctorDream commented 3 years ago

@Altimis Thank you very much! Actually, I just need to crawl tweets together with their replies to form dialogues, so I don't need to crawl tweets with a specific tweet_id.