asmerdon / Twitter-Scraper

Twitter Scraper built using Selenium and Beautiful Soup.
5 stars, 0 forks

the problem of collected data quantity #1

Open Miaoz1 opened 1 year ago

Miaoz1 commented 1 year ago

Thanks for your code! I actually have this problem: the keyword is “never use Expedia again”, and the amount of data collected by the program is far below the amount I can find by searching manually. I set the number of returns to 3000, but it only collects 60 tweets into the CSV.

asmerdon commented 1 year ago

yea sry it doesn't work properly, currently on holiday but gonna try to fix it when I get back 👍 The max it returns is roughly 50

asmerdon commented 1 year ago

Hey, I've had a go at fixing the program. There seems to be a discrepancy between the tweets retrieved when the GUI is enabled vs disabled (headless mode), with headless retrieving fewer tweets and duplicating some of them. I've removed headless mode, which means a browser window always opens when the program is run, but it seems to be much more accurate at getting the requested number of tweets. The quickest I've been able to scrape is about 10 tweets/second (so a request for 500 tweets takes about 50 seconds). If you don't mind having duplicated tweets, you can comment/uncomment the headless scraping lines at the top to switch headless mode back on.
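If you'd rather keep headless mode and drop the repeats afterwards, here is a minimal sketch of one way to do it. This is not the code in the repo: the `user`, `timestamp` and `text` field names are assumptions, so swap in whatever columns your CSV has. The second helper just turns the rough 10 tweets/second figure into a time estimate, and the real rate will vary.

```python
from typing import Dict, Iterable, List


def dedupe_tweets(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop repeated tweets, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for row in rows:
        # Assumed key: same author, timestamp and text means the same tweet.
        # These field names are hypothetical, not taken from the scraper.
        key = (row.get("user"), row.get("timestamp"), row.get("text"))
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique


def estimated_seconds(requested: int, tweets_per_second: float = 10.0) -> float:
    """Rough scrape time, assuming the observed rate of about 10 tweets/second holds."""
    return requested / tweets_per_second
```

At that rate a request for 3000 tweets would take roughly 300 seconds, assuming nothing slows the scrolling down along the way.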

Miaoz1 commented 1 year ago

Thank you very much for your reply! I tried your new code and found that the number of tweets captured is indeed higher than before. But I have another problem: as shown in the pictures (pic, pic1), my number seems to be blocked. Do I need to create an IP pool?

asmerdon commented 1 year ago

Hmm, I'm not certain what would be causing that. I'm using a new account that I verified with email (not paid verification). That's all I can suggest trying, sorry.