Altimis / Scweet

A simple and unlimited twitter scraper : scrape tweets, likes, retweets, following, followers, user info, images...
MIT License

It doesn't scrap anything. #65

Open AleksandrovichK opened 3 years ago

AleksandrovichK commented 3 years ago

Hey guys!

I tried to run the same code that you use in the examples:

users = ['nagouzil', '@yassineaitjeddi', 'TahaAlamIdrissi',
         '@Nabila_Gl', 'geceeekusuu', '@pabu232', '@av_ahmet', '@x_born_to_die_x']

users_info = get_user_information(users, headless=True)

After running the code, the variable users_info contains None.

I also tried this:

data = scrap(start_date="2021-05-01", max_date="2021-05-02", from_account = 'elonmusk', interval=3,
      headless=True, display_type="Top", save_images=False, filter_replies=True, proximity=True)

And the variable data contains an empty dataframe.

Maybe I'm doing something fundamentally wrong?


My platform: macOS Big Sur. Scweet version: 1.0

Altimis commented 3 years ago

Hi, sorry for the late response. First of all, set interval to a value smaller than or equal to the period of time you want to scrape (1 in your case). Second, what does the code print while scraping? Is there something like "Tweet... Found .."? For the case of user_info, you need to set your credentials first.
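
As a rough sketch of the interval advice above: the helper below caps the interval at the length of the requested date range. The name `clamp_interval` is hypothetical and not part of Scweet, and it assumes that interval is counted in days.

```python
from datetime import date


def clamp_interval(start_date: str, max_date: str, interval: int) -> int:
    """Return an interval (in days) no larger than the requested date range.

    Hypothetical helper, not part of Scweet. Dates are ISO strings (YYYY-MM-DD).
    """
    span_days = (date.fromisoformat(max_date) - date.fromisoformat(start_date)).days
    if span_days < 1:
        raise ValueError("max_date must be later than start_date")
    # Never exceed the range, and never go below one day.
    return max(1, min(interval, span_days))


# The range from the original report covers a single day, so 3 is capped to 1.
print(clamp_interval("2021-05-01", "2021-05-02", 3))  # 1
```

The returned value could then be passed as the interval argument instead of a hard-coded number.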

AleksandrovichK commented 3 years ago

Okay, so it appears I've started to figure out what's wrong.

First of all, I have switched off headless mode (so I can see the browser).

  1. If I specify proximity=True then Twitter doesn't show anything in the search bar. So there is something wrong with the request in the address bar.

  2. If I choose to run WITH headless mode, then I have:

    Scraping on headless mode.
    looking for tweets between 2021-04-29 and 2021-04-30 ...
    path : https://twitter.com/search?q=(from%3Aelonmusk)%20until%3A2021-04-30%20since%3A2021-04-29%20%20-filter%3Areplies&src=typed_query
    scroll  1
    scroll  2
    looking for tweets between 2021-04-30 and 2021-05-01 ...
    path : https://twitter.com/search?q=(from%3Aelonmusk)%20until%3A2021-05-01%20since%3A2021-04-30%20%20-filter%3Areplies&src=typed_query
    scroll  1
    scroll  2
    looking for tweets between 2021-05-01 and 2021-05-02 ...
    path : https://twitter.com/search?q=(from%3Aelonmusk)%20until%3A2021-05-02%20since%3A2021-05-01%20%20-filter%3Areplies&src=typed_query
    scroll  1
    scroll  2

But when I run WITHOUT headless mode (which is obviously less preferable in production-like activities), some tweets are found.

    looking for tweets between 2021-04-29 and 2021-04-30 ...
    path : https://twitter.com/search?q=(from%3Aelonmusk)%20until%3A2021-04-30%20since%3A2021-04-29%20%20-filter%3Areplies&src=typed_query
    scroll  1
    scroll  2
    scroll  3
    looking for tweets between 2021-04-30 and 2021-05-01 ...
    path : https://twitter.com/search?q=(from%3Aelonmusk)%20until%3A2021-05-01%20since%3A2021-04-30%20%20-filter%3Areplies&src=typed_query
    scroll  1
    scroll  2
    looking for tweets between 2021-05-01 and 2021-05-02 ...
    path : https://twitter.com/search?q=(from%3Aelonmusk)%20until%3A2021-05-02%20since%3A2021-05-01%20%20-filter%3Areplies&src=typed_query
    Tweet made at: 2021-05-01T21:36:22.000Z is found. <---- FOUND ONE
    Tweet made at: 2021-05-01T21:49:50.000Z is found. <---- FOUND ONE
    scroll  1
    scroll  2
    scroll  3
    looking for tweets between 2021-05-02 and 2021-05-03 ...
    path : https://twitter.com/search?q=(from%3Aelonmusk)%20until%3A2021-05-03%20since%3A2021-05-02%20%20-filter%3Areplies&src=typed_query
    Tweet made at: 2021-05-02T03:13:36.000Z is found. <---- FOUND ONE
    scroll  1
    scroll  2
    scroll  3

So this looks like a bug. Being able to run it in headless mode is very important.

  3. But the most mysterious thing is the third issue. When I run the scraper with headless mode switched off (so I can see the browser) and then open another tab in the browser or put another window on top of the scraper's window, it finds no tweets.

I ran an experiment: I ran the scraper with exactly the same parameters 3 times and saved the shapes of the results. Here they are:

    (4, 11) <- the first run, I kept the browser window active all the time
    (2, 11) <- opened it a couple of times and then covered the window with another
    (0, 11) <- window with the scraper was hidden

It seems like Twitter reacts to users' actions.
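
For anyone who wants to repeat the experiment above, here is a minimal sketch of the "run it several times and save the shapes" loop. `record_shapes` and `FakeFrame` are hypothetical names used only for illustration. The loop accepts any zero-argument callable whose result has a `.shape` attribute, as a pandas DataFrame does.

```python
from typing import Callable, List, Tuple


def record_shapes(run_scrape: Callable[[], object], runs: int = 3) -> List[Tuple[int, ...]]:
    """Call run_scrape() `runs` times and collect the .shape of each result."""
    shapes = []
    for _ in range(runs):
        result = run_scrape()
        shapes.append(tuple(result.shape))
    return shapes


class FakeFrame:
    """Stand-in for a DataFrame so the sketch runs without a browser."""

    def __init__(self, shape: Tuple[int, int]) -> None:
        self.shape = shape


# Replay the three shapes reported above instead of scraping for real.
reported = iter([(4, 11), (2, 11), (0, 11)])
print(record_shapes(lambda: FakeFrame(next(reported))))
# [(4, 11), (2, 11), (0, 11)]
```

In a real run, the lambda would wrap the same scrap(...) call used earlier in this thread, with the browser window state changed between runs.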

AleksandrovichK commented 3 years ago

I'm sorry, any updates? @Altimis

aseifert commented 3 years ago

I have the same issue; it only works with headless=False.

purnima110895 commented 3 years ago

Hey guys! I'm facing the same issue: even with headless=False, I get a blank dataframe.

data = scrap(start_date="2021-05-01", max_date="2021-05-02", from_account = 'elonmusk', interval=1, headless=False, display_type="Top", save_images=False, filter_replies=True, proximity=True)

Getting output as:

    looking for tweets between 2021-05-01 and 2021-05-02 ...
    path : https://twitter.com/search?q=(from%3Aelonmusk)%20until%3A2021-05-02%20since%3A2021-05-01%20%20-filter%3Areplies&src=typed_query&lf=on
    scroll 1
    scroll 2
    scroll 3

What's wrong here?
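
One way to see what a run actually asked Twitter for is to decode the path lines printed in the logs. The sketch below uses only the standard library. `decode_search_url` is a hypothetical helper name, and the two URLs are copied from the logs in this thread for the same date range. It shows that the search query is identical and that the only extra parameter in the second log is `lf`, which presumably comes from proximity=True.

```python
from urllib.parse import parse_qs, urlsplit


def decode_search_url(url: str) -> dict:
    """Return the decoded query parameters of a search URL (first value of each)."""
    params = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in params.items()}


# Path lines copied from the logs above (2021-05-01 to 2021-05-02).
with_lf = (
    "https://twitter.com/search?q=(from%3Aelonmusk)%20until%3A2021-05-02"
    "%20since%3A2021-05-01%20%20-filter%3Areplies&src=typed_query&lf=on"
)
without_lf = (
    "https://twitter.com/search?q=(from%3Aelonmusk)%20until%3A2021-05-02"
    "%20since%3A2021-05-01%20%20-filter%3Areplies&src=typed_query"
)

a = decode_search_url(with_lf)
b = decode_search_url(without_lf)

print(a["q"])                    # the human-readable search query
print(a["q"] == b["q"])          # True: both runs search for the same thing
print(sorted(set(a) - set(b)))   # ['lf']: parameter present only in the first URL
```

Pasting the decoded query into the Twitter search box by hand is a quick way to check whether the query itself returns results.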

connorguy commented 2 years ago

Facing the same issue. It works with headless=False, but when it is set to True I just see that it is scrolling and not picking up any of the tweets.