Altimis / Scweet

A simple and unlimited twitter scraper : scrape tweets, likes, retweets, following, followers, user info, images...
MIT License

Scrape not working? #163

Open JarJarBeatyourattitude opened 1 year ago

JarJarBeatyourattitude commented 1 year ago

I wasn't getting any results from scrape(), so I ran it with headless=False. The search page itself wasn't returning any results, presumably because you now need to be signed in to search. I confirmed that the same links work in my browser, where I'm signed in. Will the script be fixed, or am I missing something? Thanks.

fjj-088 commented 1 year ago

I also encountered the same problem.

BradKML commented 1 year ago

Is the same thing happening to other scrapers? Might want to keep an eye on that.

NicerWang commented 1 year ago

It's Twitter's new restriction: you now need to log in before searching.

  1. Call utils.init_driver to get a driver.
  2. Call utils.log_in to log in.
  3. Pass the driver to scrape(). (You need to modify [scrape() in scweet.py](https://github.com/Altimis/Scweet/blob/76e7086a725980dbd5cf8d46bfc27bd4c1d6816f/Scweet/scweet.py#L71) to use the passed driver instead of initializing a new one.)

yisyed commented 1 year ago

Can you explain a bit more about how and what we are supposed to change?

NicerWang commented 1 year ago

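In your code (add your Twitter account to the .env file in advance):

from Scweet.scweet import scrape
from Scweet.utils import init_driver, log_in

# Create the browser session yourself; drop the proxy argument if you don't need one
driver = init_driver(headless=True, show_images=False, proxy="your_proxy_setting")
# Log in with the credentials stored in .env
log_in(driver, env=".env")
# Pass the logged-in driver to scrape() (requires the change to scweet.py below)
data = scrape(..., driver=driver)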

In scrape() of scweet.py:

def scrape(..., driver=None):  # add the driver parameter
    ......
    # Remove this line (71) so the passed-in driver is used:
    # driver = init_driver(headless, proxy, show_images)

yisyed commented 1 year ago

It works! Thanks.
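
A note on the scrape() change discussed above: one common way to accept an externally created driver is to default the parameter to None and only create a new driver when nothing was passed, so existing callers keep working. The sketch below illustrates that pattern with stand-ins. FakeDriver, make_driver, and scrape_sketch are hypothetical names used only for illustration; they are not part of Scweet, and the real scrape() takes many more arguments.

```python
from typing import Optional


class FakeDriver:
    """Stand-in for a Selenium WebDriver so the sketch runs without a browser."""

    def __init__(self, logged_in: bool = False) -> None:
        self.logged_in = logged_in


def make_driver() -> FakeDriver:
    # Hypothetical factory playing the role of init_driver(); it returns a
    # fresh driver that has not logged in.
    return FakeDriver(logged_in=False)


def scrape_sketch(query: str, driver: Optional[FakeDriver] = None) -> dict:
    # Reuse the caller's (already logged-in) driver when one is passed;
    # only create a new one when the argument is left at its default.
    if driver is None:
        driver = make_driver()
    return {"query": query, "logged_in": driver.logged_in}


if __name__ == "__main__":
    session = FakeDriver(logged_in=True)
    print(scrape_sketch("from:someone", driver=session))
    print(scrape_sketch("from:someone"))
```

With this shape, removing the init_driver line outright is not strictly necessary; guarding it with a None check would probably have the same effect, assuming nothing else in scrape() depends on creating its own driver.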

MykhailoYampolskyi commented 1 year ago

Hi, I am new to this. Could you tell me where I should add the .env file? Thanks.

yisyed commented 1 year ago

It should be in your project's folder (NOTE: the file name should be '.env').

Your '.env' should be in the format given below:

SCWEET_EMAIL = "example@email.com"
SCWEET_PASSWORD = "password"
SCWEET_USERNAME = "username"
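
To check that a .env file in the format above is readable before launching a browser, a small standard-library parser can help. This is only a sketch of the KEY = "value" format shown in this thread; Scweet's own loader may behave differently, and read_env_file and missing_credentials are hypothetical helper names, not Scweet functions.

```python
import os
import tempfile


def read_env_file(path: str) -> dict:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Tolerate spaces around "=" and optional surrounding quotes.
            values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_credentials(values: dict) -> list:
    # Variable names taken from the .env format shown in this thread.
    required = ("SCWEET_EMAIL", "SCWEET_PASSWORD", "SCWEET_USERNAME")
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    # Write a deliberately incomplete .env to a temporary folder and check it.
    with tempfile.TemporaryDirectory() as folder:
        env_path = os.path.join(folder, ".env")
        with open(env_path, "w", encoding="utf-8") as handle:
            handle.write('SCWEET_EMAIL = "example@email.com"\n')
            handle.write('SCWEET_PASSWORD = "password"\n')
        print(missing_credentials(read_env_file(env_path)))
```

An empty list from missing_credentials means all three variables were found; any names it returns are the ones still to be added to the file.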

Below are the steps and changes I have made:

  1. In my code, I pass env=".env" to scrape():

data = scrape(..., env=".env")

  2. In scrape() of scweet.py:

def scrape(..., env=None):    # Add this 'env=None'
    ......
    # And add this line after line (71)
    log_in(driver, env)

NOTE: My method is not robust. If you can find a better way to scrape tweets, let us know.

yisyed commented 1 year ago

In scweet.py, also edit the import on line (9) to add log_in:

from .utils import ..., log_in

Wish-s commented 1 year ago

Hello, I am new to this too. Could you tell me where I can obtain the "your_proxy_setting" value? Thanks very much!

yisyed commented 1 year ago

Try following the method I have given above. It works for me. I have kept line (71) in scrape() of scweet.py unchanged (the proxy is None by default). If it still doesn't work, let me know what the error is. Thanks.

Note: I have to restart VS Code every time I make a change in the Scweet library.

NicerWang commented 1 year ago

@Wish-s If you do not need a proxy (or VPN) to connect to twitter.com, just remove this parameter.

Wish-s commented 1 year ago

Thank you for your reply. I do need a proxy (or VPN) to connect to twitter.com, but I can't find where to obtain the parameter.

NicerWang commented 1 year ago

@Wish-s It's determined by your proxy software, in the format "PROTOCOL://IP:PORT". Clash, for example, uses "http://127.0.0.1:7890" by default.
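
For anyone unsure whether their proxy string matches the "PROTOCOL://IP:PORT" shape, here is a minimal check using only the standard library. check_proxy is a hypothetical helper, not part of Scweet, and it only validates the shape of the string; it does not test whether the proxy is actually reachable.

```python
from urllib.parse import urlsplit


def check_proxy(proxy: str) -> str:
    """Return the proxy string unchanged if it looks like PROTOCOL://IP:PORT."""
    parts = urlsplit(proxy)
    # A missing scheme or host usually means the "PROTOCOL://" prefix was left out.
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"expected PROTOCOL://IP:PORT, got {proxy!r}")
    if parts.port is None:
        raise ValueError(f"missing port in {proxy!r}")
    return proxy


if __name__ == "__main__":
    # The Clash default mentioned in this thread passes the check.
    print(check_proxy("http://127.0.0.1:7890"))
```

The returned string is what would be passed as the proxy argument of init_driver in the snippet earlier in the thread.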

ihabpalamino commented 11 months ago

Hello guys, this is my code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from Scweet.scweet import scrape

# Specify the parameters for scraping
username = "2MInteractive"
since_date = "2023-07-01"
until_date = "2023-07-11"
headless = True

# Set up the ChromeDriver service
service = Service("C:/Users/HP Probook/Downloads/chromedriver.exe")  # Replace with the actual path to chromedriver

# Set up the ChromeOptions
options = webdriver.ChromeOptions()
options.headless = headless

# Create the WebDriver
driver = webdriver.Chrome(service=service, options=options)

# Scrape the tweets by username
data = scrape(from_account=username, since=since_date, until=until_date, headless=headless, driver=driver)

# Print the scraped data
print(data)

# Close the WebDriver
driver.quit()

And I am getting an empty result:

looking for tweets between 2023-07-01 and 2023-07-06 ...
path : https://twitter.com/search?q=(from%3A2MInteractive)%20until%3A2023-07-06%20since%3A2023-07-01%20&src=typed_query
scroll 1
scroll 2
looking for tweets between 2023-07-06 and 2023-07-11 ...
path : https://twitter.com/search?q=(from%3A2MInteractive)%20until%3A2023-07-11%20since%3A2023-07-06%20&src=typed_query
scroll 1
scroll 2
Empty DataFrame
Columns: [UserScreenName, UserName, Timestamp, Text, Embedded_text, Emojis, Comments, Likes, Retweets, Image link, Tweet URL]
Index: []

baqachadil commented 11 months ago

Check this solution; it might work if none of the others did: https://github.com/Altimis/Scweet/issues/169#issuecomment-1640205875