Open JarJarBeatyourattitude opened 1 year ago
I also encountered the same problem.
Is the same thing happening to other scrapers? Might be worth keeping an eye on it.
It's Twitter's new restriction: you now need to log in before searching.
1. Call utils.init_driver to get a `driver`.
2. Call utils.log_in to log in.
3. Pass `driver` to scrape() (**need to modify [scrape() in scweet.py](https://github.com/Altimis/Scweet/blob/76e7086a725980dbd5cf8d46bfc27bd4c1d6816f/Scweet/scweet.py#L71)** to use the passed `driver` instead of initializing a new one).
Can you explain a bit more about how and what we are supposed to change?
In your code (add your Twitter account to the .env file in advance):

```python
from Scweet.scweet import scrape
from Scweet.utils import init_driver, log_in

driver = init_driver(headless=True, show_images=False, proxy="your_proxy_setting")
log_in(driver, env=".env")
data = scrape(..., driver=driver)
```

In scrape() of scweet.py:

```python
def scrape(..., driver=None):
    ...
    # Remove this line (71):
    # driver = init_driver(headless, proxy, show_images)
```
It works! Thanks.
Hi, I am new to this. Could you tell me where to add the .env file? Thanks.
It should be in your project's folder (note: the file name should be `.env`). Your `.env` should be in the format given below:

```
SCWEET_EMAIL="example@email.com"
SCWEET_PASSWORD="password"
SCWEET_USERNAME="username"
```
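For illustration, here is a minimal, self-contained sketch of how key/value pairs in such a file can be read. Scweet itself loads the file with a dotenv-style library; the `load_env` helper below is hypothetical and written out only to show the expected file format:

```python
import os
import tempfile

env_text = 'SCWEET_EMAIL="example@email.com"\nSCWEET_PASSWORD="password"\nSCWEET_USERNAME="username"\n'

def load_env(path):
    """Parse KEY=VALUE lines from a .env-style file into a dict."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, val = line.partition("=")
            values[key.strip()] = val.strip().strip('"')
    return values

# Write a sample .env to a temp folder and read it back.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, ".env")
    with open(path, "w") as f:
        f.write(env_text)
    creds = load_env(path)
    print(creds["SCWEET_USERNAME"])  # username
```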
Below are the steps and changes I have made:

1. Add `env=".env"` to the scrape call:

```python
data = scrape(..., env=".env")
```

2. In scrape() of scweet.py:

```python
def scrape(..., env=None):  # add this env=None
    ...
    # and add this line after line (71):
    log_in(driver, env)
```
NOTE: My method is not robust. If you can find a better way to scrape tweets, let us know.
Also, in scrape() of scweet.py, edit the import on line (9) to add log_in:

```python
from .utils import ..., log_in
```
Hello, I am new to this too. Could you tell me where I can obtain "your_proxy_setting"? Thanks very much!
Try following the method I have given above; it works for me. I have kept everything the same in scrape() of scweet.py on line (71) (the proxy is None by default). If it still doesn't work, let me know what the error is. Thanks.
Note: I have to restart my VS Code every time I make a change in the Scweet library.
@Wish-s If you do not need a proxy(or VPN) to connect to twitter.com, just remove this parameter.
Thank you for your reply. I need a proxy (or VPN) to connect to twitter.com, but I can't find where to obtain the parameter.
@Wish-s It's decided by your proxy software, in the format "PROTOCOL://IP:PORT". Clash, for example, uses "http://127.0.0.1:7890" by default.
Hello guys, this is my code:

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from Scweet.scweet import scrape

username = "2MInteractive"
since_date = "2023-07-01"
until_date = "2023-07-11"
headless = True

service = Service("C:/Users/HP Probook/Downloads/chromedriver.exe")  # Replace with the actual path to chromedriver
options = webdriver.ChromeOptions()
options.headless = headless
driver = webdriver.Chrome(service=service, options=options)

data = scrape(from_account=username, since=since_date, until=until_date, headless=headless, driver=driver)
print(data)
driver.quit()
```

and I am getting an empty data list:

```
looking for tweets between 2023-07-01 and 2023-07-06 ...
 path : https://twitter.com/search?q=(from%3A2MInteractive)%20until%3A2023-07-06%20since%3A2023-07-01%20&src=typed_query
scroll 1
scroll 2
looking for tweets between 2023-07-06 and 2023-07-11 ...
 path : https://twitter.com/search?q=(from%3A2MInteractive)%20until%3A2023-07-11%20since%3A2023-07-06%20&src=typed_query
scroll 1
scroll 2
Empty DataFrame
Columns: [UserScreenName, UserName, Timestamp, Text, Embedded_text, Emojis, Comments, Likes, Retweets, Image link, Tweet URL]
Index: []
```
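For reference, the search paths in the log above can be reproduced with a few lines of URL encoding, which is handy for pasting the exact query into a logged-in browser to check whether it returns results at all (a sketch, not Scweet's actual implementation):

```python
from urllib.parse import quote

username = "2MInteractive"
since, until = "2023-07-01", "2023-07-06"

# Twitter advanced-search query in the shape Scweet's log shows.
query = f"(from:{username}) until:{until} since:{since} "
url = "https://twitter.com/search?q=" + quote(query, safe="()") + "&src=typed_query"
print(url)  # matches the first "path :" line in the log above
```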
Check this solution; it might work if none of the others did: https://github.com/Altimis/Scweet/issues/169#issuecomment-1640205875
I wasn't getting any results from scrape, so I tried with headless=False. I noticed that search wasn't returning any results, I assume because you need an account to search. I confirmed that the links work in my browser, where I'm signed in. Will the script be fixed, or am I missing something? Thanks.