brutalsavage / facebook-post-scraper

Facebook Post Scraper 🕵️🖱️
GNU General Public License v3.0

Can't log in because of cookies #48

Open pynomaly opened 3 years ago

pynomaly commented 3 years ago

When running the script, I get:

Traceback (most recent call last):
  File "scraper.py", line 357, in <module>
    postBigDict = extract(page=args.page, numOfPost=args.len, infinite_scroll=infinite, scrape_comment=scrape_comment)
  File "scraper.py", line 258, in extract
    _login(browser, EMAIL, PASSWORD)
  File "scraper.py", line 201, in _login
    browser.find_element_by_id('loginbutton').click()
  File "~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 360, in find_element_by_id
    return self.find_element(by=By.ID, value=id_)
  File "~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 978, in find_element
    'value': value})['value']
  File "~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="loginbutton"]"}
  (Session info: chrome=87.0.4280.88)

The browser shows the allow cookie window. Is there any solution?

pynomaly commented 3 years ago

After changing the login logic to the following code:

def _login(browser, email, password):
    browser.get("http://facebook.com")
    browser.maximize_window()
    browser.find_element_by_name("email").send_keys(email)
    browser.find_element_by_name("pass").send_keys(password)
    browser.find_element_by_id("u_0_h").click()
    browser.find_element_by_name("login").click()

I get this new error:

Traceback (most recent call last):
  File "scraper.py", line 405, in <module>
    scrape_comment=scrape_comment,
  File "scraper.py", line 279, in extract
    browser.get(page)
  File "/tmp/tmp.W3CmledTvJ/env/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/tmp/tmp.W3CmledTvJ/env/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/tmp/tmp.W3CmledTvJ/env/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: chrome=87.0.4280.66)

plknkl commented 3 years ago

Managed to make it log in like this:

def _login(browser, email, password):
    browser.get("http://facebook.com")
    browser.maximize_window()
    browser.find_element_by_id("u_0_h").click()
    time.sleep(3)
    browser.find_element_by_name("email").send_keys(email)
    browser.find_element_by_name("pass").send_keys(password)
    browser.find_element_by_name("login").click()
    time.sleep(5)

simon-gross commented 3 years ago

The cookie issue was solved for me by using a VPN with the US as the location, since the cookie consent prompt is not shown there. Not the most beautiful solution, but it worked.

SirCypkowskyy commented 3 years ago

Here is what I did:

Change x_path_text_cookies and x_path_text_login to match your language (mine are for Polish).

def _login(browser, email, password):
    browser.get("http://facebook.com")
    browser.maximize_window()
    browser.find_element_by_name("email").send_keys(email)
    browser.find_element_by_name("pass").send_keys(password)
    x_path_text_cookies = '//*[@title="Akceptuj wszystkie"]'
    x_path_text_login = '//*[@name="login"]'
    browser.find_element_by_xpath(x_path_text_cookies).click()
    browser.find_element_by_xpath(x_path_text_login).click()
    time.sleep(5)

SirCypkowskyy commented 3 years ago

It should work now
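
Since the consent button label depends on the interface language, the XPath can be built from a small lookup table. This is only a sketch: the helper name `consent_xpath` and the `CONSENT_LABELS` table are hypothetical, the Polish and Italian labels are the ones quoted in this thread, and the English label is an assumption that should be checked against the page you actually see.

```python
# Hypothetical lookup table: language code -> consent button label.
CONSENT_LABELS = {
    "pl": "Akceptuj wszystkie",                 # from this thread
    "it": "Consenti solo i cookie essenziali",  # from this thread
    "en": "Allow only essential cookies",       # assumed, verify in your browser
}


def consent_xpath(language):
    """Build an XPath that matches the consent button by title or by text."""
    label = CONSENT_LABELS[language]
    if '"' in label:
        # Keep the sketch simple: no escaping of embedded double quotes.
        raise ValueError("label must not contain double quotes")
    return f'//*[@title="{label}"] | //button[contains(string(), "{label}")]'


print(consent_xpath("pl"))
```

The returned string can be passed to whichever XPath lookup your Selenium version offers, in place of the hard-coded x_path_text_cookies value.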

ferrazzipietro commented 1 year ago

For me it worked after replacing _login with the following.

Note that "Consenti solo i cookie essenziali" should be changed to "Allow only essential cookies" for the English version.

import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def _login(browser, email, password):
    browser.get("http://facebook.com")
    browser.maximize_window()
    browser.implicitly_wait(5)
    WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[contains(string(), 'Consenti solo i cookie essenziali')]"))).click()
    time.sleep(5)
    browser.find_element(By.NAME, "email").send_keys(email)
    browser.find_element(By.NAME, "pass").send_keys(password)
    browser.find_element(By.NAME, "login").click()
    time.sleep(5)

mikhail-poda commented 1 year ago

Sadly, the elegant solution by @ferrazzipietro does not seem to work for me:

DevTools listening on ws://127.0.0.1:50144/devtools/browser/248f4965-473a-42ee-a5e6-51dddec9dd2c
[24904:25920:1005/002847.731:ERROR:device_event_log_impl.cc(214)] [00:28:47.731] USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[24904:25920:1005/002847.733:ERROR:device_event_log_impl.cc(214)] [00:28:47.733] USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\mikha\Downloads\chromedriver_win32\scraper.py", line 258, in extract
    option.add_experimental_option("prefs", {
  File "C:\Users\mikha\Downloads\chromedriver_win32\scraper.py", line 200, in _login
    def _login(browser, email, password):
AttributeError: 'WebDriver' object has no attribute 'find_element_by_name'
>>>

ferrazzipietro commented 1 year ago

@mikhail-poda it seems you are still calling find_element_by_name(), which is no longer available on the WebDriver object. As far as I know, you should call find_element() and specify the locator strategy, as I did in the snippet I posted.
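
If scraper.py still contains many of the old calls, a rough regex pass can rewrite them to the find_element(By.…, …) form. This is a sketch only: `modernize_locators` is a hypothetical helper, it does not parse Python and only swaps the call prefix, so the result should be reviewed by hand and the file still needs the `By` import.

```python
import re

# Matches the opening of legacy calls such as find_element_by_name( or
# find_elements_by_xpath( and captures the plural "s" and the strategy.
_LEGACY_CALL = re.compile(
    r"\bfind_element(s?)_by_"
    r"(id|name|xpath|link_text|partial_link_text|tag_name|class_name|css_selector)\("
)


def modernize_locators(source):
    """Rewrite find_element(s)_by_x(value) into find_element(s)(By.X, value)."""
    def _swap(match):
        plural, strategy = match.groups()
        return f"find_element{plural}(By.{strategy.upper()}, "
    return _LEGACY_CALL.sub(_swap, source)


old_line = 'browser.find_element_by_name("email").send_keys(email)'
print(modernize_locators(old_line))
```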

mikhail-poda commented 1 year ago

Thank you @ferrazzipietro, it was my mistake: I had to close the .py file in Notepad++ (saving it was not enough) so that the Python runtime picked up the new version of the file. After a successful login and opening the group, the Chrome window disappears with this message:

DevTools listening on ws://127.0.0.1:51236/devtools/browser/2f7e82af-6abc-4f01-8882-112db12f7ecc
[29572:8024:1005/205129.621:ERROR:device_event_log_impl.cc(214)] [20:51:29.621] USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[29572:8024:1005/205129.622:ERROR:device_event_log_impl.cc(214)] [20:51:29.623] USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[29572:28644:1005/205137.891:ERROR:registration_request.cc(266)] Registration response error message: PHONE_REGISTRATION_ERROR
[29572:28644:1005/205137.985:ERROR:mcs_client.cc(707)]   Error code: 500  Error message: Authentication Failed.
[29572:28644:1005/205137.985:ERROR:mcs_client.cc(709)] Failed to log in to GCM, resetting connection.
Number Of Scrolls Needed 2603
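
On the earlier stale-file problem: one plausible reading, assuming scraper.py had been imported into a running interpreter (the earlier traceback shows `<stdin>` and a `>>>` prompt), is that the session kept the module it had already loaded. Saving the file does not change an imported module until it is reloaded or the interpreter is restarted. The sketch below demonstrates this with a throwaway module; `demo_scraper` and `locator_style` are made-up names.

```python
import importlib
import sys
import tempfile
from pathlib import Path

OLD_SOURCE = "def locator_style():\n    return 'find_element_by_name'\n"
NEW_SOURCE = "def locator_style():\n    return 'find_element(By.NAME, ...)'\n"


def demo_reload():
    """Report the module's answer at import, after an edit, and after reload."""
    previous_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep the demo free of cached bytecode
    with tempfile.TemporaryDirectory() as tmp:
        module_path = Path(tmp) / "demo_scraper.py"
        module_path.write_text(OLD_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_scraper")
            at_import = module.locator_style()
            # Edit the file on disk; the loaded module is unaffected.
            module_path.write_text(NEW_SOURCE)
            after_edit = module.locator_style()
            # Only an explicit reload picks up the new source.
            module = importlib.reload(module)
            after_reload = module.locator_style()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_scraper", None)
            sys.dont_write_bytecode = previous_flag
    return at_import, after_edit, after_reload


at_import, after_edit, after_reload = demo_reload()
print(at_import, after_edit, after_reload, sep="\n")
```

In an interactive session the equivalent step would be `importlib.reload(scraper)`, or simply restarting the interpreter.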