chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
https://chris-greening.github.io/instascrape/
MIT License
630 stars, 107 forks

MissingCookiesWarning and InstagramLoginRedirectError when using session id #89

Open alessandro-sassi opened 3 years ago

alessandro-sassi commented 3 years ago

Describe the bug

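As the title says: I get MissingCookiesWarning and InstagramLoginRedirectError even though I pass a session ID in the request headers.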

C:\Users\User\Anaconda3\lib\site-packages\instascrape\core\_static_scraper.py:136: MissingCookiesWarning: Request header does not contain cookies! It's recommended you pass at least a valid sessionid otherwise Instagram will likely redirect you to their login page.
  MissingCookiesWarning
Traceback (most recent call last):
  File "postscraper.py", line 5, in <module>
    from instascrape import Profile, scrape_posts
  File "C:\Users\User\Anaconda3\lib\site-packages\instascrape\__init__.py", line 9, in <module>
    google.scrape()
  File "C:\Users\User\Anaconda3\lib\site-packages\instascrape\core\_static_scraper.py", line 144, in scrape
    return_data = self._get_json_from_source(self.source, headers=headers, session=session)
  File "C:\Users\User\Anaconda3\lib\site-packages\instascrape\core\_static_scraper.py", line 265, in _get_json_from_source
    self._validate_scrape(json_dict)
  File "C:\Users\User\Anaconda3\lib\site-packages\instascrape\core\_static_scraper.py", line 301, in _validate_scrape
    raise InstagramLoginRedirectError
instascrape.exceptions.exceptions.InstagramLoginRedirectError: Instagram is redirecting you to the login page instead of the page you are trying to scrape. This could be occuring because you made too many requests too quickly or are not logged into Instagram on your machine. Try passing a valid session ID to the scrape method as a cookie to bypass the login requirement

To Reproduce

from selenium import webdriver
from instascrape import Profile, scrape_posts

# Use a separate name so the selenium webdriver module is not shadowed
driver = webdriver.Chrome("C:/usr/local/bin/chromedriver.exe")
SESSIONID = 'xxxxxxxxxxxxxxxxxxx'

# Request headers carrying the session ID as a cookie
headers = {"user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36 Edg/87.0.664.57",
           "cookie": f"sessionid={SESSIONID};"}
profile = Profile("google")
profile.scrape(headers=headers)

posts = profile.get_posts(webdriver=driver, login_first=True)
scraped_posts, unscraped_posts = scrape_posts(posts, headers=headers, pause=10, silent=False)

Additional context

I got the session ID as described in http://valvepress.com/how-to-get-instagram-session-cookie/. I'm just trying some code taken from the posts, and I don't know why it's not working.
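
Before calling scrape, it can help to confirm that the header dict really carries a sessionid cookie. The following is a minimal sketch using only the standard library; build_headers and has_session_id are hypothetical helper names, not part of instascrape, and the check only verifies that the cookie is present, not that Instagram still accepts it.

```python
from http.cookies import SimpleCookie


def build_headers(session_id: str, user_agent: str) -> dict:
    """Build a request-header dict with a sessionid cookie (hypothetical helper)."""
    return {"user-agent": user_agent, "cookie": f"sessionid={session_id};"}


def has_session_id(headers: dict) -> bool:
    """Return True if the headers carry a non-empty sessionid cookie."""
    cookie = SimpleCookie()
    # A missing "cookie" key is treated as an empty cookie string
    cookie.load(headers.get("cookie", ""))
    return "sessionid" in cookie and bool(cookie["sessionid"].value)
```

For example, has_session_id(build_headers("abc123", "some-agent")) is True, while a dict with only a "user-agent" key gives False.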

yeamusic21 commented 3 years ago

Getting a similar error

    profile = Profile(url_path)
    # header
    session_id = os.environ['INSTAGRAM_SESSIONID']
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36",
        "cookie": f"sessionid={session_id};"
        }
    print(headers)
    # call scrape
    profile.scrape(headers=headers)

Output


(venv) C:\Users\Me\Desktop\Personal\App\GitHub\App>python app\instagram_statistics.py
https://www.instagram.com/google/
{'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36', 'cookie': 'sessionid=######################;'}
C:\Users\Me\Desktop\Personal\App\GitHub\App\lib\site-packages\instascrape\core\_static_scraper.py:136: MissingCookiesWarning: Request header does not contain cookies! It's recommended you pass at least a valid sessionid otherwise Instagram will likely redirect you to their login page.
  MissingCookiesWarning
Traceback (most recent call last):
  File "app\instagram_statistics.py", line 48, in <module>
    print(get_instagram_statistics('google'))
  File "app\instagram_statistics.py", line 35, in get_instagram_statistics
    post.scrape()
  File "C:\Users\Me\Desktop\Personal\App\GitHub\App\lib\site-packages\instascrape\scrapers\post.py", line 80, in scrape
    webdriver=webdriver
  File "C:\Users\Me\Desktop\Personal\App\GitHub\App\lib\site-packages\instascrape\core\_static_scraper.py", line 144, in scrape
    return_data = self._get_json_from_source(self.source, headers=headers, session=session)
  File "C:\Users\Me\Desktop\Personal\App\GitHub\App\lib\site-packages\instascrape\core\_static_scraper.py", line 265, in _get_json_from_source
    self._validate_scrape(json_dict)
  File "C:\Users\Me\Desktop\Personal\App\GitHub\App\lib\site-packages\instascrape\core\_static_scraper.py", line 301, in _validate_scrape
    raise InstagramLoginRedirectError
instascrape.exceptions.exceptions.InstagramLoginRedirectError: Instagram is redirecting you to the login page instead of the page you are trying to scrape. This could be occuring because you made too many requests too quickly or are not logged into Instagram on your machine. Try passing a valid session ID to the scrape method as a cookie to bypass the login requirement

I redacted personal information and replaced it with vague synonyms.

Note that I also followed http://valvepress.com/how-to-get-instagram-session-cookie/ to get a valid session ID.

ardhityawiedhairawan commented 3 years ago

Hi guys. This approach works for me. But since Instagram decided that only logged-in users can open the site, it's getting harder and harder. I ran into a problem after a few requests: Instagram flags the activity as spam and the account has to be verified again.

Is this happening to you all? Do you have a solution for this?

Confirm it's You to Login
We noticed unusual activity from your account so we've logged you out. Follow the next steps within 29 days so we can try to get you back into your account before it's disabled.
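
One common way to make fewer requests per minute is to pause for a random interval between them, in the same spirit as the pause=10 argument passed to scrape_posts earlier in this thread. The sketch below is generic Python, not an instascrape feature; paced and its parameters are hypothetical names, and there is no guarantee that any particular delay avoids Instagram's checks.

```python
import random
import time


def paced(items, min_pause=5.0, max_pause=15.0, sleep=time.sleep):
    """Yield items one at a time, sleeping a random interval between them.

    The sleep function is injectable so the pacing can be tested
    without actually waiting.
    """
    for index, item in enumerate(items):
        if index > 0:
            # No pause before the first item, only between items
            sleep(random.uniform(min_pause, max_pause))
        yield item
```

A loop such as "for post in paced(posts): ..." would then wait between 5 and 15 seconds before handling each post after the first.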

yeamusic21 commented 3 years ago

I've been combing through this code, and I think I've officially gone mad. https://github.com/chris-greening/instascrape/blob/master/instascrape/core/_static_scraper.py

        if webdriver is None:
            try:
                if "sessionid" not in headers["cookie"]:
                    warnings.warn(
                        "Session ID not in cookies! It's recommended you pass a valid sessionid otherwise Instagram will likely redirect you to their login page.",
                        MissingSessionIDWarning
                    )
            except KeyError:
                warnings.warn(
                    "Request header does not contain cookies! It's recommended you pass at least a valid sessionid otherwise Instagram will likely redirect you to their login page.",
                    MissingCookiesWarning
                    )

My code includes profile.scrape(headers=headers, webdriver=driver). The headers are:

    session_id = os.environ['INSTAGRAM_SESSIONID']
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
        "cookie": f"sessionid={session_id};"
        }

And webdriver is:

    chrome_options = webdriver.ChromeOptions()
    chrome_loc = os.environ.get("GOOGLE_CHROME_BIN")
    print(chrome_loc)
    chrome_options.add_argument("--window-size=1920,1080")
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36")
    # Join the directory and file name without a backslash escape in the string
    exec_path = os.path.join(os.environ["CHROMEDRIVER_PATH"], "chromedriver.exe")
    print(exec_path)
    driver = webdriver.Chrome(executable_path=exec_path, options=chrome_options)

If I print driver in my code, I get <selenium.webdriver.chrome.webdriver.WebDriver (session="##########redacting#############")>, and if I print headers, I get {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36', 'cookie': 'sessionid=##########redacting#############;'}

Seriously, how is this code getting to MissingCookiesWarning? It makes no sense. To start, webdriver is not None, but still it makes it into the if statement. Next, 'cookie' is a key in headers, but still, it moves to except KeyError. How is this code getting to MissingCookiesWarning?
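
The quoted branch can be exercised in isolation to see which inputs reach each warning. This is a sketch only: the two warning classes are stand-ins defined here (not imported from instascrape), check_headers copies just the quoted lines, and emitted is a hypothetical helper that records what was raised.

```python
import warnings


class MissingSessionIDWarning(UserWarning):
    """Stand-in for the library's warning class (assumption)."""


class MissingCookiesWarning(UserWarning):
    """Stand-in for the library's warning class (assumption)."""


def check_headers(headers, webdriver=None):
    """Mimic the quoted check from _static_scraper.py."""
    if webdriver is None:
        try:
            if "sessionid" not in headers["cookie"]:
                warnings.warn("Session ID not in cookies!", MissingSessionIDWarning)
        except KeyError:
            warnings.warn("Request header does not contain cookies!", MissingCookiesWarning)


def emitted(headers, webdriver=None):
    """Return the names of the warnings that check_headers emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        check_headers(headers, webdriver)
    return [w.category.__name__ for w in caught]
```

In this sketch, MissingCookiesWarning is only reachable when webdriver is None and the headers dict has no "cookie" key. Both tracebacks in this thread show the failing frame as a call made without arguments (google.scrape() inside instascrape\__init__.py, and post.scrape() in app\instagram_statistics.py), so the warning may be coming from a different scrape call than the one that receives the headers. Running the real script with python -W error turns warnings into exceptions, which should show the call that triggers the first one, although an unrelated warning could surface first.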