chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
https://chris-greening.github.io/instascrape/
MIT License

get_recent_posts() raises MissingCookieWarning but we can't pass a valid cookie #102

Open marco97pa opened 3 years ago

marco97pa commented 3 years ago

Describe the bug

The get_recent_posts() method raises MissingCookiesWarning, and there is no way to pass a valid cookie header to avoid it.

To Reproduce

from instascrape import *

instagram_sessionid = "xxx"
headers = {"user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36 Edg/87.0.664.57",
"cookie": f"sessionid={instagram_sessionid};"}
profile = Profile('https://www.instagram.com/google/')
profile.scrape(headers=headers)
print(profile.posts)
recents = profile.get_recent_posts() #We should pass a cookie here

The code runs, but we get this warning: MissingCookiesWarning: Request header does not contain cookies! It's recommended you pass at least a valid sessionid otherwise Instagram will likely redirect you to their login page.

If I try to pass a cookie header:

from instascrape import *

instagram_sessionid = "xxx"
headers = {"user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36 Edg/87.0.664.57",
"cookie": f"sessionid={instagram_sessionid};"}
profile = Profile('https://www.instagram.com/google/')
profile.scrape(headers=headers)
print(profile.posts)
recents = profile.get_recent_posts(headers=headers)  # This time I try to pass a cookie header

I get a TypeError: get_recent_posts() got an unexpected keyword argument 'headers'
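The TypeError indicates that the installed get_recent_posts() has no headers parameter. A minimal sketch of how one could check that up front with the standard library; get_recent_posts_stub and its amt parameter are hypothetical stand-ins, not instascrape's actual method:

```python
import inspect

def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all also accepts arbitrary keywords
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())

# Hypothetical stand-in without a `headers` parameter, mirroring the reported TypeError
def get_recent_posts_stub(amt=12):
    return []

print(accepts_keyword(get_recent_posts_stub, "headers"))
print(accepts_keyword(get_recent_posts_stub, "amt"))
```

Running the same check against the real profile.get_recent_posts would show whether a given installed version accepts headers.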

Expected behavior

We should be able to pass a valid cookie to avoid the warning, or the warning should not be triggered at all.
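Until a cookie can be passed, the message itself can be silenced with Python's warnings module. This is only a sketch: MissingCookiesWarning below is a stand-in class (the real one is assumed to be an ordinary Warning subclass importable from the library), and noisy_call is a hypothetical stand-in for get_recent_posts(). It hides the message only; it does not add a cookie, so Instagram may still redirect to the login page.

```python
import warnings

class MissingCookiesWarning(UserWarning):
    """Stand-in for the library's warning class (assumed to be a Warning subclass)."""

def noisy_call():
    """Hypothetical stand-in for a library call that emits the warning."""
    warnings.warn("Request header does not contain cookies!", MissingCookiesWarning)
    return ["post"]

def call_quietly(func, category):
    """Run func() with warnings of the given category ignored, then restore the filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=category)
        return func()

recents = call_quietly(noisy_call, MissingCookiesWarning)
print(recents)
```

Scoping the filter with catch_warnings() keeps other warnings visible and leaves the global filter list unchanged after the call.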

vordemann commented 3 years ago

Have the same issue!

Xerrion commented 3 years ago

I fixed it by passing cookies to Selenium before going to the profile. I export the cookies from Instagram with the Chrome extension Cookie-Editor and then paste them into cookies.json.

url = f"https://www.instagram.com/{handle}/"

driver.get(url)  # Needed to fake a login
# Fake login with Cookies
with open("./cookies.json", "r", newline="") as data:  # Open cookies.json
    cookies = json.load(data)
    for cookie in cookies:  # Add cookies to driver
        cookie.pop("sameSite")  # Selenium breaks with sameSite
        driver.add_cookie(cookie)  # Add our authorized cookies

ig_profile = Profile(url)  # Set IG profile
ig_profile.url = url
ig_profile.scrape(headers=headers)  # Scrape IG profile
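
Note that cookie.pop("sameSite") raises a KeyError when an exported cookie has no sameSite key. A minimal sketch of a more forgiving cleanup step, assuming the export is a JSON list of cookie dicts; the sample data, including the hostOnly and sameSite values, is made up for illustration, and the allowlist is a conservative guess at the keys Selenium's add_cookie() documents, with sameSite left out on purpose:

```python
import json

# Conservative allowlist: keys documented for Selenium's add_cookie(), minus sameSite
SELENIUM_COOKIE_KEYS = {"name", "value", "path", "domain", "secure", "httpOnly", "expiry"}

def clean_cookie(cookie):
    """Return a copy of an exported cookie with only allowlisted keys (sameSite is dropped)."""
    return {k: v for k, v in cookie.items() if k in SELENIUM_COOKIE_KEYS}

# Made-up sample in the shape of a browser-extension export
exported = json.loads(
    '[{"name": "sessionid", "value": "xxx", "domain": ".instagram.com",'
    ' "sameSite": "no_restriction", "hostOnly": false}]'
)
cleaned = [clean_cookie(c) for c in exported]
print(cleaned)
```

Each cleaned dict could then be handed to driver.add_cookie() in place of the pop-and-add loop.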

asauce0972 commented 3 years ago

Any way around it so far without selenium?

yeamusic21 commented 3 years ago

I get the same error and posted about it at https://github.com/chris-greening/instascrape/issues/89#issuecomment-801495835

yeamusic21 commented 3 years ago

@Xerrion

I spent a lot of time trying this. I'm not sure what cookie.pop("sameSite") is doing, since I don't see any sameSite keys when I call print(driver.get_cookies()), so I skipped all that and just ran driver.add_cookie({'name':'sessionid','value':os.environ['INSTAGRAM_SESSIONID']}), which resulted in the same MissingCookiesWarning. :-(

UPDATE:

So I'm trying this again. I understand your sameSite comment now, but I'm still getting the MissingCookiesWarning. If you update the driver but never pass it to the scrape method, how does updating the driver affect instascrape?
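
For context on that question: cookies added to a Selenium driver live in that browser session, so a separate HTTP request only sees them if they are copied into its headers. A minimal sketch of that copy step, assuming cookies in the list-of-dicts shape that driver.get_cookies() returns; the sample values are placeholders, and this thread does not confirm that doing so makes the warning go away:

```python
def cookie_header(cookies):
    """Join Selenium-style cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)

# Placeholder data in the shape of driver.get_cookies(): dicts with at least name and value
driver_cookies = [
    {"name": "sessionid", "value": "xxx"},
    {"name": "csrftoken", "value": "yyy"},
]

headers = {
    "user-agent": "Mozilla/5.0",
    "cookie": cookie_header(driver_cookies),
}
print(headers["cookie"])  # sessionid=xxx; csrftoken=yyy
```

The resulting dict has the same shape as the headers passed to profile.scrape(headers=headers) in the original report.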

yeamusic21 commented 3 years ago

I've been combing through the code. Looks like you have to pass your driver to the scrape method as well. I mention it here https://github.com/chris-greening/instascrape/issues/89#issuecomment-805394041 but I'm still getting the same error even with the driver passed to scrape, which is very weird if you read the code.

yeamusic21 commented 3 years ago

Just noting that this issue has made it pretty much impossible for me to use instascrape for my use case. Because of this issue and https://github.com/chris-greening/instascrape/issues/89, I've abandoned instascrape at this point.

nullsaint commented 3 years ago

get_recent_posts() always returns 24 posts no matter the amount requested. Can I bypass that, for example to get all the posts?

yugkha3 commented 2 years ago

I tried the same thing (adding the cookie manually), but I'm still getting the warning. What am I doing wrong? Here's the code I am using:

SESSION_ID = 'my session id'
url = "https://www.instagram.com/discordbot98/"
webdriver.get(url)
time.sleep(10)
webdriver.add_cookie({'name': 'sessionid', 'value': SESSION_ID})