chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
https://chris-greening.github.io/instascrape/
MIT License
633 stars, 109 forks

Hashtag Scraper KeyError: 'graphql' when using Selenium webdriver or Sessionid Cookie #138

Open kalebm1 opened 3 years ago

kalebm1 commented 3 years ago

Describe the bug I am trying to scrape posts from a hashtag. I have tried both the Selenium driver and headers containing a sessionid to get around Instagram's redirect to the login page. Before Instagram started redirecting to the login page, I could scrape the hashtag with no problem. Once the redirection started, I put my sessionid into the headers field and got the following error:

post_arr = self.json_dict["entry_data"]["TagPage"][0]["graphql"]["hashtag"]["edge_hashtag_to_media"]["edges"]
KeyError: 'graphql'

I am fairly new to the library, so I poked around in the code and read through similar issues. I think this error is similar to #124 in that the json_dict is not structured the way the library expects. I printed the json_dict to a file and found that it contains no graphql key, and most of the other keys that get_recent_posts looks for are missing as well. I hope the fix for this error is as simple as the one for the other issue.
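To narrow down where the structure diverges, one option is to walk the key path from the traceback and report the first key that does not resolve. The snippet below is only a sketch: the helper name first_missing_key and both sample payloads are made up for illustration, and the real json_dict returned by Instagram may look quite different.

```python
from typing import Any, Optional, Sequence, Tuple, Union

Key = Union[str, int]

# Key path copied from the traceback in this issue.
RECENT_POSTS_PATH: Tuple[Key, ...] = (
    "entry_data", "TagPage", 0, "graphql",
    "hashtag", "edge_hashtag_to_media", "edges",
)


def first_missing_key(data: Any, path: Sequence[Key]) -> Optional[Tuple[Key, ...]]:
    """Return the path prefix ending at the first key that cannot be
    resolved, or None if the whole path resolves."""
    current = data
    for depth, key in enumerate(path):
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return tuple(path[: depth + 1])
    return None


# Hypothetical payloads, for illustration only (not real Instagram data).
expected_shape = {
    "entry_data": {
        "TagPage": [
            {"graphql": {"hashtag": {"edge_hashtag_to_media": {"edges": []}}}}
        ]
    }
}
observed_shape = {"entry_data": {"TagPage": [{"some_other_key": {}}]}}

print(first_missing_key(expected_shape, RECENT_POSTS_PATH))
print(first_missing_key(observed_shape, RECENT_POSTS_PATH))
```

With a scraped Hashtag object, the same helper could be called on its json_dict attribute before get_recent_posts to see how far the expected path actually exists.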

To Reproduce Steps to reproduce the behavior:

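from instascrape import Hashtag

def __init__(self):
    # Headers as posted (no sessionid entry appears in this snippet)
    self.headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/79.0.3945.74 Safari/537.36 Edg/79.0.309.43",
    }
    # hashtagUrl is defined elsewhere in the reporter's code
    self.hashtag = Hashtag(hashtagUrl)
    self.hashtag.scrape(headers=self.headers)
    # This call raises KeyError: 'graphql'
    self.hashtags = self.hashtag.get_recent_posts()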

Expected behavior The expected outcome is a list of Post objects (List[Post]), which is what hashtag.get_recent_posts() should normally return.

Screenshots Screenshot (313)

havelar commented 3 years ago

I'm having exactly the same problem. I'm also sending the sessionid in cookies, in case anyone suggests that might be the cause. I'm still trying to understand what is causing this issue.

yemregundogmus commented 3 years ago

I have the same issue when I search using a proxy and a sessionid. I think the problem lies in how the sessionid is defined, which would explain why the returned data is incomplete. The library raises the error, but I couldn't find a way to solve it.
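One way to check whether the sessionid is being accepted at all is to look at the final URL after redirects and see whether it points at Instagram's login page. The sketch below assumes the final URL is available as a string (for example response.url from requests or driver.current_url from Selenium); the helper name looks_like_login_redirect is hypothetical, and the check only inspects the URL path, so it cannot detect other kinds of incomplete responses.

```python
from urllib.parse import urlparse


def looks_like_login_redirect(final_url: str) -> bool:
    """Return True if the URL path points at Instagram's login page."""
    path = urlparse(final_url).path
    return path.startswith("/accounts/login")


# Example URLs for illustration.
print(looks_like_login_redirect(
    "https://www.instagram.com/accounts/login/?next=/explore/tags/python/"
))
print(looks_like_login_redirect("https://www.instagram.com/explore/tags/python/"))
```

If the check returns False and the KeyError still occurs, the request was probably not redirected, which would point to a changed JSON structure instead of a rejected sessionid.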