chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
https://chris-greening.github.io/instascrape/
MIT License
625 stars 109 forks

Profile block and list index out of range #136

Open waqarmuhammad1 opened 3 years ago

waqarmuhammad1 commented 3 years ago

I have a question. I have been trying to get posts related to a hashtag (I collected them with Selenium, so I have links to all the posts for a hashtag, for example #backyardideas). I want to keep only posts from the US, and I succeeded in filtering them with the following code:

    from fake_useragent import UserAgent
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    chrome_options = Options()
    ua = UserAgent()
    userAgent = ua.random  # random user-agent string for this session
    chrome_options.add_extension('IRM-Chrome.crx')
    chrome_options.add_argument(f'user-agent={userAgent}')
    chrome_options.add_argument("--window-size=1920,1080")
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--no-sandbox")
    # `options=` replaces the deprecated `chrome_options=` keyword
    driver = webdriver.Chrome(options=chrome_options)

    # Hide the navigator.webdriver flag on every new document
    driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
        "source":
            "const newProto = navigator.__proto__;"
            "delete newProto.webdriver;"
            "navigator.__proto__ = newProto;"
    })
    driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
        "source": """
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            })
        """
    })

    import json
    import time

    from instascrape import Post
    from tqdm import tqdm

    # ses_id, tags (hashtag name -> list of post links) and us_profiles are defined elsewhere
    headers = {
        "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36 Edg/87.0.664.57",
        "cookie": "sessionid={0};".format(ses_id)
    }

    for tag_name in tags:
        post_links = tags[tag_name]
        for posts in tqdm(post_links):
            try:
                post = Post(posts)
                post.scrape(headers=headers, webdriver=driver)
                time.sleep(10)  # wait 10 seconds between posts
                if 'address_json' in post.flat_json_dict:
                    address = json.loads(post.flat_json_dict['address_json'])
                    cc = address['country_code']
                    # keep only posts whose address is in the US
                    if 'us' == str(cc).lower():
                        us_profiles.append((post.username, address))
            except Exception as e:
                print(e)
                continue
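The country filter in the loop above can be isolated into a small helper that tolerates missing or malformed values. This is only a sketch: `extract_country_code` is a hypothetical name, not part of instascrape, and it assumes `address_json` is a JSON string containing a `country_code` key, as the loop above expects.

```python
import json
import math


def extract_country_code(flat_json_dict):
    """Return the lower-cased country code, or None if it cannot be read.

    Hypothetical helper; assumes 'address_json' holds a JSON object string.
    """
    raw = flat_json_dict.get("address_json")
    # Missing key, None, or a float NaN placeholder: nothing to parse.
    if raw is None or (isinstance(raw, float) and math.isnan(raw)):
        return None
    try:
        address = json.loads(raw)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON.
        return None
    if not isinstance(address, dict):
        return None
    code = address.get("country_code")
    return str(code).lower() if code else None
```

Inside the loop, the check would then become `if extract_country_code(post.flat_json_dict) == 'us':`, so a post without a usable address is skipped instead of raising.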

The problem is that, although I have a 10-second delay between requests, my Instagram account keeps getting blocked and asks for manual verification. Any idea how I could avoid this? The second problem is that it keeps throwing these errors:

  1. list index out of range
  2. Invalid value NaN (not a number)

Thanks, Waqar
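Because the loop only prints the exception message, errors like the two listed above do not show which post or which line raised them. A minimal sketch of one way to capture that, using only the standard library; `run_with_diagnostics` and `handler` are hypothetical names introduced for illustration:

```python
import traceback


def run_with_diagnostics(items, handler):
    """Call handler(item) for each item and collect (item, traceback text) for failures."""
    failures = []
    for item in items:
        try:
            handler(item)
        except Exception:
            # format_exc() includes the exception type and the line that raised it
            failures.append((item, traceback.format_exc()))
    return failures
```

Here `items` would be the list of post links and `handler` a function that scrapes a single post; printing each entry of the returned list shows the offending link next to the full traceback.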