aahouzi / Instagram-Scraper-2021

Scrape Instagram content and stories, using a new technique based on the har file (No Token + No public API).
MIT License
111 stars, 12 forks

[ERROR]: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[2]/div/div/div/div[2]/button[1]"} (Session info: headless chrome=89.0.4389.90) #2

Closed miketyj closed 3 years ago

miketyj commented 3 years ago

Hi! Thanks for sharing.

Ran into an issue while attempting to run your script.

Here are the logs:

c:\Users\XZZZ\Desktop\Instagram-Scraper-2021-master\scraper>python -u "c:\Users\XZZZ\Desktop\Instagram-Scraper-2021-master\scraper\insta_feed_scraper.py"

[INFO]: Please enter the username or hashtag you want to scrap from: potus

[WDM] - ====== WebDriver manager ======

[WDM] - Current google-chrome version is 89.0.4389

[WDM] - Get LATEST driver version for 89.0.4389

[WDM] - Driver [C:\Users\XZZZ.wdm\drivers\chromedriver\win32\89.0.4389.23\chromedriver.exe] found in cache

DevTools listening on ws://127.0.0.1:57304/devtools/browser/139286fa-8132-4a0e-886a-f895defdddb1

[INFO]: Getting access to the user or hashtag website ..

[ERROR]: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[2]/div/div/div/div[2]/button[1]"} (Session info: headless chrome=89.0.4389.90)

ghost commented 3 years ago

Hi! Thank you for pointing out this new issue. You're right: just a few days before you opened the issue, the code was working perfectly. However, Instagram is always trying to block scrapers. To explain the error, here's what happened:

  1. I noticed that some Instagram pages require you to click a particular button before you can start scrolling down, and the potus page is one of them. I fixed this with a try/except clause.

  2. The second problem is that when you try to retrieve information from the GraphQL responses, Instagram blocks you by redirecting to a login page. I now catch this exception and log in with a random account so that scraping can continue.

  3. Finally, to make sure we scrape all the content from the Instagram page, I modified the scroll-down function to increase the sleep time by one second on each iteration.
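For readers who want a rough picture of these three fixes, here is a minimal sketch in Python. It is not the code from the repository: the helper names `click_if_present`, `redirected_to_login`, and `scroll_to_bottom` are hypothetical, the XPath is simply the one quoted in the error message, and the `/accounts/login` URL marker is an assumption about where Instagram redirects. The only Selenium calls used are `find_element`, `execute_script`, and `current_url`.

```python
import time

try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:
    # Fallback so the sketch can be exercised without Selenium installed.
    class NoSuchElementException(Exception):
        pass

# XPath copied from the error message in this issue; it may not match
# the current page layout.
BUTTON_XPATH = "/html/body/div[2]/div/div/div/div[2]/button[1]"


def click_if_present(driver, xpath=BUTTON_XPATH):
    """Fix 1 (sketch): click the button if it exists, otherwise carry on."""
    try:
        driver.find_element("xpath", xpath).click()
        return True
    except NoSuchElementException:
        return False


def redirected_to_login(driver, marker="/accounts/login"):
    """Fix 2 (sketch): guess whether we were redirected to a login page.

    The marker string is an assumption, not taken from the repository.
    """
    return marker in driver.current_url


def scroll_to_bottom(driver, base_sleep=1.0, max_rounds=50, sleep=time.sleep):
    """Fix 3 (sketch): scroll until the page stops growing.

    The wait grows by one second per round. Returns the number of rounds run.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_index in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(base_sleep + round_index)  # 1s, 2s, 3s, ...
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_index + 1
        last_height = new_height
    return max_rounds
```

With a real driver, the flow would be roughly: call `click_if_present(driver)` after loading the page, then `scroll_to_bottom(driver)`, and check `redirected_to_login(driver)` whenever a GraphQL response cannot be extracted. The growing wait gives slower pages more time to load new posts, at the cost of a longer total run.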

Here, for example, are the logs from scraping the potus page:

/Users/anasahouzi/PycharmProjects/IA/venv/bin/python /Users/anasahouzi/PycharmProjects/IA/InstaScraper/scraper/insta_feed_scraper.py

[INFO]: Please enter the username or hashtag you want to scrap from: potus

[WDM] - Current google-chrome version is 89.0.4389

[WDM] - Get LATEST driver version for 89.0.4389

[WDM] - Driver [/Users/anasahouzi/.wdm/drivers/chromedriver/mac64/89.0.4389.23/chromedriver] found in cache

[INFO]: Getting access to the user or hashtag website ..

[SUCCESS]: Got into the user or hashtag page.

[INFO]: Start scrolling to the bottom of the page to get all the content.

[SUCCESS]: Finished scrolling, it took 225.42s.

[INFO]: 17 graphql responses were extracted.

[INFO]: Number of Instagram posts: 206.

[SUCCESS]: Scrapped 12 first posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[INFO]: Failed extracting a graphQl response, now trying to access from the login page to which we were redirected.

[INFO]: Logged into the website.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Finished scrapping 216 posts, it took 71.09s.

Process finished with exit code 0

ghost commented 3 years ago

Since I didn't receive any feedback from you, I'm closing this issue.