aahouzi / Instagram-Scraper-2021

Scrape Instagram content and stories, using a new technique based on the har file (No Token + No public API).
MIT License
111 stars, 12 forks

no such element error #3

Closed: yunhwankim2 closed this issue 3 years ago

yunhwankim2 commented 3 years ago

Thank you for the project. I tried to run it, but I got the following error.

[INFO]: Please enter the username or hashtag you want to scrap from: *****

====== WebDriver manager ======

Current google-chrome version is 90.0.4430

Get LATEST driver version for 90.0.4430

There is no [mac64] chromedriver for browser 90.0.4430 in cache

Get LATEST driver version for 90.0.4430

Trying to download new driver from https://chromedriver.storage.googleapis.com/90.0.4430.24/chromedriver_mac64.zip

Driver has been saved in cache [/Users/**/.wdm/drivers/chromedriver/mac64/90.0.4430.24]

[INFO]: Getting access to the user or hashtag website ..

[ERROR]: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[2]/div/div/button[1]"} (Session info: headless chrome=90.0.4430.212)

Do you have any idea to fix this error? Thanks in advance.

ghost commented 3 years ago

Hi! I just ran my code and it works fine on my side. Are you trying to scrape a hashtag?

yunhwankim2 commented 3 years ago

@aahouzi No. I tried to scrape my own account for testing. It returns the same error today.

yunhwankim2 commented 3 years ago

I tried to find out why the error occurs. The script successfully gets the main URL (line 261). I inspected the page source with driver.page_source, and it looks okay. Running driver.find_element_by_xpath("/html/body/div[2]/div") raises no error (line 265), but running driver.find_element_by_xpath("/html/body/div[2]/div/div") raises the no such element error.
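The manual check described above (testing shorter and shorter XPaths until one matches) can be automated. The following is only a minimal sketch, not code from the project: `deepest_existing_prefix` is a hypothetical helper name, it assumes a Selenium 4 style driver whose `find_elements("xpath", ...)` returns an empty list when nothing matches, and it splits the path naively on `/`, so it only suits simple absolute XPaths like the one in the error message.

```python
def deepest_existing_prefix(driver, xpath):
    """Return the longest prefix of an absolute XPath that still matches an element.

    Hypothetical debugging helper (not part of the project). Returns None when
    even the first step does not match.
    """
    # Naive split: fine for paths like /html/body/div[2]/div, but it would break
    # on predicates that themselves contain a slash.
    steps = [step for step in xpath.split("/") if step]
    found = None
    for i in range(1, len(steps) + 1):
        prefix = "/" + "/".join(steps[:i])
        # find_elements returns [] instead of raising NoSuchElementException.
        if not driver.find_elements("xpath", prefix):
            break
        found = prefix
    return found
```

Calling `deepest_existing_prefix(driver, "/html/body/div[2]/div/div/button[1]")` right after the page loads would show where the expected structure and the actual page diverge, which is often a sign that the site served a different layout (for example a login page or a changed dialog).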

ghost commented 3 years ago

Sorry for the late reply, I only just found time to fix the bugs you mentioned. As an example, I tried to scrape your Instagram page, and here are the logs:

[INFO]: Please enter the username or hashtag you want to scrap from: yunhwankim2

[WDM] - Current google-chrome version is 92.0.4515

[WDM] - Get LATEST driver version for 92.0.4515

[WDM] - Driver [/Users/anasahouzi/.wdm/drivers/chromedriver/mac64/92.0.4515.107/chromedriver] found in cache

[INFO]: Getting access to the user or hashtag website ..

[ERROR]: Instagram redirected us to a login page

[INFO]: Failed once, now trying to access from the login page to which we were redirected.

[INFO]: Please type a username and its password seperated by one space for login: testtest7530 testtest753

[SUCCESS]: Logged into the website.

[SUCCESS]: Got into the user or hashtag page.

[INFO]: Start scrolling to the bottom of the page to get all the content.

[SUCCESS]: Finished scrolling, it took 1354.19s.

[INFO]: 41 graphql responses were extracted.

[INFO]: Number of Instagram posts: 514.

[SUCCESS]: Scrapped 12 first posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 10 posts.

[SUCCESS]: Finished scrapping 502 posts, it took 143.99s.

You can also check the file feed.pkl, which contains the data I scraped from your account. You can also deactivate the headless mode option to see where an error may occur ;)

yunhwankim2 commented 3 years ago

Thank you for your response. I tried to do the same work as you did above, but the result is weird to me.

On the first try, it seemed to work fine at first, but it kept asking for a username and password. I entered them several times without success. Below is the log of the first try.


[INFO]: Please enter the username or hashtag you want to scrap from: yunhwankim2

====== WebDriver manager ======

Current google-chrome version is 92.0.4515

Get LATEST driver version for 92.0.4515

There is no [mac64] chromedriver for browser 92.0.4515 in cache

Get LATEST driver version for 92.0.4515

Trying to download new driver from https://chromedriver.storage.googleapis.com/92.0.4515.107/chromedriver_mac64.zip

Driver has been saved in cache [/Users/yunhwankim/.wdm/drivers/chromedriver/mac64/92.0.4515.107]

[INFO]: Getting access to the user or hashtag website ..

[SUCCESS]: Got into the user or hashtag page.

[INFO]: Start scrolling to the bottom of the page to get all the content.

[SUCCESS]: Finished scrolling, it took 486.33s.

[INFO]: 25 graphql responses were extracted.

[INFO]: Number of Instagram posts: 514.

[SUCCESS]: Scrapped 12 first posts.

[SUCCESS]: Scrapped 12 posts.

[INFO]: Failed extracting a graphQl response, now trying to access from the login page to which we were redirected.

[INFO]: In order to carry on scraping, type a username and its password seperated by one space: ** **

[INFO]: Logged into the website.

[INFO]: Failed extracting a graphQl response, now trying to access from the login page to which we were redirected.

[INFO]: In order to carry on scraping, type a username and its password seperated by one space: ** **

[INFO]: Logged into the website.

[INFO]: Failed extracting a graphQl response, now trying to access from the login page to which we were redirected.

[INFO]: In order to carry on scraping, type a username and its password seperated by one space: ** **

[INFO]: Logged into the website.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[SUCCESS]: Scrapped 12 posts.

[INFO]: Failed extracting a graphQl response, now trying to access from the login page to which we were redirected.

[INFO]: In order to carry on scraping, type a username and its password seperated by one space: ** **

......... (It kept asking for the username and password)

On my second try, it scraped only the first 12 posts and then ended. Below is the log.


[INFO]: Please enter the username or hashtag you want to scrap from: yunhwankim2

====== WebDriver manager ======

Current google-chrome version is 92.0.4515

Get LATEST driver version for 92.0.4515

Driver [/Users/yunhwankim/.wdm/drivers/chromedriver/mac64/92.0.4515.107/chromedriver] found in cache

[INFO]: Getting access to the user or hashtag website ..

[SUCCESS]: Got into the user or hashtag page.

[INFO]: Start scrolling to the bottom of the page to get all the content.

[SUCCESS]: Finished scrolling, it took 3.03s.

[INFO]: 0 graphql responses were extracted.

[INFO]: Number of Instagram posts: 514.

[SUCCESS]: Scrapped 12 first posts.

[SUCCESS]: Finished scrapping 12 posts, it took 0.0s.


I'll keep trying. Thank you for your response.

ghost commented 3 years ago

Yes, it sometimes collects only a small number of graphql responses. That happens when the internet is slow: while scrolling down, new content takes so long to load that the stopping condition of my while loop is met before the page has actually reached the bottom. I made a small fix, so please give it a try and show me your logs. For me it works just as before.
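To illustrate the kind of stopping condition described above, here is a rough sketch of a scroll loop that tolerates slow loads. It is not the fix that was pushed to the repository. The function name and the `pause`, `patience`, and `sleep` parameters are assumptions made up for this example, and it only relies on a driver object exposing `execute_script`, as Selenium drivers do.

```python
import time


def scroll_to_bottom(driver, pause=2.0, patience=3, sleep=time.sleep):
    """Scroll until the page height stays unchanged for `patience` checks in a row.

    Illustrative sketch only. With patience=1 a single slow load is enough to end
    the loop early; a larger value gives slow connections more time.
    Returns the number of scroll commands issued.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    unchanged = 0
    scrolls = 0
    while unchanged < patience:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        sleep(pause)  # give the page time to load more content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            unchanged += 1  # nothing new loaded during this pause
        else:
            unchanged = 0  # progress was made, reset the counter
            last_height = new_height
    return scrolls
```

The design choice here is to require several consecutive "no growth" checks rather than one, which trades a few extra seconds at the real bottom of the page for fewer premature stops like the 3.03s run in the second log.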

yunhwankim2 commented 3 years ago

Thank you, it works now. Could you please recommend some references (books) that were helpful to you in developing this project? I want to learn how to work with HAR files. Also, please delete the pkl files in the collected_data folder. They may have been generated during your previous test. Thank you again.

ghost commented 3 years ago

I saw a YouTube video in which someone explained how HAR files can be used to get information about GET requests, and that gave me the idea to turn this into code for scraping Instagram. I deleted your feed.pkl. By the way, I pushed another version that reduces scrolling time :)
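For anyone who wants to experiment with the idea, a HAR file is plain JSON, so the GET responses can be filtered with the standard library alone. The sketch below is not the project's code. It assumes the HAR 1.2 layout (`log.entries[].request` and `log.entries[].response.content`), and the `graphql_payloads` name and the `"graphql"` URL marker are guesses based on the log messages in this thread.

```python
import base64
import json


def graphql_payloads(har, url_marker="graphql"):
    """Return the decoded JSON bodies of GET responses whose URL contains url_marker.

    `har` is a HAR document already loaded into a dict, e.g. with json.load().
    """
    payloads = []
    for entry in har["log"]["entries"]:
        request = entry["request"]
        if request["method"] != "GET" or url_marker not in request["url"]:
            continue
        content = entry["response"].get("content", {})
        text = content.get("text")
        if not text:
            continue  # body was not recorded for this entry
        if content.get("encoding") == "base64":
            text = base64.b64decode(text).decode("utf-8")
        try:
            payloads.append(json.loads(text))
        except json.JSONDecodeError:
            continue  # skip bodies that are not JSON
    return payloads
```

A HAR exported from the browser's network tab can be loaded with `json.load(open("session.har"))` (the file name is just an example) and passed to this function to inspect what each response contains.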

yunhwankim2 commented 3 years ago

Okay. Thank you so much.