chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
https://chris-greening.github.io/instascrape/
MIT License
630 stars, 107 forks

Instagram Web-Scraping Bugs #86

Closed MattPChoy closed 3 years ago

MattPChoy commented 3 years ago

Describe the bug

When using the Selenium webdriver, I get an error saying that Profile.url isn't a string (it's a NoneType object). When I force it to be set with Profile.url = url_string, I get a bs4 error instead:

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

To Reproduce

Steps to reproduce the behavior:

First error: header file not working. Reproduce by running the following script:

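import time  # needed for time.sleep below

from instascrape import *
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Raw string so the backslashes in the Windows path are not read as escapes
service = Service(r'C:\webdriver\chromedriver.exe')
service.start()

driver = webdriver.Remote(service.service_url)

url = "some_url"
user = Profile(url)

# Scrape the profile through the Selenium-driven browser
user.scrape(webdriver=driver)
time.sleep(60)
driver.quit()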

Expected behavior

Calling scrape with the syntax described in the documentation should not throw a bs4 error.

tarob0ba commented 3 years ago

Try the following to install the missing external dependency:

pip3 install lxml

Explanation: That error is caused by a missing package (the lxml parser), which instascrape calls when scraping a profile and extracting posts. (I think so; please confirm with @chris-greening.) https://github.com/chris-greening/instascrape/blob/354becdf123592fa01933816cb75954e870fa988/instascrape/scrapers/profile.py#L153
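
To confirm that the parser is importable from the same Python environment that runs the script, a quick check along these lines may help. This is only a minimal sketch using the standard library; the helper names has_module and first_available are hypothetical and are not part of instascrape or bs4.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be imported in the current environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names whose parent package is missing
        return False


def first_available(candidates):
    """Return the first importable module name from `candidates`, or None."""
    for name in candidates:
        if has_module(name):
            return name
    return None


if __name__ == "__main__":
    # "lxml" is the parser named in the bs4 error; "html.parser" is the
    # standard-library module, listed here only as a point of comparison.
    print("lxml importable:", has_module("lxml"))
    print("first available:", first_available(["lxml", "html.parser"]))
```

If the first line reports False after running pip3 install lxml, pip3 and the interpreter running the script are probably pointing at different Python installations.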