chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
https://chris-greening.github.io/instascrape/
MIT License
630 stars, 107 forks

ADD lxml required by scraper #43

Closed: olidroide closed this issue 3 years ago

olidroide commented 3 years ago

The lxml library is required by the BeautifulSoup parser initialization in https://github.com/chris-greening/instascrape/blob/d1700be941355096dc59376c65769789fcb5fe5e/instascrape/core/_static_scraper.py#L240
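One quick way to check, before BeautifulSoup tries to use it, whether lxml is installed in the current environment is to ask Python's import system for the module. This is a minimal sketch; `parser_available` is a hypothetical helper and not part of instascrape.

```python
import importlib.util

def parser_available(module_name: str) -> bool:
    """Return True if the given module can be imported in this environment."""
    # find_spec returns None (rather than raising) when a top-level module is not installed
    return importlib.util.find_spec(module_name) is not None

# html.parser ships with the standard library; lxml has to be installed separately
print("html.parser available:", parser_available("html.parser"))
print("lxml available:", parser_available("lxml"))
```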

Steps to reproduce

  1. Clone the repository
  2. Copy the instascrape/instascrape folder into the root of another project to use it
  3. Install the dependencies with pip3 install -r requirements.txt
  4. Run a simple profile scrape:
    from instascrape import Profile
    olidroide_profile = Profile('olidroide')
    olidroide_profile.scrape()
  5. The following error appears:
    Traceback (most recent call last):
      File "/usr/share/pycharm/plugins/python-ce/helpers/pydev/pydevd.py", line 1448, in _exec
        pydev_imports.execfile(file, globals, locals)  # execute the script
      File "/usr/share/pycharm/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "/home/olidroide/python/tests/main.py", line 13, in <module>
        print_hi()
      File "/home/olidroide/python/tests/main.py", line 6, in print_hi
        olidroide_profile.scrape()
      File "/home/olidroide/python/tests/instascrape/core/_static_scraper.py", line 112, in scrape
        self.json_dict = self._get_json_from_source(self.source)
      File "/home/olidroide/python/tests/instascrape/core/_static_scraper.py", line 205, in _get_json_from_source
        self.soup = self._soup_from_html(self.html)
      File "/home/olidroide/python/tests/instascrape/core/_static_scraper.py", line 238, in _soup_from_html
        return BeautifulSoup(html, features="lxml")
      File "/home/olidroide/python/tests/venv/lib/python3.8/site-packages/bs4/__init__.py", line 243, in __init__
        raise FeatureNotFound(
    bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
    python-BaseException
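bs4 raises `FeatureNotFound` when the requested tree builder is not installed, as in the traceback above. One way a caller could guard against this is to try lxml first and fall back to Python's built-in parser. This is a sketch only; `make_soup` is a hypothetical helper and not how instascrape itself resolved the issue.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(html: str) -> BeautifulSoup:
    """Parse html with lxml if it is installed, otherwise with the stdlib html.parser."""
    try:
        return BeautifulSoup(html, features="lxml")
    except FeatureNotFound:
        # lxml is missing; html.parser ships with Python, so no extra install is needed
        return BeautifulSoup(html, features="html.parser")

soup = make_soup("<html><head><title>olidroide</title></head></html>")
print(soup.title.string)
```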

Issue #44

chris-greening commented 3 years ago

Hey @olidroide, thanks so much for bringing this to my attention 😄! lxml is actually already listed in requirements.txt; the problem was that it wasn't included in setup.py as a dependency, so installing from PyPI didn't pull it in. To avoid an extra external dependency, I switched features='lxml' to features='html.parser' in the BeautifulSoup instantiations, so we can take more of a batteries-included approach.
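The change described above amounts to passing the name of Python's built-in parser to BeautifulSoup. A minimal sketch of what that looks like; `soup_from_html` is a hypothetical stand-in for instascrape's `_soup_from_html`, whose full body is not shown in this thread.

```python
from bs4 import BeautifulSoup

def soup_from_html(html: str) -> BeautifulSoup:
    """Parse HTML with Python's built-in html.parser, so lxml is not required."""
    # Previously features="lxml", which raises bs4.FeatureNotFound when lxml is missing
    return BeautifulSoup(html, features="html.parser")

soup = soup_from_html("<html><body><h1>olidroide</h1></body></html>")
print(soup.h1.get_text())
```

The trade-off is that html.parser is typically slower than lxml, and the two parsers can build slightly different trees from malformed HTML. The other option, declaring lxml under install_requires in setup.py, would have kept lxml at the cost of an extra dependency.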

Thanks again for bringing this to my attention! 👍

olidroide commented 3 years ago

Oops @chris-greening, sorry for adding the requirement again ;) Great update to fix this issue :+1: Thanks for everything!