JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

Issue with telegram - Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library? #185

Closed: uzaysan closed this issue 3 years ago

uzaysan commented 3 years ago

I'm trying to scrape Telegram posts in a channel.

Here is the command I use:

snscrape --jsonl --progress --max-results 50 telegram-channel "https://t.me/channelName" > t.json

But this gives an error. How can I solve this?

Here is the error log:

2021-01-18 17:54:49.215  CRITICAL  snscrape._cli  Dumped stack and locals to /tmp/snscrape_locals_x8yzajth
Traceback (most recent call last):
  File "/home/ali/.local/bin/snscrape", line 8, in <module>
    sys.exit(main())
  File "/home/ali/.local/lib/python3.8/site-packages/snscrape/_cli.py", line 270, in main
    for i, item in enumerate(scraper.get_items(), start = 1):
  File "/home/ali/.local/lib/python3.8/site-packages/snscrape/modules/telegram.py", line 128, in get_items
    r, soup = self._initial_page()
  File "/home/ali/.local/lib/python3.8/site-packages/snscrape/modules/telegram.py", line 75, in _initial_page
    self._initialPage, self._initialPageSoup = r, bs4.BeautifulSoup(r.text, 'lxml')
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 162, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "/home/ali/.local/bin/snscrape", line 8, in <module>
    sys.exit(main())
  File "/home/ali/.local/lib/python3.8/site-packages/snscrape/_cli.py", line 270, in main
    for i, item in enumerate(scraper.get_items(), start = 1):
  File "/home/ali/.local/lib/python3.8/site-packages/snscrape/modules/telegram.py", line 128, in get_items
    r, soup = self._initial_page()
  File "/home/ali/.local/lib/python3.8/site-packages/snscrape/modules/telegram.py", line 75, in _initial_page
    self._initialPage, self._initialPageSoup = r, bs4.BeautifulSoup(r.text, 'lxml')
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 162, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

JustAnotherArchivist commented 3 years ago

I have no idea what the apt_pkg error is; that sounds like an issue with your Python installation (e.g. this). The bs4 error indicates that you don't have lxml installed, which snscrape explicitly lists as a dependency. If it worked in the past, my guess is that you recently upgraded Python. Run another pip install for snscrape, and it should reinstall the dependencies as needed.
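
For anyone hitting the same message, a quick way to confirm the diagnosis before reinstalling is to ask the interpreter whether it can see the two packages named in the traceback. The following is only a minimal sketch using the standard library; the helper name `missing_modules` is made up for illustration and is not part of snscrape or bs4.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find.

    Hypothetical helper for illustration; not part of snscrape or bs4.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # 'bs4' and 'lxml' are the two packages that appear in the traceback.
    missing = missing_modules(["bs4", "lxml"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("bs4 and lxml are both importable")
```

Run it with the same Python that runs snscrape (3.8 in the traceback above), since a package installed for a different interpreter will not be found. If lxml is reported as not importable, reinstalling snscrape with pip as described above should pull it back in.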