RecipeMD / recipemd-extract

command line utility to scrape recipe websites

Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library? #17

Open mem89de opened 1 year ago

mem89de commented 1 year ago

Hi, I'm trying to get recipemd-extract running, but it fails with the following error:

$ recipemd-extract https://www.chefkoch.de/rezepte/2625281412358611/Wurst-Pasta.html
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Marc\AppData\Local\Programs\Python\Python311\Scripts\recipemd-extract.exe\__main__.py", line 7, in <module>
  File "C:\Users\Marc\AppData\Local\Programs\Python\Python311\Lib\site-packages\recipemd_extract\main.py", line 57, in main
    recipe=extract(url,args.debug)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Marc\AppData\Local\Programs\Python\Python311\Lib\site-packages\recipemd_extract\main.py", line 21, in extract
    soup = BeautifulSoup(page.text, "html5lib")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Marc\AppData\Local\Programs\Python\Python311\Lib\site-packages\bs4\__init__.py", line 193, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?

Do you have any suggestions? I use Python on Windows 11.

$ python --version
Python 3.11.4

$ pip show recipemd-extract html5lib recipe-scrapers recipemd requests scrape-schema-recipe
Name: recipemd-extract
Version: 1.1.1
Summary: Extracts recipes from websites and saves them in the RecipeMD format
Home-page:
Author: AberDerBart
Author-email: nonatz@web.de
License:
Location: C:\Users\Marc\AppData\Local\Programs\Python\Python311\Lib\site-packages
Requires: beautifulsoup4, html5lib, recipe-scrapers, recipemd, requests, scrape-schema-recipe
Required-by:
---
Name: html5lib
Version: 1.0.1
Summary: HTML parser based on the WHATWG HTML specification
Home-page: https://github.com/html5lib/html5lib-python
Author: James Graham
Author-email: james@hoppipolla.co.uk
License: MIT License
Location: C:\Users\Marc\AppData\Local\Programs\Python\Python311\Lib\site-packages
Requires: six, webencodings
Required-by: mf2py, pyRdfa3, recipemd-extract
---
Name: recipe-scrapers
Version: 5.3.0
Summary: Python package, scraping recipes from all over the internet
Home-page: https://github.com/hhursev/recipe-scrapers/
Author: Hristo Harsev
Author-email: r+pypi@hharsev.com
License:
Location: C:\Users\Marc\AppData\Local\Programs\Python\Python311\Lib\site-packages
Requires: beautifulsoup4, requests
Required-by: recipemd-extract
---
Name: recipemd
Version: 4.0.8
Summary: Markdown recipe manager, reference implementation of RecipeMD
Home-page: https://recipemd.org
Author: Tilman Stehr
Author-email: tilman@tilman.ninja
License: UNKNOWN
Location: C:\Users\Marc\AppData\Local\Programs\Python\Python311\Lib\site-packages
Requires: argcomplete, commonmark, dataclasses-json, pyparsing, yarl
Required-by: recipemd-extract
---
Name: requests
Version: 2.22.0
Summary: Python HTTP for Humans.
Home-page: http://python-requests.org
Author: Kenneth Reitz
Author-email: me@kennethreitz.org
License: Apache 2.0
Location: C:\Users\Marc\AppData\Local\Programs\Python\Python311\Lib\site-packages
Requires: certifi, chardet, idna, urllib3
Required-by: mf2py, recipe-scrapers, recipemd-extract, scrape-schema-recipe
---
Name: scrape-schema-recipe
Version: 0.0.4
Summary: Extracts cooking recipe from HTML structured data in the https://schema.org/Recipe format.
Home-page: https://github.com/micahcochran/scrape-schema-recipe
Author: Micah Cochran
Author-email:
License: Apache-2
Location: C:\Users\Marc\AppData\Local\Programs\Python\Python311\Lib\site-packages
Requires: extruct, isodate, requests, setuptools, validators
Required-by: recipemd-extract

AberDerBart commented 21 hours ago

Sorry for the late response; I neglected the project for quite a while. We have now moved the project to the RecipeMD organization, and I also merged a PR that updates the dependencies. Could you check whether this fixes your issue?
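
If the error persists after updating, it may help to check whether html5lib can actually be imported by the interpreter that runs recipemd-extract. The pip output above lists html5lib 1.0.1 as installed, so one possibility (an assumption, not confirmed in this thread) is that the package is present but raises an ImportError on this Python version, which BeautifulSoup then reports as a missing tree builder. A minimal diagnostic sketch using only the standard library; the helper name `check_parser` is made up for illustration:

```python
import importlib
import sys


def check_parser(module_name="html5lib"):
    """Try to import a parser module and return (ok, detail).

    check_parser is a hypothetical helper for this thread, not part of
    recipemd-extract or bs4.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        # Report the real failure instead of bs4's generic FeatureNotFound.
        return False, f"{type(exc).__name__}: {exc}"
    # Not every module exposes __version__, so fall back to a placeholder.
    return True, getattr(module, "__version__", "unknown version")


if __name__ == "__main__":
    # Shows which Python installation is being checked.
    print("interpreter:", sys.executable)
    ok, detail = check_parser()
    print("html5lib importable:", ok, "-", detail)
```

If this prints `False` together with an import error, the message should point at the underlying cause, and upgrading html5lib along with recipemd-extract would be the first thing to try. If it prints `True`, the problem is more likely elsewhere, for example a second Python installation being picked up.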