coleifer / micawber

a small library for extracting rich content from urls
http://micawber.readthedocs.org/
MIT License

micawber.exceptions.ProviderNotFoundException #110

Closed ponponon closed 10 months ago

ponponon commented 10 months ago
import micawber
from loguru import logger

# load up rules for some default providers, such as youtube and flickr
providers = micawber.bootstrap_basic()

logger.debug(providers.request('https://juejin.cn/post/7243725952788807717'))
Traceback (most recent call last):
  File "/Users/ponponon/Desktop/code/me/ideaboom/005.py", line 7, in <module>
    logger.debug(providers.request('https://juejin.cn/post/7243725952788807717'))
  File "/Users/ponponon/.local/share/virtualenvs/ideaboom-B0dr_aXc/lib/python3.10/site-packages/micawber/providers.py", line 111, in inner
    return fn(self, url, **params)
  File "/Users/ponponon/.local/share/virtualenvs/ideaboom-B0dr_aXc/lib/python3.10/site-packages/micawber/providers.py", line 166, in request
    raise ProviderNotFoundException('Provider not found for "%s"' % url)
micawber.exceptions.ProviderNotFoundException: Provider not found for "https://juejin.cn/post/7243725952788807717"
$ python --version
Python 3.10.10

macOS

$ pip list
Package                Version
---------------------- -----------
aiosqlite              0.19.0
alembic                1.10.3
aliyun-python-sdk-core 2.14.0
aliyun-python-sdk-kms  2.16.2
amqp                   5.1.1
annotated-types        0.6.0
anyio                  3.6.2
appnope                0.1.3
apprise                1.3.0
art                    5.9
asgi-lifespan          2.1.0
astroid                3.0.1
asttokens              2.2.1
asyncpg                0.27.0
attrs                  22.2.0
autopep8               2.0.4
backcall               0.2.0
beautifulsoup4         4.9.3
cachetools             5.2.0
certifi                2022.12.7
cffi                   1.15.1
charset-normalizer     2.1.1
cli-helpers            2.3.0
click                  8.1.3
cloudpickle            2.2.1
colorama               0.4.6
configobj              5.0.6
contourpy              1.1.1
coolname               2.2.0
crcmod                 1.7
croniter               1.3.14
cryptography           36.0.2
cssselect              1.2.0
cycler                 0.12.1
dateparser             1.1.8
decorator              5.1.1
defusedxml             0.7.1
dill                   0.3.6
dnspython              2.2.1
docker                 6.0.1
etcd3                  0.12.0
eventlet               0.33.2
exceptiongroup         1.1.1
executing              1.2.0
fastapi                0.95.0
feedfinder2            0.0.4
feedparser             6.0.10
filelock               3.13.1
fonttools              4.43.1
fsspec                 2023.4.0
future                 0.18.2
Glances                3.3.0.4
google-auth            2.15.0
goose3                 3.1.17
greenlet               2.0.1
griffe                 0.27.1
grpcio                 1.57.0
h11                    0.14.0
h2                     4.1.0
happybase              1.2.0
hpack                  4.0.0
html5lib               1.0b10
httpcore               0.17.0
httpx                  0.24.0
hyperframe             6.0.1
idna                   3.4
importlib-resources    5.10.1
iniconfig              2.0.0
ipython                8.17.2
isort                  5.10.1
jedi                   0.18.2
jieba                  0.42.1
jieba3k                0.35.1
Jinja2                 3.1.2
jmespath               0.10.0
joblib                 1.3.2
jsonpatch              1.32
jsonpointer            2.3
jsonschema             4.17.3
kaleido                0.2.1
kiwisolver             1.4.5
kombu                  5.2.4
kubernetes             25.3.0
langdetect             1.0.9
lassie                 0.11.11
lazy-object-proxy      1.8.0
loguru                 0.7.2
lxml                   4.9.3
Mako                   1.2.4
Markdown               3.4.3
markdown-it-py         2.1.0
MarkupSafe             2.1.1
matplotlib             3.8.0
matplotlib-inline      0.1.6
mccabe                 0.7.0
mdurl                  0.1.2
micawber               0.5.5
mock                   4.0.3
mycli                  1.27.0
nameko                 2.14.1
netifaces              0.11.0
newspaper3k            0.2.8
nltk                   3.8.1
numpy                  1.24.1
oauthlib               3.2.2
opencv-python          4.7.0.72
orjson                 3.8.10
oss2                   2.18.3
outcome                1.2.0
packaging              22.0
pandas                 2.1.1
parso                  0.8.3
path                   16.6.0
path.py                12.5.0
pathspec               0.11.1
peewee                 3.16.2
pendulum               2.1.2
pexpect                4.8.0
pickleshare            0.7.5
pika                   1.3.1
Pillow                 9.4.0
pip                    23.2.1
platformdirs           2.6.0
plotly                 5.18.0
pluggy                 1.0.0
ply                    3.11
prefect                2.10.4
prettytable            3.9.0
prompt-toolkit         3.0.36
protobuf               3.20.0
psutil                 5.9.6
ptyprocess             0.7.0
pure-eval              0.2.2
pyaes                  1.6.1
pyahocorasick          2.0.0
pyasn1                 0.4.8
pyasn1-modules         0.2.8
pycodestyle            2.10.0
pycparser              2.21
pycryptodome           3.16.0
pydantic               2.4.2
pydantic_core          2.10.1
pyecharts              2.0.4
Pygments               2.14.0
pylint                 3.0.2
pymongo                4.3.3
PyMySQL                1.0.2
pyparsing              3.1.1
pyperclip              1.8.2
pyrsistent             0.19.3
PySocks                1.7.1
pytest                 7.2.2
python-dateutil        2.8.2
python-multipart       0.0.6
python-oembed          0.2.4
python-slugify         8.0.1
pytz                   2023.3
pytz-deprecation-shim  0.1.0.post0
pytzdata               2020.1
PyYAML                 6.0
readchar               4.0.5
regex                  2023.3.23
requests               2.28.1
requests-file          1.5.1
requests-oauthlib      1.3.1
rich                   13.3.1
rsa                    4.9
ruff                   0.1.4
scipy                  1.11.3
selenium               4.12.0
setuptools             65.5.1
sgmllib3k              1.0.0
simplejson             3.19.2
six                    1.16.0
sniffio                1.3.0
sortedcontainers       2.4.0
soupsieve              2.5
SQLAlchemy             1.4.47
sqlglot                10.2.4
sqlparse               0.4.3
stack-data             0.6.2
starlette              0.26.1
tabulate               0.9.0
tenacity               8.2.3
text-unidecode         1.3
thriftpy2              0.4.16
tinysegmenter          0.3
tldextract             5.0.1
toml                   0.10.2
tomli                  2.0.1
tomlkit                0.11.6
torch                  1.13.1
torchvision            0.14.1
tqdm                   4.66.1
traitlets              5.6.0
trio                   0.22.2
trio-websocket         0.10.4
typer                  0.7.0
typing_extensions      4.8.0
tzdata                 2023.3
tzlocal                4.3
urllib3                1.26.13
uvicorn                0.21.1
vine                   5.0.0
wcwidth                0.2.5
webencodings           0.5.1
websocket-client       1.4.2
websockets             11.0.2
Werkzeug               2.2.2
wheel                  0.38.4
wrapt                  1.14.1
wsproto                1.2.0

[notice] A new release of pip is available: 23.2.1 -> 23.3.1
[notice] To update, run: pip install --upgrade pip
codewithriza commented 10 months ago

Issue Summary

micawber.exceptions.ProviderNotFoundException is raised when micawber is asked to handle the URL https://juejin.cn/post/7243725952788807717. The error means that no provider registered with the library matches this URL.

Possible Solutions

Here are potential steps to resolve the issue:

Error Handling

Implement error handling in your code to catch the ProviderNotFoundException specifically and provide alternative actions or messages for unsupported URLs.

Sample Python Code for Error Handling

import micawber
from loguru import logger
from micawber.exceptions import ProviderNotFoundException

# Build the registry and define the URL outside the try block so both
# names are always bound when the except clause runs.
providers = micawber.bootstrap_basic()
url = 'https://juejin.cn/post/7243725952788807717'

try:
    logger.debug(providers.request(url))
except ProviderNotFoundException as e:
    # No registered provider matches this URL; log it and fall back as needed.
    logger.error(f"Provider not found for URL: {url}. Error: {e}")

Pip Update

The pip notice in your output is unrelated to this error. If you also want to update pip to the latest release (23.3.1), run:

pip install --upgrade pip
coleifer commented 10 months ago

@ponponon - the list of providers available out of the box with the "basic" bootstrap is here, and the docs are pretty clear about what happens if you request a URL that does not have a provider: https://micawber.readthedocs.io/en/latest/api.html#micawber.providers.ProviderRegistry.request

What you need to do is either:

  1. Register your own provider that takes a juejin.cn URL and returns the appropriate oEmbed metadata.
  2. Use one of the services like embedly or noembed, which may or may not provide a mechanism for extracting metadata.

Looking at noembed and embedly, neither of them supports juejin.cn, so you will need to write your own custom provider for that website.