dgtlmoon / changedetection.io

The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Restock Monitor, change detection. Designed for simplicity - Simply monitor which websites had a text change for free. Free Open source web page change detection, Website defacement monitoring, Price change notification
https://changedetection.io
Apache License 2.0

[bug?] on one site - Exception: 'utf-8' codec can't encode character '\ud83d' - surrogates not allowed #2597

Open Pillendreher opened 2 weeks ago

Pillendreher commented 2 weeks ago

Describe the bug

When checking for changes, changedetection reports the following error:

Exception: 'utf-8' codec can't encode character '\ud83d' in position 419201: surrogates not allowed

From what I gather, "\ud83d" is an emoji, yet I can't find one on the site I'm monitoring. I also checked the site's code and couldn't find anything.

This is what the log says:

2024-08-29 08:09:30.784 | INFO     | changedetectionio.update_worker:run:255 - Processing watch UUID 57a8a843-ba19-4f8b-9588-8ed7df2601dc Priority 1 URL https://www.pourmoi.co.uk/products/india-lace-plunge-body/
2024-08-29 08:09:30.785 | WARNING  | changedetectionio.processors:call_browser:73 - Using playwright fetcher override for possible puppeteer request in browsersteps, because puppetteer:browser steps is incomplete.
2024-08-29 08:09:40.907 | ERROR    | changedetectionio.update_worker:run:477 - Exception reached processing watch UUID: 57a8a843-ba19-4f8b-9588-8ed7df2601dc
2024-08-29 08:09:40.907 | ERROR    | changedetectionio.update_worker:run:478 - 'utf-8' codec can't encode character '\ud83d' in position 419201: surrogates not allowed
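
For context on the error above: '\ud83d' on its own is not a complete emoji but a UTF-16 high surrogate, one half of the pair that encodes many emoji. Python refuses to encode an unpaired surrogate as UTF-8, which matches the message in the log. The following is a minimal sketch that reproduces the same exception in isolation; the low surrogate '\ude00' is an assumption chosen for illustration and is not taken from the monitored page.

```python
# A lone UTF-16 high surrogate, as reported in the error message.
lone = "\ud83d"

reason = None
try:
    lone.encode("utf-8")
except UnicodeEncodeError as exc:
    # The strict UTF-8 encoder rejects unpaired surrogates.
    reason = exc.reason

# Paired with a low surrogate (assumed here for illustration), the two
# halves form a valid UTF-16 sequence that decodes to a single character.
pair = "\ud83d\ude00"
emoji = pair.encode("utf-16", "surrogatepass").decode("utf-16")

print(reason)                 # surrogates not allowed
print(emoji.encode("utf-8"))  # b'\xf0\x9f\x98\x80'
```

This would explain why no emoji is visible on the page: only half of the pair is present in the text, so nothing renders.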

Version v0.46.03

To Reproduce

Steps to reproduce the behavior:

  1. Click on recheck.

https://changedetection.io/share/i8HXmCg5dIga

Expected behavior

The check should complete without errors.

Desktop (please complete the following information):

Additional context

Interestingly enough, when going through the browser steps myself, no error is reported and a screenshot of the site is saved. Removing the browser steps does not prevent the error from appearing, though.

dgtlmoon commented 2 weeks ago

Is Unraid Linux?

Pillendreher commented 2 weeks ago

Yes, Slack based afaik.

Btw: Don't know if it matters, but alas: I've got changedetection running inside a Docker container.

dgtlmoon commented 2 weeks ago

> Yes, Slack based afaik.
>
> Btw: Don't know if it matters, but alas: I've got changedetection running inside a Docker container.

What is Slack? Is it Linux? I don't know that platform/OS.

Pillendreher commented 2 weeks ago

Slackware is a Linux distribution, on which Unraid is based.

dgtlmoon commented 2 weeks ago

Perfect, I can reproduce it locally. Yeah, the page has some broken UTF-8 encoding somehow, but I need to find exactly where it's failing in changedetection.io.
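
One possible workaround, sketched here under assumptions and not taken from the changedetection.io codebase, is to sanitize fetched text before it is encoded. The helper name `make_utf8_safe` is hypothetical. It relies only on the standard `errors="replace"` handler of `str.encode`, which substitutes `?` for any character the codec cannot encode.

```python
def make_utf8_safe(text: str) -> str:
    """Hypothetical helper: return text that can always be encoded as UTF-8.

    Unpaired surrogates such as '\\ud83d' are replaced with '?';
    all other characters, including real emoji, pass through unchanged.
    """
    return text.encode("utf-8", errors="replace").decode("utf-8")


broken = "price \ud83d now"           # contains a lone high surrogate
cleaned = make_utf8_safe(broken)
print(cleaned)                        # price ? now
print(make_utf8_safe("ok \U0001F600") == "ok \U0001F600")  # True
```

Replacing the character loses information, so this only suits cases where a stray half-surrogate carries no meaning. Where the failing encode actually happens in changedetection.io is still open at this point in the thread.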