GoogleChrome / lighthouse

Automated auditing, performance metrics, and best practices for the web.
https://developer.chrome.com/docs/lighthouse/overview/
Apache License 2.0

Lone surrogate character breaks PSI proto conversion #15908

Closed dimpalambient closed 6 months ago

dimpalambient commented 6 months ago

FAQ

URL

https://tangrammontessori.site/

What happened?

PageSpeed Insights returns an error instead of a report when checking this URL.

What did you expect?

A Report

What have you tried?

https://pagespeed.web.dev/analysis/http-tangrammontessori-site/kvkrh86eqs?form_factor=desktop

How were you running Lighthouse?

PageSpeed Insights

Lighthouse Version

11.5.0

Chrome Version

No response

Node Version

No response

OS

No response

Relevant log output

Oops! Something went wrong.
generic::internal: Error unmarshalling JSON into proto: {"lighthouseVersion":"11.5.0","requestedUrl":"https://tangrammontessori.site/","mainDocumentUrl":"https://tangrammontessori.site/","finalDisplayedUrl":"https://tangrammontessori.site/","finalUrl":"https://tangrammontessori.site/","fetchTime":"2024-04-02T13:49:16.763Z","gatherMode":"navigation","runWarnings":[],"userAgent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/122.0.6261.94 Safari/537.36","environment":{"networkUserAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36","hostUserAgent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/122.0.6261.94 Safari/537.36","benchmarkIndex":359,"credits":{"axe-core":"4.8.1"}},"audits":{"is-on-https":{"id":"is-on-https","title":"Uses HTTPS","description":"All sites should be protected with HTTPS, even ones that don't handle sensitive data. This includes avoiding [mixed content](https://developers.google.com/web/fundamentals/security/prevent-mixed-content/what-is-mixed-content), where some resources are loaded over HTTP despite the initial request being served over HTTPS. HTTPS prevents intruders from tampering with or passively listening in on the communications between your app and your users, and is a prerequisite for HTTP/2 and many new web platform APIs. [Learn more about HTTPS](https://developer.chrome.com/docs/lighthouse/pwa/is-on-https/)."

adamraine commented 6 months ago

I can reproduce this. For debugging, this is the report JSON that fails to render in PSI https://googlechrome.github.io/lighthouse/viewer/?gist=119fd68aab627c9481910fed9f48bf24

adamraine commented 6 months ago

Locally I get this error when trying to convert the JSON into proto:

UnicodeEncodeError: 'utf-8' codec can't encode character '\ud83e' in position 76: surrogates not allowed

adamraine commented 6 months ago

This is an interesting situation because the string happens to be cut off in the middle of a surrogate pair:

Explorez le monde fascinant des dinosaures avec notre sélection exclusive ! \ud83e...

If the string had been cut off one character later, this problem wouldn't happen. Nevertheless, PSI should not break in this type of situation; we should find a way to handle it.
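
For illustration, the failure can be reproduced in a few lines of Python. This is only a sketch: it assumes the conversion step encodes each string as UTF-8 (which the UnicodeEncodeError above suggests), the helper name try_encode is hypothetical, and the complete emoji (U+1F996) is a guess used purely for comparison. Only the text and the \ud83e code unit come from the report.

```python
# Text from the failing report, ending in a lone high surrogate (U+D83E).
# A supplementary-plane emoji is stored in UTF-16 as a pair of code units,
# e.g. U+1F996 is D83E DD96; here only the first half is present.
prefix = "Explorez le monde fascinant des dinosaures avec notre s\u00e9lection exclusive ! "
broken = prefix + "\ud83e"
complete = prefix + "\U0001F996"  # assumed emoji, for comparison only


def try_encode(text: str):
    """Return the UTF-8 bytes, or the UnicodeEncodeError if encoding fails."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError as err:
        return err


result = try_encode(broken)
# The error's .start attribute is the index of the offending character.
print(type(result).__name__, getattr(result, "start", None))
print(type(try_encode(complete)).__name__)
```

The string with the full pair encodes without complaint, while the one ending in the lone high surrogate fails at the index of that surrogate.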

connorjclark commented 6 months ago

This is not caused by our truncation: we truncate with an ellipsis character (…), not three periods (...). The actual HTML element contains an invalid UTF-16 string, which we aren't handling.
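
One possible way to handle such strings is to replace any lone surrogate with U+FFFD (the Unicode replacement character) before the report is serialized. The sketch below is in Python to match the error shown earlier; it is not the fix adopted by Lighthouse or PSI, and the function name replace_lone_surrogates and the sample field names are hypothetical.

```python
import json
import re

# In a Python str, any code point in U+D800..U+DFFF is a lone surrogate:
# json.loads already merges valid \uD83E\uDD96-style pairs into one code point.
_LONE_SURROGATE = re.compile(r"[\ud800-\udfff]")


def replace_lone_surrogates(value):
    """Recursively replace lone surrogates in strings with U+FFFD."""
    if isinstance(value, str):
        return _LONE_SURROGATE.sub("\ufffd", value)
    if isinstance(value, list):
        return [replace_lone_surrogates(item) for item in value]
    if isinstance(value, dict):
        return {
            replace_lone_surrogates(key): replace_lone_surrogates(item)
            for key, item in value.items()
        }
    return value


# Hypothetical report fragment; the field names are illustrative only.
raw = '{"snippet": "exclusive ! \\ud83e", "ok": "dino \\ud83e\\udd96"}'
report = replace_lone_surrogates(json.loads(raw))

# After sanitizing, the whole report can be encoded as UTF-8.
encoded = json.dumps(report, ensure_ascii=False).encode("utf-8")
```

Replacing rather than dropping the bad code unit keeps the string length stable and leaves a visible marker that the source page contained malformed text.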