GoogleChrome / lighthouse

Automated auditing, performance metrics, and best practices for the web.
https://developer.chrome.com/docs/lighthouse/overview/

Lighthouse SEO section shows "unable to download a robots.txt file" for every site #15880

Closed: ErwinHofmanRV closed this issue 3 months ago

ErwinHofmanRV commented 3 months ago


URL

https://pagespeed.web.dev/analysis/https-www-rumvision-com/8yft0tnz5z?form_factor=mobile

I first saw this in a LinkedIn post about a different URL (https://www.linkedin.com/feed/update/urn:li:activity:7176884535001808897/) and then confirmed it myself with the URL above.

What happened?

The SEO audit shows "robots.txt is not valid" with the explanation "Lighthouse was unable to download a robots.txt file" for every site that is tested.
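
If it helps others reproduce this outside the web UI, here is a minimal sketch (my own, assuming Node 18+ run as an ES module; the target URL is just an example) that queries the public PageSpeed Insights v5 API and prints the robots-txt audit result:

```js
// Sketch: query the PageSpeed Insights v5 API directly and inspect the
// robots-txt audit result, independent of the PSI web UI.
// No API key is needed for occasional requests.
const target = 'https://www.rumvision.com/'; // example target
const endpoint = new URL('https://www.googleapis.com/pagespeedonline/v5/runPagespeed');
endpoint.searchParams.set('url', target);
endpoint.searchParams.set('category', 'seo');
endpoint.searchParams.set('strategy', 'mobile');

const res = await fetch(endpoint);
const json = await res.json();
const audit = json.lighthouseResult.audits['robots-txt'];

// score 1 = audit passed, 0 = audit failed
console.log(audit.title, '| score:', audit.score);
```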

What did you expect?

Given that a robots.txt file exists on the domains that were tested, I would not expect this audit to flag anything at all.

What have you tried?

I tested multiple sites to rule out that this was happening on just one particular site (a quick way to double-check that the robots.txt files themselves are reachable is sketched after this list). For example, I tested and confirmed with:

and some more
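
As a sanity check that the robots.txt files are actually reachable, here is a small Node sketch (Node 18+, run as an ES module; the domain list is a placeholder) that fetches /robots.txt from each domain and prints the HTTP status:

```js
// Fetch /robots.txt directly from each tested domain and print the HTTP
// status, to confirm the files really are being served.
const sites = [
  'https://www.rumvision.com', // placeholder; add the other tested domains
];

for (const site of sites) {
  const res = await fetch(new URL('/robots.txt', site));
  console.log(`${site}/robots.txt -> HTTP ${res.status} (${res.headers.get('content-type')})`);
}
```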

How were you running Lighthouse?

PageSpeed Insights

Lighthouse Version

Emulated Moto G Power with Lighthouse 11.5.0 / Using HeadlessChromium 122.0.6261.94 with lr

Chrome Version

No response

Node Version

No response

OS

Windows

Relevant log output

No response

Generosus commented 3 months ago

We second Erwin's finding. Same issue here. Our robots.txt is valid. We checked it with every tool listed here.

Recommendation: Add a notification banner to PSI's website anytime there's a known (and confirmed) backend issue impacting PSI's output (e.g., the Performance, Accessibility, Best Practices, or SEO sections).

Proposed Banner Text: "This site is currently experiencing an issue impacting ___. Sit tight. We are actively working on a solution. Stay tuned for updates."

The above recommendation will keep many from going "ballistic" -- like we did :)

Looking forward to the fix.

Cheers!

DanielRuf commented 3 months ago

People should always expect that the online version of Lighthouse has bugs. The version available via the browser DevTools should not have this problem, since it is often an older release.

I guess there is some regression bug regarding robots.txt retrieval.

DanielRuf commented 3 months ago

Emulated Moto G Power with Lighthouse 11.5.0 / Using HeadlessChromium 122.0.6261.94 with lr

Maybe someone can do a git bisect and find out the relevant commit or check which release broke this.
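
For anyone attempting that, here is a rough sketch of a bisect test script (the file name, target URL, and use of the Node API are my assumptions; some commits may need a build step first, and this only helps if the regression reproduces outside the PSI backend):

```js
// check-robots-audit.mjs (hypothetical file name)
// Runs only the robots-txt audit against one page and exits non-zero when it
// fails, so `git bisect run` can mark the commit as "bad".
import lighthouse from 'lighthouse'; // in a checkout, point this at the repo's local entry point
import * as chromeLauncher from 'chrome-launcher';

const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
const result = await lighthouse('https://www.rumvision.com/', {
  port: chrome.port,
  onlyAudits: ['robots-txt'],
});
await chrome.kill();

const audit = result.lhr.audits['robots-txt'];
console.log(`${audit.title} (score: ${audit.score})`);

// Exit code 0 = audit passed ("good" commit), 1 = audit failed ("bad" commit).
process.exit(audit.score === 1 ? 0 : 1);
```

With something like that in place, `git bisect start <bad-commit> <good-commit>` followed by `git bisect run node check-robots-audit.mjs` would walk the commits automatically.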

Generosus commented 3 months ago

Good news. It appears the issue has been fixed.

Not sure whether the Lighthouse developers recently fixed a kink in the backend, or whether the steps described below helped.

What we did:

  1. Added the following to our robots.txt file: Disallow: /cdn-cgi/ (see the snippet after this list)
  2. Cleared all cache layers (backend and frontend).
  3. Asked Google to recrawl our robots.txt URL.
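
For context, the relevant portion of a robots.txt with that rule would look roughly like this (the `User-agent: *` group is my assumption; only the Disallow line comes from step 1):

```
User-agent: *
Disallow: /cdn-cgi/
```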

Notes:

  1. We use Cloudflare, but never had this issue before.
  2. The above recommendation stands.

Cheers.

connorjclark commented 3 months ago

A fix went out an hour ago. Thanks for reporting the issue.